This document presents an introduction to AWS and EC2. It explains how EC2 offers virtual servers in the cloud with rapid scalability and flexible pricing. It describes the different EC2 instance types and their performance characteristics for different workloads. It also examines factors such as virtualization and CPU and I/O performance, and offers tips for optimizing performance on EC2.
2. What to expect from this session
• Introduction to AWS and EC2
• Defining system performance and how it is characterized for different workloads
• How EC2 instances deliver optimal performance while maintaining flexibility and agility
• How to make the most of EC2 instances
4. AWS Global Infrastructure
• 12 Regions
• 33 Availability Zones
• 54 Edge Locations
5. AWS Regions and Availability Zones (AZs)
[Map of AWS Regions and their AZs: US East (VA): AZ A-E; US West (OR): AZ A-C; US West (CA): AZ A-C; GovCloud (US): AZ A-B; EU (Ireland): AZ A-C; EU (Frankfurt): AZ A-B; Asia Pacific (Tokyo): AZ A-C; Asia Pacific (Singapore): AZ A-B; Asia Pacific (Sydney): AZ A-B; S. America (Sao Paulo): AZ A-B; China (Beijing)*: AZ A-B]
*A limited preview of the China (Beijing) Region is available to a select group of China-based and multinational companies with customers in China. These customers are required to create an AWS Account, with a set of credentials that are distinct and separate from other global AWS Accounts.
7. Amazon Elastic Compute Cloud (EC2)
• Virtual servers in the AWS cloud
• Quick and easy scalability, as you need it
• Pay only for what you use
• Familiar operating systems: Linux and Windows
8. A wide variety of instance types
• General purpose: M3, M4
• Compute optimized: C3, C4
• Storage and IO optimized: I2, D2
• GPU enabled: G2
• Memory optimized: R3
9. Amazon EC2 lets you…
• Easily build applications with HA
• Distribute load across EC2 servers using AWS Elastic Load Balancers
• Guarantee high availability and scalability using Auto Scaling
• Use multiple Availability Zones (AZs)
• Choose among different pricing models
10. Different pricing models
• Reserved Instances: pay a minimal upfront fee, reserve capacity, and secure a lower hourly rate
• On-Demand Instances: pay as you go, flat hourly rate, no contracts or commitments
• Spot Instances: place a bid, save up to 90% compared with On-Demand, launch thousands of instances
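A back-of-the-envelope comparison of the three models can be sketched as follows. All rates here are hypothetical placeholders, not actual AWS prices:

```python
# Sketch: comparing EC2 purchasing models for a steady workload.
# All rates and upfront fees are hypothetical, for illustration only.

def on_demand_cost(hourly_rate, hours):
    """Flat hourly rate, no contracts or commitments."""
    return hourly_rate * hours

def reserved_cost(upfront, discounted_rate, hours):
    """Minimal upfront payment plus a lower hourly rate."""
    return upfront + discounted_rate * hours

def spot_cost(avg_spot_rate, hours):
    """Bid-based rate; can be up to ~90% below On-Demand."""
    return avg_spot_rate * hours

# One year of continuous use, with hypothetical rates:
hours = 8760
print(on_demand_cost(0.10, hours))
print(reserved_cost(300, 0.04, hours))
print(spot_cost(0.03, hours))
```

For a steady, always-on workload the Reserved rate wins over On-Demand, and Spot is cheaper still when interruptions are acceptable, which is exactly the hybrid-purchase discussion the slide sets up.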
12. Choosing a server
• Servers are hired to do jobs
• Performance is measured differently depending on the job being done
13. Performance = perspective
• What performance means depends on the perspective:
• Response time
• Throughput
• Consistency
[Stack diagram, top to bottom: Application → System libraries → System calls → Kernel → Device, driven by the workload]
14. Performance factors

Resource          | Factors                                          | Indicators
CPU               | Sockets, number of cores, clock frequency, capacity | CPU utilization, run-queue length
Memory            | Capacity                                         | Free memory, paging, swapping
Network interface | Maximum bandwidth, packets                       | Packets received, packet transfer rate against maximum bandwidth
Disk              | IOPS, throughput                                 | Wait-queue length, device utilization, device errors
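As an illustration of the CPU indicator in the table, utilization can be derived from two samples of the Linux /proc/stat 'cpu' counters (field layout per proc(5); the samples below are made up):

```python
# Sketch: deriving the CPU-utilization indicator from two samples of the
# Linux /proc/stat 'cpu' line (jiffy counters: user nice system idle
# iowait irq softirq steal). The sample values are made up.

def cpu_utilization(sample1, sample2):
    """Utilization over the interval = 1 - (idle delta / total delta)."""
    idle_delta = sample2[3] - sample1[3]
    total_delta = sum(sample2) - sum(sample1)
    busy_delta = total_delta - idle_delta
    return busy_delta / total_delta

# Two hypothetical samples taken some interval apart:
s1 = [100, 0, 50, 800, 0, 0, 0, 0]
s2 = [140, 0, 70, 840, 0, 0, 0, 0]
print(cpu_utilization(s1, s2))  # 0.6 -> 60% busy over the interval
```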
15. Resource utilization
• Each application has its own resource-utilization profile for a given level of performance
• A resource at 100% utilization cannot accept or serve more requests
• Low utilization indicates that more resources have been provisioned than necessary
16. Example: web application
• MediaWiki installed on an Apache server with 140 pages of content
• Load increased over time intervals
21. Instance selection = optimization
• Selecting an instance is equivalent to optimizing resources
• Terminating instances is as easy as acquiring new ones
• Match the workload type to the optimal instance type
23. CPU instructions and protection levels
• The CPU has two protection levels: kernel and application
• Privileged instructions cannot be executed in user mode, to protect the system
• Applications reach the kernel through system calls
Privileged instructions:
• Initiating I/O
• Accessing I/O devices (network, disk)
• Timekeeping
• Halting the CPU
[Diagram: Application above Kernel]
25. x86 CPU virtualization: before Intel VT-x
• Binary translation for privileged instructions
• Paravirtualization (PV)
• PV requires passing through the VMM, introducing latency
• Applications that are bound to system calls are affected the most
[Diagram: Application → Kernel → (PV) → VMM]
26. Applying Moore's law
180 nm (1999) → 130 nm (2001) → 90 nm (2003) → 65 nm (2005) → 45 nm (2007) → 32 nm (2009) → 22 nm (2012) → 14 nm (2014)
MOORE'S LAW: enabling new devices with greater functionality and complexity, while controlling power, cost, and size (doubling integration every 2 years)
29. x86 CPU virtualization: after Intel VT-x
• Hardware-assisted virtualization (HVM)
• PV-HVM uses PV drivers for operations that are slow to emulate:
• e.g. network and disk I/O
[Diagram: Application → Kernel → (PV-HVM) → VMM]
32. Instances: T2
• Lowest-cost instances
• Burstable performance
• Fixed allocation of CPU credits

Model     | vCPU | CPU Credits/Hour | Memory (GiB) | Storage
t2.nano   | 1    | 3                | 0.5          | EBS Only
t2.micro  | 1    | 6                | 1            | EBS Only
t2.small  | 1    | 12               | 2            | EBS Only
t2.medium | 2    | 24               | 4            | EBS Only
t2.large  | 2    | 36               | 8            | EBS Only
33. How credits work
• One CPU credit provides the performance of a full CPU core for one minute
• An instance earns CPU credits at a constant rate
• An instance spends credits while it is active
• Credits expire (leak) after 24 hours
[Chart: credit balance rising at the baseline rate and draining at the burst rate]
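The mechanics above can be sketched as a toy simulation. The earn rate and 24-hour cap below mirror the table's t2.micro figures (6 credits/hour, so a 144-credit cap), but the model is deliberately simplified: a real instance is throttled to its baseline rather than stopping at zero.

```python
# Toy model of T2 CPU-credit accounting, hour by hour. Simplified:
# credit expiry is modeled as a hard cap, and an empty balance floors
# at zero instead of throttling to the baseline rate.

def simulate_credits(earn_per_hour, usage_per_hour, hours, max_balance):
    """One credit = one full CPU core for one minute."""
    balance = 0.0
    for _ in range(hours):
        balance = min(balance + earn_per_hour, max_balance)  # earn, capped (expiry)
        balance = max(balance - usage_per_hour, 0.0)         # spend while active
    return balance

# t2.micro-like instance, idle for 10 hours: banks 60 credits.
print(simulate_credits(earn_per_hour=6, usage_per_hour=0, hours=10, max_balance=144))
# After a day or more of idling the balance stops growing at the cap.
print(simulate_credits(earn_per_hour=6, usage_per_hour=0, hours=48, max_balance=144))
```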
35. Tip: how to interpret steal time
• Fixed CPU allocations can be offered with established CPU limits
• Steal time occurs when the CPU time limit has been exhausted
• Check the CloudWatch metrics
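On Linux, steal time can also be observed inside the guest: it can be derived from the aggregate 'cpu' line of /proc/stat (the 8th counter is 'steal', per proc(5)). A minimal sketch with made-up samples:

```python
# Sketch: steal-time percentage from two samples of the /proc/stat 'cpu'
# line (counters: user nice system idle iowait irq softirq steal).
# The sample values are made up for illustration.

def steal_percent(sample1, sample2):
    """Share of elapsed CPU time stolen by the hypervisor, in percent."""
    steal_delta = sample2[7] - sample1[7]
    total_delta = sum(sample2) - sum(sample1)
    return 100.0 * steal_delta / total_delta

s1 = [500, 0, 100, 300, 0, 0, 0, 10]
s2 = [560, 0, 120, 380, 0, 0, 0, 50]
print(steal_percent(s1, s2))  # 20.0 -> 20% of the interval was stolen
```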
37. I/O and device virtualization
• Split driver model
• Each device has two components:
• A ring buffer for communication
• An event channel that notifies the ring buffer of activity
• Intel VT-d
• Direct pass-through for dedicated devices
• Enhanced Networking (SR-IOV)
38. Split driver model: network
[Diagram: in the guest domain, the application's sockets go through a frontend driver; the frontend talks over a VMM ring buffer to the backend driver and device driver in the driver domain, which owns the physical network device. The VMM schedules the guests' virtual CPUs and virtual memory onto the physical CPU and memory.]
43. Device pass-through: Enhanced Networking
• SR-IOV removes the need for the driver domain
• The physical network device exposes a virtual function to the instance
• It requires a special driver:
• The instance's operating system needs to know about the driver
• "Enhanced Networking" must be enabled in EC2
44. Device pass-through: Enhanced Networking
[Diagram: with SR-IOV, the guest domain's NIC driver talks directly to a virtual function exposed by the SR-IOV network device, bypassing the driver domain's backend/frontend path; other guests can still use the split driver model.]
47. Tip: use Enhanced Networking
• More packets per second
• Lower latency variance
• The instance's operating system must support it
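A guest-side way to check which path is in use is to look at the driver backing the NIC. The helper below only classifies a driver-name string; the specific driver names ('ixgbevf' for the Intel virtual-function driver of this generation, 'vif'/'xen_netfront' for the Xen split-driver path) and the lookup commands in the comments are assumptions to verify on your own instance.

```python
# Sketch: telling the SR-IOV (enhanced networking) path from the Xen
# split-driver path by NIC driver name. The driver-name sets below are
# assumptions, not an exhaustive or authoritative list.

SRIOV_DRIVERS = {"ixgbevf"}                     # Intel 82599 virtual-function driver
XEN_FRONTEND_DRIVERS = {"vif", "xen_netfront"}  # split-driver frontend path

def is_enhanced_networking_driver(driver_name):
    return driver_name in SRIOV_DRIVERS

# On a real Linux instance the driver name can be found with
# 'ethtool -i eth0' or under /sys/class/net/eth0/device/driver.
print(is_enhanced_networking_driver("ixgbevf"))       # True
print(is_enhanced_networking_driver("xen_netfront"))  # False
```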
49. I2 instances
• Provide SSD storage
• Provide IOPS at low cost
• Optimized for demanding random I/O

Model      | vCPU | Memory (GiB) | Storage     | Read IOPS | Write IOPS
i2.xlarge  | 4    | 30.5         | 1 x 800 SSD | 35,000    | 35,000
i2.2xlarge | 8    | 61           | 2 x 800 SSD | 75,000    | 75,000
i2.4xlarge | 16   | 122          | 4 x 800 SSD | 175,000   | 155,000
i2.8xlarge | 32   | 244          | 8 x 800 SSD | 365,000   | 315,000
50. Grants on kernels before version 3.8.0
• Before version 3.8.0, a grant map is required
• The grant map requires costly operations due to TLB (Translation Lookaside Buffer) flushes
read(fd, buffer,…)
51. Grants on kernels after version 3.8.0
• The grant map is defined in a pool
• Data is copied to and from the grant pool
52. Tip: use kernels newer than version 3.8.0
• Amazon Linux 13.09 or later
• Ubuntu 14.04 or later
• RHEL 7 or later
• Etc.
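A quick way to apply this tip is to compare the running kernel release against 3.8.0. The parsing below is a simplification of real release strings (it assumes the numeric part comes before the first hyphen, as in '4.4.0-62-generic'):

```python
# Sketch: checking whether a kernel release string meets the 3.8.0 cutoff.
# Simplified parsing; exotic release strings may need more care.
import platform

def kernel_at_least(release, minimum=(3, 8, 0)):
    numeric = release.split("-")[0]
    parts = tuple(int(p) for p in numeric.split(".")[:3])
    # Pad short strings like '3.8' so the tuple comparison is well defined.
    parts = parts + (0,) * (3 - len(parts))
    return parts >= minimum

print(kernel_at_least("3.7.10"))            # False
print(kernel_at_least("4.4.0-62-generic"))  # True
# On the instance itself: kernel_at_least(platform.release())
```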
53. Summary
• Use PV-HVM
• Monitor T2 credits
• Use Enhanced Networking
• Use kernels newer than version 3.8.0
A great pleasure to be at the first Summit in Buenos Aires.
Support. I help my customers deep-dive performance issues on EC2 and AWS services.
This session is designed to be educational and consultative.
I want you all to come take away something that can help you use EC2,
starting with how you can define performance down to features and tips you can use to get more performance and how they work
You all have specific things you care about and objectives, but if those aren’t covered in the talk, don’t be too disappointed.
We’ve brought some great engineers into our booth to answer your questions after the session.
The most important part is how to make the best use of the instances.
Amazon EC2’s simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon’s proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use. Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from common failure scenarios.
Our data center footprint is global, spanning 5 continents with highly redundant clusters of data centers in each region. Our footprint is expanding continuously as we increase capacity, redundancy and add locations to meet the needs of our customers around the world.
You can choose to deploy and run your applications in multiple physical locations within the AWS cloud.
Amazon Web Services are available in geographic Regions that are independent and separate, as much as possible for data sovereignty, while offering the same services as much as possible.
When you use AWS, you can specify the Region in which your data will be stored, instances run, queues started, and databases instantiated.
Within each Region are Availability Zones (AZs).
Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. By launching instances in separate Availability Zones, you can protect your applications from a failure (unlikely as it might be) that affects an entire zone. Regions consist of one or more Availability Zones, are geographically dispersed, and are in separate geographic areas or countries. The Amazon EC2 service level agreement commitment is 99.95% availability for each Amazon EC2 Region.
AWS maintains Regions, which are major geographic areas, and Availability Zones (AZs), which are individual data centers or clusters of data centers that make up a Region. Regions are independent and separate, offering the same services as much as possible, but isolated as much as possible for data sovereignty.
Today, AWS operates 9 Regions around the world. Each Region has a minimum of 2 AZs (separate power, flood plains, etc.) to allow customers to set up high-availability architectures and data redundancy. An AZ is an abstraction of a data center with fault isolation, but close enough to others to build high-availability architectures.
In addition to Regions, AWS maintains edge locations that support Route 53 DNS and Amazon CloudFront (CDN) points of presence.
EC2 is designed to make web scale computing easier for developers. It has resizable compute capacity, configurable security and network access, and you have complete control of the resource.
Resources can be started, terminated and monitored as needed, and you can increase availability by deploying instances across multiple physical locations.
Talk about instance families and really stress the breadth of our offering. This graphic does not cover all the Instance Types, but it does let you begin the conversation about the different families and the purpose we had in mind when AWS created the different Instance Types.
Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs.
Speaker Note: Describe ELB and auto scaling, the key use cases and how they can be interdependent but not necessarily.
ELB benefits: HA, health checks, SSL offloading, sticky sessions, logging, etc.
Detects the health of Amazon EC2 instances so that failing instances are detected and removed
Dynamically grows and shrinks resources based on traffic
Seamlessly integrates with autoscaling to add and remove instances based on scaling activities
Supports load balancing of applications using HTTP, HTTPS, SSL, and TCP protocols.
Auto Scaling: Automatically scale your Amazon ec2 fleet to optimize utilization based on your conditions and needs
Scales a customer's EC2 capacity automatically and sheds unneeded Amazon EC2 instances automatically
Good for apps that experience variability in usage
Is enabled by Amazon CloudWatch and carries no additional fees
Explain how pricing works, and integrate our new Spot model with 1-6 hour durations. A great slide to just talk over and whiteboard how our offerings could be bought in a hybrid model: some Spot, some On-Demand, and some RIs.
In order to know how to improve performance you have to first know what it is and how to measure it.
Performance can mean different things depending on what you’re talking about.
Servers are hired to do jobs, and what those jobs are depends on your business or personal objectives
Defining performance for your application is the first step to knowing what you need out of your virtual machines on EC2
Skipping that step can lead to overprovisioning or underprovisioning: spending too much, or not spending enough and not meeting your customer promise
Because EC2 offers lots of virtual server configurations on demand, paid by the hour, the approach to right-sizing is different and less stressful
You aren’t stuck with it, and you can experiment easily
The goal is to hire the right server for the job
CPU bound, IO intensive etc.
The ways that you can generalize performance are the following:
How quickly does a unit of work get done, or response time
How much work is being done per unit of time, or throughput
And how consistently over time is a level of performance achieved. Consistency can be very important.
How quickly does a unit of work get done
When you execute the database query, how quickly does it come back
When you enter a website how quickly does the page load
How much work are you getting done in a unit of time
Web application: the number of requests per second handled within a tolerable response time
Database: transactions per second
Transcoding video: frames per second
Machine learning: inferences per second, or number of training jobs per unit of time
Going further down the stack, to the filesystem for example, you might look at filesystem cache hits.
Down to the hardware resources that do the work, you’re paying attention to CPU, Memory, Disk, Network, and whether these resources are fully saturated or utilized.
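The three performance perspectives above can be made concrete with a small sketch. The request latencies are made up, and the percentile pick is a rough simplification:

```python
# Sketch: computing the three performance perspectives (response time,
# throughput, consistency) from a list of request latencies in seconds.
# The data and window are illustrative only.

def summarize(latencies, window_seconds):
    ordered = sorted(latencies)
    n = len(ordered)
    return {
        # Response time: how quickly a unit of work gets done.
        "mean_response_time": sum(ordered) / n,
        # Throughput: how much work gets done per unit of time.
        "throughput_rps": n / window_seconds,
        # Consistency: the tail of the distribution (rough p99 pick).
        "p99_response_time": ordered[int(0.99 * (n - 1))],
    }

stats = summarize([0.05, 0.06, 0.05, 0.07, 0.50], window_seconds=1.0)
print(stats)
```

A single slow outlier barely moves the mean but dominates the tail, which is why consistency deserves its own metric.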
For instances, we think about performance at the resource level: the capabilities of those resources and how they are utilized
Performance indicators are resource-level indicators that show whether all of the potential is being used or not.
Explain CPU, and stuff.
What is the right utilization? The demand on each component depends on the type of application; remember whether it is an application that uses a lot of CPU, disk, etc.
Each application can have a different resource utilization profile for a given level of workload performance.
Utilization: 100% utilization is usually a sign of a bottleneck.
What if we have low resource capacity?
Performance of an application on EC2.
As an illustration we set up a simple mediawiki deployment – a PHP application using apache and mysql.
We set it up with 140 pages of content and ran a load test
We used siege and gradually upped the load over time
On the server side, we collected some basic metrics using collectd and used a graphing tool to pull some charts together. The default interval of 10 seconds was used, so you get pretty good granularity
It plugs into Apache, and here we show the apache requests per second rate from the web server status output
Why not use CloudWatch metrics? The hypervisor only shows certain metrics. Let's have a look.
Buffers for Filesystem metadata
Cache for file cache to reduce disk accesses
No swapping
The information displayed in the memory section provides the same data about memory usage as the command free -m.
The swpd or “swapped” column reports how much memory has been swapped out to a swap file or disk. The free column reports the amount of unallocated memory. The buff or “buffers” column reports the amount of allocated memory in use. The cache column reports the amount of allocated memory that could be swapped to disk or unallocated if the resources are needed for another task.
The swap section reports the rate that memory is sent to or retrieved from the swap system. By reporting “swapping” separately from total disk activity, vmstat allows you to determine how much disk activity is related to the swap system.
The si column reports the amount of memory that is moved from swap to “real” memory per second. The so column reports the amount of memory that is moved to swap from “real” memory per second.
I/O
The io section reports the amount of input and output activity per second in terms of blocks read and blocks written.
The bi column reports the number of blocks received, or “blocks in”, from a disk per second. The bo column reports the number of blocks sent, or “blocks out”, to a disk per second.
r/s & w/s: Read and write requests per second. This is already post-merging, and in proper I/O setups reads will mean blocking random read (serial reads are quite often merged), and writes will mean non-blocking random write (as underlying cache can allow to serve the OS instantly).
rrqm/s & wrqm/s: How many requests were merged by the block layer. In an ideal world there would be no merges at the I/O level, because applications would have done it ages ago. Ideals differ, though; for some it is good to have the kernel doing this job so they don't have to do it inside the application.
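The vmstat columns described above can be pulled into a dictionary for scripting. The column order follows vmstat's default report; the sample line is made up:

```python
# Sketch: parsing one data line of vmstat's default report into a dict.
# Column order per vmstat's default output; the sample line is made up.

VMSTAT_COLUMNS = ["r", "b", "swpd", "free", "buff", "cache",
                  "si", "so", "bi", "bo", "in", "cs",
                  "us", "sy", "id", "wa", "st"]

def parse_vmstat_line(line):
    values = [int(v) for v in line.split()]
    return dict(zip(VMSTAT_COLUMNS, values))

row = parse_vmstat_line("1 0 0 507504 113588 426060 0 0 1 12 40 77 1 0 99 0 0")
print(row["free"], row["si"], row["so"], row["st"])
```

With the columns named, checks like "no swapping" (si and so at 0) or "steal time present" (st above 0) become one-line conditions.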
The cpu section reports on the use of the system’s CPU resources. The columns in this section always add to 100 and reflect “percentage of available time”.
The us column reports the amount of time that the processor spends on userland tasks, or all non-kernel processes.
The sy column reports the amount of time that the processor spends on kernel related tasks.
The id column reports the amount of time that the processor spends idle.
The wa column reports the amount of time that the processor spends waiting for IO operations to complete before being able to continue processing tasks.
The nice thing is that you can terminate the instances once you're done, or launch the tests on other ones.
Protection, system call performance, scheduling, and P- and C-state management.
Tips: HVM, which system calls to use for timekeeping, how to manage P- and C-states.
CPU has at least two protection levels: Kernel mode and user mode
CPU checks current protection level on each instruction
Privileged instructions can’t be executed in user mode to protect system. They include:
“Initiate I/O” Access I/O devices, such as network and disk
“Access protected memory” Manipulate memory management unit
Time keeping
Halt CPU or change power state
Done in user mode software through system calls – trap to kernel mode.
Took a sample of the system calls being done by httpd and here’s the list of the most frequently used
Creating processes
Input / output operations (file system operations)
And mapping files and devices into memory
These are generally some of the most popular system calls.
If you have debugging enabled, for example, you’ll see an elevation in the number of gettimeofday() calls to put time stamps in the debug logs.
Most time-related PHP functions will use the system time. Since they use the system time, gettimeofday will be called a lot, so if you want to reduce the calls, reduce your time-related functions.
If your application does a lot of I/O or you want to use debugging mode with lots of time checks, for example, you would start to care more about your system call performance.
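One way to act on that tip is to amortize timestamp lookups by caching. The sketch below is illustrative only (a real logger would typically bound staleness by elapsed time rather than by call count):

```python
# Sketch: reducing gettimeofday-style calls by caching the timestamp and
# refreshing it only every N reads. Illustrative, not production code.
import time

class CoarseClock:
    """Consults the real (gettimeofday-backed) clock only once every
    `refresh_every` reads, instead of on every read."""

    def __init__(self, refresh_every=100):
        self.refresh_every = refresh_every
        self.reads = 0
        self.cached = time.time()  # one real clock call up front
        self.clock_calls = 1

    def now(self):
        if self.reads and self.reads % self.refresh_every == 0:
            self.cached = time.time()  # refresh the cached timestamp
            self.clock_calls += 1
        self.reads += 1
        return self.cached

clock = CoarseClock(refresh_every=10)
for _ in range(100):
    clock.now()
print(clock.clock_calls)  # 10 real clock calls instead of 100
```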
When virtualizing hardware, it's the job of the hypervisor to enforce protections and schedule resources to offer a controlled virtual machine experience.
Otherwise, the OS and user land share the same ring.
The hypervisor must be able to trap and moderate any instruction that changes the hardware or state of the system. This is to provide isolation between virtual machines.
The hypervisor moderates system calls and sends them back to the OS. System call performance is poor.
So you have a couple of options:
You can scan the instruction stream of each virtual machine for privileged instructions and do binary translation; performance is not ideal.
You can ignore those instructions and provide “hypercalls” to replace the instructions that lose their functionality; this requires a modified OS kernel, with compatibility and portability costs.
Or you can use hardware-assisted virtualization technology, which provides a new CPU execution mode that allows the hypervisor to run in a new root mode below ring 0.
Then there are complex devices that need to get emulated
The hypervisor also provides hypercall interfaces for other critical kernel operations such as memory management, interrupt handling and time keeping.
When virtualizing the CPU, one also has a choices of how to assign physical CPU cores to virtual CPUs.
But fully virtualized mode, even with PV drivers, has a number of things that are unnecessarily inefficient. One example is the interrupt controllers: fully virtualized mode provides the guest kernel with emulated interrupt controllers (APICs and IOAPICs). Each instruction that interacts with the APIC requires a trip up into Xen and a software instruction decode; and each interrupt delivered requires several of these emulations.
With the introduction of PVHVM mode, we can start to see paravirtualization not as binary on or off, but as a spectrum. In PVHVM mode, the disk and network are paravirtualized, as are interrupts and timers. But the guest still boots with an emulated motherboard, PCI bus, and so on. It also goes through a legacy boot, starting with a BIOS and then booting into 16-bit mode. Privileged instructions are virtualized using the HVM extensions, and pagetables are fully virtualized, using either shadow pagetables, or the hardware assisted paging (HAP) available on more recent AMD and Intel processors.
The "HVM callback vector" line shows that PV interrupts are enabled (from PVHVM), which is a big difference. On full HVM mode, emulated PCI interrupts are used for device I/O delivery, along with emulating the PCI bus, local APIC, and IO APIC. If you doing a high rate of disk I/O or network packets – which is easy to do on today's networks – these emulation overheads add up. With vector callbacks instead of interrupts, the Xen hypervisor can call the destination guest driver directly, avoiding these overheads.
A fully virtualized system, like an OS running on bare hardware, relies on the timer interrupt for its time keeping. This means a number of things:
An idle virtual machine still has to process hundreds of interrupts a second.
Missed interrupts result in unstable time.
On Linux there are two different time mechanisms
Clock source and clock events
With gettimeofday you are accessing a clock source; the same goes for QueryPerformanceCounter.
There are commands that let you see your clock source.
Usually, by default, it's going to be the xen clock source.
JVM tracing does very heavy gettimeofday calls.
Benchmarks tend to show this problem more than a lot of applications.
TSC is a hardware clock source that gets rid of all of the software that has to sit on top of things.
In Linux you can access the TSC without talking to the kernel.
Xen pvclock gives you compatibility with a wide range of hardware.
If you want to see the differences, you need to use a timekeeping benchmark. In the real world it most often shows up when you are using a JVM with a high debug level enabled, so the JVM does time-based tracing. Another classic example is SAP, because they do a large amount of timekeeping operations. High-fidelity trace records.
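The clock source mentioned above can be inspected through sysfs on Linux. The paths are the standard kernel ones; on Xen guests 'xen' is the usual default, with 'tsc' among the alternatives. The parsing helper is kept pure so it can be exercised without a live system:

```python
# Sketch: inspecting the kernel clock source via sysfs (standard Linux
# paths). The sample contents mirror a typical Xen guest.

CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"
# To read on a real instance:
#   open(CLOCKSOURCE_DIR + "/current_clocksource").read()
#   open(CLOCKSOURCE_DIR + "/available_clocksource").read()

def parse_clocksource(current_contents, available_contents):
    """Pure helper over the contents of the two sysfs files."""
    return {
        "current": current_contents.strip(),
        "available": available_contents.split(),
    }

info = parse_clocksource("xen\n", "xen tsc hpet acpi_pm\n")
print(info["current"])  # xen
```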
Test before changing!
A CPU customized for EC2.
Overclocking / C-states.
They are super fast: a C4.8xlarge instance can reach 3.5 GHz on a single core.
C-state and P-state controls
C-states
Control the sleep level that a core may enter
Numbered from C0 (the core is working normally and executing instructions) to C6 (the core is powered off)
P-states
Control the desired performance level of a core
Numbered from P0 (the highest core performance, where it is allowed to use Intel Turbo Boost technology to raise the frequency), then from P1 (requests the maximum base frequency) down to P15 (requests the lowest possible frequency)
You might want to change the C-state or P-state settings to increase processor performance consistency, reduce latency, or tune your instance for a specific workload. The default C-state and P-state settings provide maximum performance, which is optimal for most workloads. However, if your application would benefit from reduced latency at the cost of higher single- or dual-core frequencies, or from consistent performance at lower frequencies as opposed to bursty Turbo Boost frequencies, consider experimenting with the C-state or P-state settings that are available to these instances.
In this example, vCPUs 21 and 28 are running at their maximum Turbo Boost frequency because the other cores have entered the C6 sleep state to save power and provide both power and thermal headroom for the working cores. vCPUs 3 and 10 (each sharing a processor core with vCPUs 21 and 28) are in the C1 state, waiting for instruction.
C-states control the sleep levels that a core may enter when it is inactive. You may want to control C-states to tune your system for latency versus performance. Putting cores to sleep takes time, and although a sleeping core allows more headroom for another core to boost to a higher frequency, it takes time for that sleeping core to wake back up and perform work. For example, if a core that is assigned to handle network packet interrupts is asleep, there may be a delay in servicing that interrupt. You can configure the system to not use deeper C-states, which reduces the processor reaction latency, but that in turn also reduces the headroom available to other cores for Turbo Boost.
A common scenario for disabling deeper sleep states is a Redis database application, which stores the database in system memory for the fastest possible query response time.
You can reduce the variability of processor frequency with P-states. P-states control the desired performance (in CPU frequency) from a core. Most workloads perform better in P0, which requests Turbo Boost. But you may want to tune your system for consistent performance rather than bursty performance that can happen when Turbo Boost frequencies are enabled.
Intel Advanced Vector Extensions (AVX or AVX2) workloads can perform well at lower frequencies, and AVX instructions can use more power. Running the processor at a lower frequency, by disabling Turbo Boost, can reduce the amount of power used and keep the speed more consistent. For more information about optimizing your instance configuration and workload for AVX, see http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/performance-xeon-e5-v3-advanced-vector-extensions-paper.pdf.
T2.nano
This instance type was created because we had customers whose workloads used very little CPU.
It works on a bursting model.
A CPU Credit provides the performance of a full CPU core for one minute
Hefty initial CPU credit balance for good startup experience
Use credits when active, accrue credits when idle
Transparency on credit balances
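The credit mechanics above can be sketched as a toy model: one credit buys a full core for one minute, credits accrue while the instance idles, and the balance is capped. The earn rate, initial balance, and cap below are illustrative figures for a t2.micro-sized instance, not authoritative values.

```python
# Toy model of the T2 CPU-credit mechanism: accrue while idle, spend while
# bursting. Rates (30 initial credits, 6 credits/hour, 24h cap) are
# illustrative assumptions, not official numbers.

def simulate_credits(usage_per_min, initial=30, earn_per_hour=6.0, cap=144.0):
    """usage_per_min: fraction of a core used each minute (0.0-1.0).
    Returns the credit balance after each minute."""
    balance = float(initial)
    history = []
    for u in usage_per_min:
        balance += earn_per_hour / 60.0  # accrue this minute's credits
        balance -= u                     # spend u core-minutes
        balance = min(max(balance, 0.0), cap)
        history.append(round(balance, 3))
    return history

# Idle for an hour, then burst at 100% of a core for 10 minutes.
trace = [0.0] * 60 + [1.0] * 10
hist = simulate_credits(trace)
print(hist[59], hist[69])  # 36.0 after the idle hour, 27.0 after the burst
```

The model makes the "use credits when active, accrue credits when idle" behavior easy to reason about: a sustained burst drains the balance at (usage minus earn rate) credits per minute.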
Review the CloudWatch metrics to understand utilization.
Hmm, we see steal time. Is someone stealing my CPU?
To do time accounting for each process, the OS measures the time, schedules the process, runs it, checks the time again, and charges the process with the difference.
This model assumes the OS is running 100% of the time. If the hypervisor takes time away from the instance, the OS doesn't know it, because it wouldn't be getting timer interrupts.
The "steal time" category is enabled by a paravirtual extension, where the guest queries the hypervisor for time and can figure out when time was taken away.
This exists in Linux but not in Windows.
There's a caveat: if the guest has halted, time taken while halted doesn't get reported as steal time. So steal time doesn't always account for all the time the hypervisor has taken from you.
“A common misconception about steal time (due to the unfortunate naming) is that it is a metric for showing the amount of CPU cycles stolen by other virtual machines in the same virtual host. No doubt that cloud service providers tend to oversell but steal time should not be the basis for this assumption.
Steal time actually accounts for the cycles the local virtual machine is trying to go over its originally allocated resources. It should actually be named involuntary wait as mentioned in the Linux kernel documentation for /proc/stat.”
There are a number of corner cases where the hypervisor is doing work on your behalf. Steal time can help you understand what's happening, but it doesn't necessarily indicate that your performance is worse. The big takeaway is that steal time by itself does not mean your performance is being impacted.
The goal of steal time is to correct process accounting.
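On Linux, the steal counter is the eighth field of the `cpu` line in `/proc/stat`, so you can measure it yourself from two snapshots. The field order follows the documented `/proc/stat` layout (user, nice, system, idle, iowait, irq, softirq, steal, ...); the sample lines below are fabricated.

```python
# Compute the steal-time percentage over an interval from two /proc/stat
# "cpu" lines. The sample lines are made up; on a real system you would
# read /proc/stat twice with a sleep in between.

def steal_percent(stat_line_before, stat_line_after):
    f0 = list(map(int, stat_line_before.split()[1:]))
    f1 = list(map(int, stat_line_after.split()[1:]))
    total = sum(f1) - sum(f0)
    steal = f1[7] - f0[7]  # 8th field of the cpu line is steal
    return 100.0 * steal / total

before = "cpu 1000 0 300 5000 100 0 10 50 0 0"
after  = "cpu 1400 0 420 5600 120 0 14 86 0 0"
print(round(steal_percent(before, after), 1))  # 3.1 (% of the interval)
```

This is essentially what tools like `top` and `mpstat` report in their `st` column.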
The hypervisor's responsibilities include protection, system call performance, scheduling, and P- and C-state management.
Tips: use HVM, know which system calls to use for timekeeping, and know how to manage P- and C-states.
Consistent device drivers provided in a split driver model allow for better portability of machine images across hardware generations. Hardware-specific drivers reside in a control operating system, and a simple front-end driver in the guest communicates with the back-end through ring buffers in shared memory pages. The multiplexing happens on the host, and it can require host CPU resources.
The original challenge of assigning a device to a virtual machine has to do with direct memory access: a device can modify memory without involving the CPU, which would be a serious security hole if allowed in the context of a multi-tenant host.
IOMMU can identify source devices and either deny or translate memory requests using IOMMU page tables. This enables the hypervisor to assign specific devices to a guest and restrict device memory access to pages owned by the guest. This is how we enable PCI-passthrough for things like GPU instances and SR-IOV network devices.
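The IOMMU behavior just described can be sketched as a toy model: each source device gets its own page table, DMA addresses are translated through it, and anything unmapped is denied. This is purely illustrative; real IOMMU page tables are multi-level hardware structures.

```python
# Toy model of IOMMU DMA remapping: per-device page tables that either
# translate or deny a device's memory request. Illustrative only.

class Iommu:
    def __init__(self):
        self.tables = {}  # device id -> {device page -> host page}

    def assign(self, device, device_page, host_page):
        """Hypervisor grants the device access to one guest-owned page."""
        self.tables.setdefault(device, {})[device_page] = host_page

    def translate(self, device, device_page):
        """Translate a DMA request, or deny it if the page isn't mapped."""
        mapping = self.tables.get(device, {})
        if device_page not in mapping:
            raise PermissionError(f"DMA from {device} to page {hex(device_page)} denied")
        return mapping[device_page]

iommu = Iommu()
iommu.assign("gpu-vf0", device_page=0x10, host_page=0x9A)
print(hex(iommu.translate("gpu-vf0", 0x10)))  # 0x9a: translated
# iommu.translate("gpu-vf0", 0x11) would raise PermissionError: not mapped
```

The deny path is the key point: a passed-through device can only reach pages the hypervisor explicitly mapped for it, which is what makes PCI passthrough safe on a multi-tenant host.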
Single Root I/O Virtualization
Physical network device exposes Virtual Function to instance
The driver in your instance binds to a lightweight PCIe function (the VF) with limited configuration and direct access to the physical NIC.
Packets no longer processed in software.
But it’s a specialized driver, which means:
Your instance OS needs to know about it and be using it
EC2 needs to be told your instance OS knows about it and can handle it.
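One quick way to check the first requirement is to look at which driver the network interface is using. The sketch below parses `ethtool -i`-style output; it assumes the VF driver is `ixgbevf` (the Intel 82599 Virtual Function driver used by these enhanced-networking instances), and the driver list may need extending for other hardware generations.

```python
# Decide from ethtool -i style output whether the interface is bound to an
# SR-IOV Virtual Function driver. The driver set is an assumption based on
# the ixgbevf driver used for enhanced networking on these instance types.

SRIOV_VF_DRIVERS = {"ixgbevf"}

def uses_sriov(ethtool_output):
    """ethtool_output: text like the output of `ethtool -i eth0`."""
    for line in ethtool_output.splitlines():
        if line.startswith("driver:"):
            return line.split(":", 1)[1].strip() in SRIOV_VF_DRIVERS
    return False

sample = "driver: ixgbevf\nversion: 2.16.4\nbus-info: 0000:00:03.0"
print(uses_sriov(sample))  # True: packets bypass the split-driver path
```

If this returns False (for example, the driver is `xen_netfront`), the instance is still on the paravirtual split-driver path described earlier.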
In a virtualized system, a virtual address points to a guest physical address, which points to a host physical address. You have this for both the I/O domain and the instance. A grant maps two different guest physical addresses to the same host physical address, with permissions. Grants always have to originate from the instance.
If the request is a write operation, the grants are filled with the data to write to the disk. The necessary permissions are given to the driver domain so it can map the grants (read-only if the request is a write operation, or with write permissions if the request is a read operation). Once the grants are set up, a reference (the grant reference) is added to the request, the request is queued on the shared ring, and the driver domain is notified that it has a pending request.
When the driver domain reads the request, it parses the grant references in the message and maps the grants into the driver domain's memory space. Once that is done, the driver domain can access this memory as if it were local memory.
The request is then fulfilled by the driver domain, and data is read or written to the grants. After the operation has completed, the grants are unmapped, a response is written to the shared ring, and the guest is notified.
When the guest realizes it has a pending response, it reads it and removes the permissions on the related grants. After that, the operation is complete.
As we can see from the above flow, there is no memory copy, but each request requires the driver domain to perform several mapping and unmapping operations, and each unmapping requires a TLB flush. TLB flushes are expensive operations, and the time required to perform a TLB flush increases with the number of CPUs.
To solve this problem, an extension to the block ring protocol has been added, called "persistent grants". Persistent grants consist of reusing the same grants for all block-related transactions between the guest and the driver domain, so there is no need to unmap the grants on the driver domain, and TLB flushes are not performed (unless the device is disconnected and all mappings are removed). Furthermore, since grants are set up only once, there is no need to grab the driver domain's grant lock on every transaction.
This, of course, doesn't come for free: since grants are no longer mapped and unmapped per request, data has to be copied from or to the persistently mapped grant. But for large numbers of guests, the overhead from TLB flushes and lock contention greatly outweighs the overhead of copying.
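The tradeoff can be made concrete with a back-of-the-envelope cost model: the classic path pays a map, an unmap, and a TLB flush whose cost grows with CPU count, while the persistent-grant path pays a fixed memory copy. All cost units below are made up purely to show the crossover.

```python
# Back-of-the-envelope model of classic grants (map + unmap + TLB flush,
# flush cost scaling with CPU count) versus persistent grants (one copy).
# All costs are fabricated units for illustration only.

def per_request_cost(persistent, ncpus, map_cost=1.0, tlb_flush_per_cpu=1.0,
                     copy_cost=5.0):
    if persistent:
        return copy_cost                              # copy into premapped grant
    return 2 * map_cost + ncpus * tlb_flush_per_cpu   # map + unmap + TLB flush

for ncpus in (1, 4, 16):
    print(ncpus, per_request_cost(False, ncpus), per_request_cost(True, ncpus))
# With few CPUs the classic path is cheaper; as the CPU count grows, the
# TLB flushes dominate and persistent grants win.
```

This mirrors the observation in the text: the fixed copy cost is a good trade once TLB-flush and lock-contention costs scale up with the number of CPUs and guests.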