Kentaro Matsumoto, KDDI Corporation, Hyde Sugiyama, Red Hat, Inc
As a telecom carrier, we at KDDI manage thousands of physical servers and run various kinds of workloads. In operating such a large environment, we frequently need to shut down servers for maintenance, but it is not easy to negotiate downtime with our tenant users. To make this easier, we are developing a structure called "Zone Migration", using the framework of the OpenStack project "Watcher". "Zone Migration" makes it possible to migrate tenants’ workloads from the compute nodes and storage devices we want to maintain (source zone) to new blank ones (destination zone) efficiently, automatically, and with minimum downtime.
It is designed to meet the following requirements:
- A large number of VMs and volumes must be migrated within a limited time frame
- Operations should be automated, but also controllable manually
- The time and load of migration should be kept under control so that tenants’ systems are not affected
We are proceeding with the project in cooperation with NEC and Red Hat, and developing this structure on Red Hat OpenStack Platform.
2. Software-Defined Migration:
How to migrate bunch of VMs and Volumes within a limited time frame
Kentaro Matsumoto
KDDI
Takashi Torii
NEC
Hyde Sugiyama
Red Hat
3. Agenda
1. Background
2. Concept & Use case
3. PoC environment and result
4. Contribution plan
5. Discussion – further NFV use case
4. About KDDI
● Company Name: KDDI CORPORATION
● Date of Establishment: June 1, 1984
● Main Business: Telecommunications Business
● President: Takashi TANAKA
● Capital: US$ 1.42 B
● Revenue: US$ 44.66 B *
● Operating Income: US$ 8.3 B *
● Total Employees: 28,172 *
Fortune Global Company in the World
*consolidated base
*As of March, 2016
*1USD=100JPY
5. About KDDI
● Experience of global business: 60+ years
● Global Network Coverage: 190+ countries
● Number of TELEHOUSE Data Centers: 45 sites
● Total Number of Employees in Japan: 21,000+
● Total Number of group companies: 183 companies
● Headquarters: Tokyo
● Number of offices around the world: 107 offices
[Photos: TELEHOUSE Shanghai, China; Submarine Cable ship KDDI Pacific Link; TELEHOUSE New York, U.S.; Global Network Operation Center, London; KDDI Satellite Parabolic Antenna]
6. Background
Huge Operation & Maintenance cost
[Figure: Compute Cluster with compute1 to compute5 hosting vm1, vm2a, vm2b, vm3, vm4 and vm5, plus compute6 as a shared spare; a patch is applied to one compute at a time]
• Various issues at computes (BIOS update, security patch for KVM …)
• Maintenance one by one sequentially at midnight
• Hard to negotiate with tenants (owners of VMs)
• Thousands of these clusters
1. Migrate vm1 to spare compute
2. Apply patch
3. Migrate vm1 to original compute
4. Repeat process
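The one-by-one procedure on the Background slide can be sketched as a small simulation. This is only an illustration of why the cost grows: the function name, the step strings, and the placement of VMs on computes are assumptions made for the example, not something stated on the slides.

```python
def rolling_maintenance(cluster, spare):
    """Return the ordered steps for patching every compute in one cluster.

    cluster: dict mapping compute name -> list of VM names
             (hypothetical placement, chosen for illustration)
    spare:   name of the empty shared spare compute
    """
    steps = []
    for compute, vms in cluster.items():
        # 1. Evacuate the compute to the shared spare.
        for vm in vms:
            steps.append(f"migrate {vm}: {compute} -> {spare}")
        # 2. Patch the now-empty compute.
        steps.append(f"patch {compute}")
        # 3. Move the VMs back so the spare is free for the next compute.
        for vm in vms:
            steps.append(f"migrate {vm}: {spare} -> {compute}")
    return steps


# Assumed placement of the six VMs shown in the figure.
cluster = {
    "compute1": ["vm1"],
    "compute2": ["vm2a", "vm2b"],
    "compute3": ["vm3"],
    "compute4": ["vm4"],
    "compute5": ["vm5"],
}
steps = rolling_maintenance(cluster, spare="compute6")
migrations = sum(1 for s in steps if s.startswith("migrate"))
print(f"{len(cluster)} patches need {migrations} migrations")
```

Every VM is migrated twice and the computes are handled strictly one after another, which is the sequential cost that the zone-level approach aims to avoid.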
7. Concept
Migrate all resources of zone at once
[Figure: OpenStack IaaS Environment containing a Service Zone and a Maintenance Zone, with a H/W pool alongside]
1. Add computes as Maintenance Zone from H/W pool
2. Migrate all resources (VMs/Volumes) at once to Maintenance Zone
3. H/W maintenance of each compute
• Migrate VMs and Volumes on hundreds of physical servers and storage devices
• Develop this structure based on OpenStack technologies
• “nova livemigration” for running VMs
• “nova migration” for stopped VMs
• “nova volume update” for VM-attached volumes
• “cinder migration” for VM-detached volumes
• “watcher” for migration scheduling
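The mapping from resource state to migration method listed on the Concept slide can be written as a small dispatch function. This is a minimal sketch: the `Resource` class and `choose_operation` are hypothetical names, and the returned strings are the labels used on the slide, not exact command-line invocations.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str               # "vm" or "volume"
    running: bool = False   # only meaningful for VMs
    attached: bool = False  # only meaningful for volumes


def choose_operation(res):
    """Pick the migration method named on the slide for one resource."""
    if res.kind == "vm":
        # Running VMs are live-migrated; stopped VMs use plain migration.
        return "nova livemigration" if res.running else "nova migration"
    if res.kind == "volume":
        # Attached volumes go through nova; detached ones through cinder.
        return "nova volume update" if res.attached else "cinder migration"
    raise ValueError(f"unknown resource kind: {res.kind}")


resources = [
    Resource("vm1", "vm", running=True),
    Resource("vm2", "vm", running=False),
    Resource("Vol#3", "volume", attached=True),
    Resource("Vol#5", "volume", attached=False),
]
plan = {r.name: choose_operation(r) for r in resources}
```

Scheduling these operations (order, parallelism, timing) is the part the slide assigns to "watcher".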
8. Use Case
Use Case of KDDI PoC environment
[Figure: compute1 to compute6 split into Zone#1 (Service Zone#1) and Zone#2 (Maintenance Zone); vm1 with Vol#1 (vm1 /vda) and Vol#3 (vm1 /vdb); vm2 with Vol#2 (vm2 /vda) and Vol#4 (vm2 /vdb); Vol#5 (detached); Storage for Zone#1, Storage for Zone#2, and Shared Storage]
1. Integrate Zone#2 as Service Zone#1 (Zone#1+Zone#2 : Service Zone#1)
2. Migrate VMs
3. Migrate volumes
4. Divide zone again (Zone#2 : Service Zone#1, Zone#1 : Maintenance Zone)
Precondition
• Don’t use ephemeral disk
• System volumes of VMs (/dev/vda) are stored in shared storage
• Additional volumes of VMs (/dev/vdb) are stored in zone-dedicated storage
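The preconditions determine which volumes actually have to move: anything on shared storage stays where it is, and only volumes on the source zone's dedicated storage need migrating. A minimal sketch of that selection follows; the backend labels (`"shared"`, `"zone1"`) and the function name are hypothetical, and placing the detached Vol#5 on Zone#1 storage is an assumption.

```python
def volumes_to_migrate(volumes, source_backend):
    """Return names of volumes stored on the source zone's dedicated storage."""
    return [name for name, backend in volumes.items() if backend == source_backend]


# Volume -> storage backend, following the use-case figure.
volumes = {
    "Vol#1": "shared",  # vm1 /vda (system volume)
    "Vol#2": "shared",  # vm2 /vda (system volume)
    "Vol#3": "zone1",   # vm1 /vdb (additional volume)
    "Vol#4": "zone1",   # vm2 /vdb (additional volume)
    "Vol#5": "zone1",   # detached; backend is an assumption
}

to_move = volumes_to_migrate(volumes, source_backend="zone1")
print(to_move)
```

Because system volumes sit on shared storage, the VMs themselves can be moved between zones without copying their root disks.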
9. PoC Environment and Result
Migration of large amounts of VMs and Volumes
[Figure: compute1 to compute6 split into Zone#1 (Service Zone#1) and Zone#2 (Maintenance Zone); Storage for Zone#1 (AFA), Storage for Zone#2 (AFA), and Shared Storage (ceph)]
■ Test scenario
• Migrate 30 VMs and 30 volumes
• 20MB/sec load in each VM by “stress”
• 500IOPS load in each VM by “vdbench”
■ Environment
• Red Hat OpenStack Platform 9 (Mitaka)
• Compute: 16core/64GB mem, 10vm / compute
• VM (3 sizes): System (50GB), Additional (5GB)
• 30 system volumes, 30 additional volumes
Size | vCPU | MEM | #ofVM
S | 1 | 2 | 12
M | 2 | 4 | 12
L | 4 | 8 | 6
10. PoC Environment and Result
Migration of large amounts of VMs and Volumes
[Figure: same PoC zone and storage layout as slide 9]
1. Migrate up to 5 volumes in parallel
2. Migrate up to 5 VMs attached to the migrated volumes
3. Migrate next 5 volumes
4. Migrate next 5 VMs
5. Repeat process
■ Result
• 42 minutes for whole process
• No ping failure to VMs during migration
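The alternating volume/VM rounds used in the PoC can be sketched as a simple batching plan. This is an illustration under stated assumptions: the names `batches` and `plan_rounds` are hypothetical, and the one-to-one pairing of each VM with one additional volume is inferred from the 30 VMs and 30 additional volumes in the test scenario rather than stated explicitly.

```python
from itertools import islice


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def plan_rounds(vm_to_volume, max_parallel=5):
    """Return a list of (volumes, vms) rounds.

    In each round the volumes are migrated first (in parallel),
    then the VMs that have those volumes attached.
    """
    rounds = []
    for vms in batches(sorted(vm_to_volume), max_parallel):
        volumes = [vm_to_volume[vm] for vm in vms]
        rounds.append((volumes, vms))
    return rounds


# Assumed pairing: vm01 <-> vol01, ..., vm30 <-> vol30.
vm_to_volume = {f"vm{i:02d}": f"vol{i:02d}" for i in range(1, 31)}
rounds = plan_rounds(vm_to_volume)
print(f"{len(rounds)} rounds of up to 5 volumes and 5 VMs each")
```

Capping each round at 5 keeps the migration load bounded, which matches the requirement that tenants' systems not be affected.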
12. Watcher Contribution plan
Basic Features will be implemented in Pike
Additional Strategy and Efficacy will be in Queens
Item | Blueprint | Target | Detail
Framework | Automatic-Triggering-Audit | Done (Ocata) | Triggering action plan automatically
Framework | Cancel-Action-Plan | Pike-2 | Add support to cancel execution of Action plan
Framework | Suspended-Audit | Done (Pike-1) | Add suspended audit state for continuous audit
Framework | Multi-Data-Source | Pike-3 | Handle multiple datasources independently from the strategy
Framework | Multi-Global-Efficacy-Indicator | Queens | Supports multiple global efficacy indicator
Data Model Plugin | Cinder-Model-Integration | Pike-3 | Integrate storage (cinder) information in the model
Action Plugin | Volume-Migration | Pike | Implements volume migrate action
Strategy Plugin | Volume-Migration-Strategy | Queens | Implementing migration strategy
13. Watcher Block Diagram
[Figure: Watcher components: CLI, API, Decision Engine, Strategy, DataModel, Data source, Applier, Workflow, Action; external services: Nova, Glance, Ceilometer, Cinder. Contribution items marked on the diagram: Volume Migration Action, Cinder Model Integration, Volume Migration Strategy, Auto-Trigger/Suspended/Cancel, Multi data source, Multi-Global-Efficacy-Indicator]
14. Discussion – NFV use case
Service Availability Classification Levels (ETSI GS NFV-REL 001 V1.1.1)
SAL Type | Customer Type | Service/Function
Level 1 | Network Operation Control Traffic; Government/Regulatory Emergency Services | Intra-carrier engineering traffic; Emergency telecommunication service (emergency response, emergency dispatch); Critical Network Infrastructure Functions (e.g. VoLTE functions, DNS Servers, etc.)
Level 2 | Enterprise and/or large scale customers (e.g. Corporations, University) | Real-time traffic (Voice and video); Network Infrastructure Functions supporting Level 2 services (e.g. VPN servers, Corporate Web/Mail servers)
Level 3 | General Consumer Public and ISP Traffic | Data traffic (including voice and video traffic provided by OTT); Network Infrastructure Functions supporting Level 3 services
Zone migration target use case for planned outage