2. Agenda
1. LINE’s Private Cloud Verda
2. Verda Kubernetes as a Service
3. Verda Event Handler
4. Summary
3. LINE Private Cloud?
● Two different products (development, production)
● “IaaS” + multiple managed services (MySQL, Redis, …)
● Multiple regions (3 regions)
● 20,000 VMs and 7,500 bare-metal servers in total across the 3 regions
Still growing
[Chart: red = predicted, black = actual]
Overview
4. Verda Kubernetes as a Service
Verda Kubernetes as a Service → Background
5. Many container-related projects have started at LINE
Published
● https://www.slideshare.net/linecorp/parallel-selenium-test-with-docker
● https://www.slideshare.net/linecorp/test-in-dockerized-system-architecture-of-line-now-line-now-docker
● https://www.slideshare.net/linecorp/local-development-environment-for-micro-services-with-docker
● https://www.slideshare.net/linecorp/clova-92916456 (Japanese Only)
Ongoing projects
6. Currently, application engineers maintain Kubernetes themselves...
[Diagram: Developers A in Japan and Developers B in Taiwan each operate their own Kubernetes cluster and its containers on VMs. Private Cloud Developers operate the IaaS layer (VMs and bare-metal (BM) servers, each with an OS). The responsibility border between Application Developer and Private Cloud sits at the IaaS layer.]
7. Is it easy for application developers to run Kubernetes?
1. Kubernetes is composed of multiple nodes/processes
=> What if one of the nodes/processes fails?
2. It needs a network that software can control dynamically
=> Network knowledge is also required
3. It depends on a distributed key-value store
=> What if a network partition occurs?
4. Cloud service knowledge is required in real use cases
=> How do you provide persistent volumes?
=> How do containers interact with the DNS/LB services?
5. Container images need to be maintained
=> Who has this responsibility?
8. No difference from a complexity perspective
Kubernetes
1. Kubernetes is composed of multiple processes
2. It needs a network that software can control
3. It depends on a distributed key-value store
4. Cloud knowledge is required in real use cases
5. Container images need to be maintained
OpenStack
1. OpenStack is composed of multiple processes
2. Neutron has a similar responsibility
3. Not exactly the same, but it depends on an MQ and an RDBMS
4. You need to understand each cloud component
5. VM images need to be maintained in Glance
9. Operating knowledge is distributed
[Diagram: Developers A in Japan and Developers B in Taiwan each operate their own Kubernetes cluster and containers on VMs, and each team holds its own operating knowledge. Private Cloud Developers operate the IaaS layer (VMs and bare-metal (BM) servers, each with an OS) below the responsibility border between Application Developer and Private Cloud.]
Because there is no mechanism for sharing knowledge between the teams, quality will be uneven.
10. Time to extend our responsibility from IaaS to KaaS
[Diagram: the responsibility border between Application Developer and Private Cloud moves up, so that the Private Cloud side covers KaaS as well as IaaS (VMs and bare-metal (BM) servers, each with an OS). Developers A in Japan and Developers B in Taiwan run their containers on Kubernetes, and knowledge is now also held on the Private Cloud Developers side.]
11. What is Verda Kubernetes as a Service (KaaS)?
[Diagram: Application on top of Kubernetes clusters]
● Deploy Cluster
● Add Node
● Remove Node
● Monitor Cluster
● Manage Addons
● Well-configured Cluster
● Good affinity with our Cloud
Verda Kubernetes as a Service → Overview
12. What we want to achieve with KaaS
Phase 1
● Reduce the burden on developers of operating Kubernetes
○ Support CRUD for Kubernetes clusters on our Private Cloud
○ Support CRUD for Kubernetes nodes
○ Notify when something goes wrong with a cluster, through monitoring
○ Automatically heal specific failures
○ Automatically scale out Kubernetes clusters
Phase 2
● Let application developers use well-configured Kubernetes
○ Optimize container networking with awareness of the underlay network implementation
○ Provide verified/developed Kubernetes addons that enhance Kubernetes
Phase 3
● Integrate Kubernetes into the Verda/LINE development flow
○ Make Kubernetes easy to use for people who are not familiar with it
○ Build a CI/CD environment on top of Kubernetes for users
○ Provide a way to express Verda resources in Kubernetes manifests
13. Architecture of our Kubernetes as a Service
[Diagram: three layers, Request Routing, K8s Orchestrator, and K8s Cluster for User, with the label “Core of our Service”]
Verda Kubernetes as a Service → Architecture
14. Request Routing Layer
What?
● Developed from scratch and written in Go
○ Composed of an “API Server” and a “Database”
● Integrates the k8s orchestrating software into our Private Cloud
● Translates API calls to the backend k8s orchestrating software
Why?
● Avoid depending strongly on a specific OSS
○ Be able to replace Rancher with another tool without users noticing
● Don’t assume that the k8s orchestrating software is scalable
○ In fact, Rancher doesn’t scale infinitely (and others probably don’t either)
○ We will need some kind of sharding approach sooner or later
● Don’t clutter the k8s orchestrator with our own business logic
○ Develop customizations outside the orchestrating software when they go in a different direction from upstream
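A minimal sketch of the routing idea, assuming a hash-based shard choice. The real layer is written in Go and its sharding approach is not described on the slide, so `OrchestratorBackend`, `RequestRouter`, and the placement table are hypothetical names used only for illustration:

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class OrchestratorBackend:
    """Hypothetical stand-in for one orchestrator instance (e.g. one Rancher)."""
    name: str
    clusters: dict = field(default_factory=dict)

    def create_cluster(self, cluster_id: str, spec: dict) -> dict:
        self.clusters[cluster_id] = spec
        return {"backend": self.name, "cluster_id": cluster_id}


class RequestRouter:
    """Keeps the user-facing API stable and hides which backend serves a cluster."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.placement = {}  # cluster_id -> backend name (the "Database" role)

    def _pick_shard(self, cluster_id: str) -> OrchestratorBackend:
        # Assumption: a stable hash of the cluster ID chooses the shard.
        digest = hashlib.sha256(cluster_id.encode()).hexdigest()
        return self.backends[int(digest, 16) % len(self.backends)]

    def create_cluster(self, cluster_id: str, spec: dict) -> dict:
        backend = self._pick_shard(cluster_id)
        result = backend.create_cluster(cluster_id, spec)  # translated backend call
        self.placement[cluster_id] = backend.name
        return result

    def backend_for(self, cluster_id: str) -> str:
        return self.placement[cluster_id]


backends = [OrchestratorBackend("rancher-0"), OrchestratorBackend("rancher-1")]
router = RequestRouter(backends)
result = router.create_cluster("team-a", {"nodes": 3})
```

Because callers only ever talk to the router, a backend could in principle be swapped or added without changing the user-facing API, which is the motivation listed under “Why?”.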
15. K8s Orchestrator Layer
What?
● Orchestrating software for K8s
○ Create/Remove/Update Cluster
○ Create/Remove/Update Node
○ Monitor Cluster/Node
● We use Rancher with our own patches
○ Built on the k8s operator pattern
○ Composed of an API and a Controller
■ API: stores the desired state
■ Controller: drives the actual state toward the desired state
○ We don’t use fancy features
Why Rancher?
● Fewer restrictions on the platform it runs on
○ OpenStack Magnum is not available on our platform
● Has adopted the k8s operator pattern since 2.0
○ Easy to focus on business logic
○ Retries are built into the pattern
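The API/Controller split and the built-in retry can be illustrated with a small reconcile loop. This is a generic sketch of the operator pattern, not Rancher's code; `DesiredStateStore`, `Controller`, and the `provision_node` callback are hypothetical:

```python
from collections import deque


class DesiredStateStore:
    """Plays the 'API' role: it only records what the user wants."""

    def __init__(self):
        self.desired = {}

    def put(self, name, node_count):
        self.desired[name] = node_count


class Controller:
    """Drives the actual state toward the desired state, retrying on failure."""

    def __init__(self, store, provision_node):
        self.store = store
        self.provision_node = provision_node  # may raise on a transient failure
        self.actual = {}
        self.queue = deque()

    def enqueue(self, name):
        self.queue.append(name)

    def reconcile(self, name):
        want = self.store.desired.get(name, 0)
        have = self.actual.get(name, 0)
        while have < want:
            self.provision_node(name)
            have += 1
            self.actual[name] = have  # progress survives a later failure
        if have > want:
            self.actual[name] = want  # scale-in simplified to bookkeeping

    def run(self, max_attempts=5):
        attempts = {}
        while self.queue:
            name = self.queue.popleft()
            try:
                self.reconcile(name)
            except RuntimeError:
                attempts[name] = attempts.get(name, 0) + 1
                if attempts[name] < max_attempts:
                    self.queue.append(name)  # requeue: retry comes for free
```

Since `reconcile` always compares desired and actual state from scratch, a retried run simply continues from wherever the previous attempt stopped.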
16. What is Rancher, and how do we use it?
What is Rancher?
● OSS for managing Kubernetes, developed by Rancher Labs
● See the following for more details on its architecture and implementation
○ https://github.com/ukinau/rancher-analyse
○ https://www.slideshare.net/linecorp/lets-unbox-rancher-20-v200
How do we use it?
● Rancher doesn’t have all the features we need
● We are not just users of Rancher but also developers
● We use our own Rancher image with our patches applied
○ Fixed some bugs
○ Added some features that are under review
=> “Almost” all patches have been proposed upstream (11 patches)
17. K8s Cluster for User
What?
● Vanilla Kubernetes is available
● Addons specific to our Private Cloud are also available
● Kubernetes processes are maintained by agents
○ Cluster Agent => maintains the cluster itself
○ Node Agent => maintains the node
● Cluster Agent
○ Works as a TCP proxy for Rancher
○ Addon Manager (not in upstream)
■ Makes sure enabled addons are running
● Node Agent
○ Works as a TCP proxy for Rancher
○ Checks whether Kubernetes processes need updating
○ Checks whether files need to be created
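As a rough illustration of what “make sure enabled addons are running” could mean, the check can be reduced to a set comparison. The function name and the addon names are hypothetical, and stopping addons that are no longer enabled is an assumption that the slide does not state:

```python
def plan_addon_actions(enabled, running):
    """Return which addons to start so that every enabled addon is running."""
    enabled, running = set(enabled), set(running)
    return {
        "start": sorted(enabled - running),
        # Assumption: addons that are running but no longer enabled get stopped.
        "stop": sorted(running - enabled),
    }


actions = plan_addon_actions(
    enabled=["dns", "ingress", "metrics"],
    running=["dns", "legacy-lb"],
)
```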
18. Status of Kubernetes as a Service
● Phase 1 release: 16 November
● Finished developing almost all features for the first release
● Started developing features for the second release
○ Enhance monitoring for user clusters
○ Support more addons on user clusters
Verda Kubernetes as a Service → Status
19. Verda Event Handler
20. Looking back at the usual daily operations of a system
● When CPU usage exceeds 80%, create a VM and run Ansible
● When CPU usage falls below 30%, delete a VM
● When a user is added to the cloud, distribute his/her public key to all existing VMs
● When a user is deleted, remove his/her public key from all existing VMs
● When a picture is stored in object storage, generate a small image of the picture
● When a new Docker image is registered in the Docker registry, update the k8s manifest
● When an LB is created, create a DNS A record with the LB name and VIP address
● When a resource is created in a specific region, create the same resource in another region
● When disk usage exceeds 90%, stop the process, attach a new disk, and restart the process
Verda Event Handler → Background
21. You might already automate some operations
[Diagram: Developers A in Japan maintain several scripts (Deploy ScriptA, Operation ScriptA, Cleanup ScriptA) that are executed when a VM is added, when a user is added, or when a VM is deleted. The triggers are spread across a cron job, a CI/CD tool (test, deploy), and an API server that handles autoscaling and “do something by webhook” calls from other services. All of them act on VMs on the IaaS operated by Private Cloud Developers.]
22. But have visibility and operability been considered?
● Visibility problem
○ All the needed scripts/functions are scattered
■ Some are defined in CI/CD tools
■ Some are defined in crontab
■ Some are defined in a dedicated API application for webhooks…
○ Only a few people know all the required scripts if information sharing is insufficient
● Operability/maintenance problem
○ Development costs
○ Quality depends entirely on the developer of each script
○ A large operation script may lack the ability to retry a specific unit
○ A script can become unexpectedly large if you try to make it good
○ The reason a script was executed cannot be confirmed later
23. Let’s look back at the usual daily operations again
(the same nine “When ..., do ...” operations listed on slide 20)
A “When” and a “What” are common to all of these operations
24. ‘When’ and ‘What’ are everything in operations
< 💥> Event (When)
● VM Created
● User Created
● Container Image Created
● CPU usage exceeded 80%
● Finish Operation ScriptA
● Every 30 min
● New PR is created
<⚡> Function (What)
● Notify Slack
● Deploy Application
● Run Test
● Prepare cloud resource
● Distribute public key to all existing servers
25. What is Verda Event Handler?
● Event Providers generate events
○ < 💥> Event (When): VM Created, User Created, Container Image Created, CPU usage exceeded 80%, Finish Operation ScriptA, Every 30 min, New PR is created
● The Function Executor executes the functions associated with each event
○ <⚡> Function (What): Notify Slack, Deploy Application, Run Test, Prepare cloud resource, Distribute public key to all existing servers
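The association between events and functions can be sketched as a small registry. This is only an illustration of the concept; the Event Handler architecture was still undecided at the time of the talk, and the `EventHandler` class, the `on`/`emit` methods, and the `lb.created` event name are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    type: str
    payload: dict = field(default_factory=dict)


class EventHandler:
    """Maps each event type ('When') to the functions ('What') to execute."""

    def __init__(self):
        self._functions = {}

    def on(self, event_type):
        # Decorator that associates a function with an event type.
        def register(fn):
            self._functions.setdefault(event_type, []).append(fn)
            return fn
        return register

    def emit(self, event):
        # The executor role: run every function associated with the event.
        return [fn(event) for fn in self._functions.get(event.type, [])]


handler = EventHandler()


@handler.on("lb.created")
def create_dns_record(event):
    return f"A {event.payload['name']} -> {event.payload['vip']}"


@handler.on("lb.created")
def notify_slack(event):
    return f"slack: LB {event.payload['name']} created"


results = handler.emit(Event("lb.created", {"name": "web", "vip": "10.0.0.5"}))
```

Here one event triggers two independent functions, and an event with no registered function simply does nothing.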
Verda Event Handler → Overview
26. How Event Handler improves automation
● Visibility
○ All operation-related functions are visible to the members of a project
● Operability/maintenance
○ Users can use exactly the same function for multiple purposes
■ A deploy script can be used in CI/CD, autoscaling, autohealing, …
○ Users can retry a specific step within a large operation
■ A large operation is expressed as a group of small functions
○ Easy to develop good scripts/functions for automation
■ All users need to do is write their own business logic
■ Function logging is provided by Event Handler by default
■ An API interface is provided by Event Handler by default
○ The reason for each function execution is available for users to check
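To make the “retry a specific step” and “reason for each execution” points concrete, here is a minimal sketch of an executor that records why each function ran and lets one failed step be retried on its own. The `Execution` record and `FunctionExecutor` class are hypothetical and not part of any announced design:

```python
import itertools
from dataclasses import dataclass


@dataclass
class Execution:
    id: int
    function: str
    reason: str      # the event that triggered this run
    payload: dict
    status: str = "pending"


class FunctionExecutor:
    def __init__(self, functions):
        self.functions = functions  # name -> callable(payload)
        self.history = []           # every execution stays inspectable
        self._ids = itertools.count(1)

    def execute(self, function, reason, payload):
        record = Execution(next(self._ids), function, reason, payload)
        self.history.append(record)
        self._run(record)
        return record

    def _run(self, record):
        try:
            self.functions[record.function](record.payload)
            record.status = "succeeded"
        except Exception:
            record.status = "failed"

    def retry(self, execution_id):
        # Re-run only the chosen step, not the whole operation.
        record = next(r for r in self.history if r.id == execution_id)
        self._run(record)
        return record
```

Because a large operation is a group of small functions, a failure in one of them leaves the earlier, successful steps untouched when that single step is retried.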
27. The architecture is still being planned
● Build Event Handler on Kubernetes
● Use Knative, an OSS project, to achieve FaaS
● It seems we will need to develop features that Knative is missing
● We are still in the phase of deciding the architecture
Verda Event Handler → Architecture
28. Status of Event Handler
● Project started in Sep 2018
● Decided to use Knative
● Started to consider the overall architecture
● Started code-level analysis of Knative
Verda Event Handler → Status
29. Summary
30. Our Private Cloud has entered a new generation
[Diagram: the private cloud is changing gradually from a Cloud Resource Provider (VM, DNS, Object Storage, Managed Database, ...) into a Developer Platform (Operation Programmable Platform (Event Handler), PaaS, Kubernetes as a Service, ...).]
Summary
31. Our Private Cloud has entered a new generation
[Diagram: Cloud Resource Provider (VM, DNS, Object Storage, Managed Database, ...) and Developer Platform (Operation Programmable Platform (Event Handler), PaaS, Kubernetes as a Service, ...), with a “Today’s Topic” label.]
32. We are hiring people!!
● We are looking for people who love reading and customizing OSS to meet our requirements
○ Kubernetes
○ Knative
○ Rancher
○ etcd...
34. Event Handler Case 1: Continuous Integration
< 💥> Events
● New PR is created
● Finish Function A
● Finish Function B (appears twice in the diagram)
● Finish Function C
<⚡> Functions
● Function A: Prepare cloud resource
● Function B: Run deploy script
● Function C: Run integration test
● Function D: Feedback to GitHub
● Function E: Run unit test
● Function F: Feedback to GitHub
Appendix
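The chaining used in this case, where finishing one function emits an event that can trigger the next, might look like the sketch below. The `Pipeline` class is hypothetical, and the wiring is an assumed reading of the flattened diagram; in particular, triggering Function F on “Finish Function E” is a guess, since the slide text shows “Finish Function B” twice.

```python
from collections import defaultdict, deque


class Pipeline:
    def __init__(self):
        self.subscriptions = defaultdict(list)  # event -> [(function name, callable)]
        self.trace = []                         # order in which functions ran

    def subscribe(self, event, name, fn):
        self.subscriptions[event].append((name, fn))

    def emit(self, event):
        pending = deque([event])
        while pending:
            current = pending.popleft()
            for name, fn in self.subscriptions.get(current, []):
                fn()
                self.trace.append(name)
                # Completing a function is itself an event.
                pending.append(f"Finish Function {name}")


def noop():
    pass


ci = Pipeline()
ci.subscribe("New PR is created", "A", noop)  # Prepare cloud resource
ci.subscribe("New PR is created", "E", noop)  # Run unit test
ci.subscribe("Finish Function A", "B", noop)  # Run deploy script
ci.subscribe("Finish Function B", "C", noop)  # Run integration test
ci.subscribe("Finish Function C", "D", noop)  # Feedback to GitHub
ci.subscribe("Finish Function E", "F", noop)  # Feedback to GitHub (assumed trigger)
ci.emit("New PR is created")
```

Each function stays small and reusable; the pipeline exists only as the set of event subscriptions.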
35. Event Handler Case 2: Continuous Deployment
< 💥> Events
● New change in master
● Finish Function C
● New image is registered
<⚡> Functions
● Function C: Run integration test (Omit…)
● Function D: Build Docker image and push it to the registry
● Function G: Update the Kubernetes manifest to use the new image
36. Event Handler Case 3: Object Storage Integration
Processing object data stored in object storage
● < 💥> Event: New object is registered
● <⚡> Function D: Generate a small image of the object and push it to object storage as small-$(original name)
User-level policy control of data in object storage
● < 💥> Event: Object is changed to public
● <⚡> Function H: Check the user ID and, if the user is not the owner, change the object back to private
37. Event Handler Case 4: User Management Integration
< 💥> Events
● User added to the project
● User deleted from the project
<⚡> Functions
● Function A: Add his/her public key to all VMs in the project
● Function D: Remove his/her public key from all VMs in the project
● Function E: Delete all VMs whose names have his/her name as a prefix
38. Event Handler Case 5: Monitoring Integration
Monitoring Service
< 💥> Events
● CPU usage exceeds 80%
● Finish Function B (appears twice in the diagram)
<⚡> Functions
● Function A: Notify Slack
● Function B: Create a VM and deploy the application
● Function C: Run a functional test against the created VM
● Function D: Add the created VM to the load balancer
We can use exactly the same function (code) that is used in CI