4. 50 cent for PR
Mistral - Kuryr - Murano -
Magnum - Barbican - Solum ..
6. Vietnam OpenStack Community
◎ Official User Group @ Vietnam
◎ 7 meetups @ Hanoi, HCM
◎ Sponsored by VietStack, DTT,
VCCorp, Fujitsu and OPS Foundation
◎ FB/vietstack
◎ groups.openstack.org/groups/vietnam-vietopenstack
◎ Co-op with other
Cloud/Virtualization/Container UG:
Cloud Computing VN, DockerHN..
8. Agenda ~ 1.5h
Towards the Cloud
[WHAT] Cloud services and deployment
models [10%]
[WHOSE] Which cloud we are heading
toward [10%]
[HOW] Cloud architecture patterns
(for appliances and systems) ~
User Story
VDI [80%]
➔ Approach
➔ Cost and Effectiveness
➔ Architecture
10. “CLOUD COMPUTING?”
“First to mind when asked what ‘the cloud’ is, a majority
respond it’s either an actual cloud, the sky, or something
related to weather.” - Citrix Cloud Survey - 2012
12. Cloud Deployment Models
Public Cloud
◎ Access anywhere, anytime.
◎ Unit: VM/Instance; Bandwidth..
◎ Amazon EC2, RackSpace, VCCloud, Z.Com,
FPT Public Cloud, GApp, Heroku, M$ 365
◎ Public Cloud vs {VPS, Web Builder,
WebApp (EyeOS)} ??
“I don’t need a hard disk in my computer if I can get to
the server faster… carrying around these non-connected
computers is byzantine by comparison.” - Steve Jobs
25. Connection Broker
Missions:
◎ Handle requests from clients.
◎ Manage all currently active sessions:
◉ Session recovery (network interruption..)
◉ Session control: CRUD → requires an Agent deployed
in the VM.
◎ Integrated with Cloud via API/WS:
◉ Handle connection to Cloud, ensure HA
◉ Handle all working VM state: on/off, migrate
◉ Deliver virtual disk/storage on-demand to client.
◎ Remote Machine: grant the client remote access to the
VM.
◎ Remote App: use remote apps as if they ran in the native
environment.
◎ Quota: ? VMs per user, ? sessions per user.
◎ Scheduler
◎ Multi-tenancy support.
26. Client & Agent
Client:
◎ Native app deployed on Thin/Zero Clients (Linux, Windows),
or Web App (via HTML5 support)
◎ Shows the user’s resources: apps, VDI VMs
◎ Connects remotely to the user’s resources
Agent:
◎ Deployed in VDI VMs.
◎ Communicates with the CB to handle remote sessions.
27. Connection Broker (CB) Problems
1. Very high traffic:
◉ Between CB and Cloud service endpoint: for
command and query tasks.
◉ Between CB and VM (Agent): init session, grant
access and apply policy (for multi-tenant
purpose).
◉ Between CB and Client: update user’s resource
(VM state, session status), connection status.
◉ Between Client and VM (Agent): remote desktop,
remote apps → huge bandwidth consumer.
2. Data consistency:
◉ VM State: conflict between CB (scheduler,
manual), client, cloud endpoint.
◉ Session status: conflict between CB, VM and
client
→ Approach: applying some cloud design-patterns.
28. VDI Biggest Problems
1. IOPS
◉ Many users read/write against a single storage system?
2. Network bandwidth
◉ Depends on the remote protocol (Spice, ICA, PCoIP,
RDP, VNC,..)
3. User Experience
◉ Login/Logout time
◉ Using virtual app
◉ Using remote environment
29. VDI Biggest Problems: Solutions
1. IOPS
◉ Deploy multi-tier and auto-tiering storage system
◉ Caching (in-memory,..)
2. Network bandwidth
◉ Using UDP
◉ Implement compression algorithm (LZMA)
◉ Security concern.
3. User Experience
◉ Apply cloud patterns (throttling, retry, external
configuration store, runtime reconfiguration, health
endpoint monitoring) to optimize the connection
broker
30. VDI Flow
1. The client sends a request (RD, RA) to
the CB to work in a VDI VM.
2. The CB Session Manager sends a request
to the Cloud endpoint to ensure the VM
is started and performing correctly. If
yes, it creates a session by sending a
request to the agent. If no, it makes a
new request to deploy a new VM.
3. The CB sends remote parameters (display,
channels enabled…) to the client.
4. The VM Agent sends the session’s status
to the CB (ready, fail, creating).
5. If the session status is ready, the CB
notifies the Client.
6. The Client grabs the session id and
remote parameters and starts working
with the VDI VM.
31. #1 Problem: Too many duplicate requests between CB services → wasted
resources.
◉ The CB monitors cloud status (VDI VM, Cloud service..) → periodically sends
health-check requests to the Cloud service.
◉ The CB monitors session status → periodically sends health-check requests
to the VM Agent.
◉ The Cloud Service must deploy multiple VMs from the same images.
→ Monitoring solution ??
32. #1 Solution:
1. Apply the Event Sourcing pattern: makes the CB eventually
consistent and stores the history of data operations.
E.g: VM state change event, session status change event, cloud
service status change event..
2. Apply the Cache-aside pattern: cache all VDI VM states, Cloud
service statuses and session statuses.
or:
Apply the Health Endpoint Monitoring pattern.
33. Event Sourcing
When: you need to view or restore from a historical record of data
operations and to limit data update conflicts.
What: implement an append-only event store for publishing and
replaying. Events are immutable, simple objects.
How: (ITLC SA - CQRS)
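A minimal sketch of the append-only store described above, under stated assumptions: the `Event` fields, the event kinds and the in-memory list are hypothetical stand-ins for a real event store. Current state is never stored directly; it is rebuilt by replaying events in order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen: events are immutable once created
class Event:
    kind: str     # hypothetical kinds, e.g. "vm_state_changed"
    subject: str  # id of the VM or session the event is about
    value: str    # the new state carried by the event


class EventStore:
    """In-memory, append-only event log (assumption: a real store is durable)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        # Events are only ever appended, never updated or deleted.
        self._events.append(event)

    def replay(self, upto: Optional[int] = None) -> Dict[str, str]:
        """Rebuild state from the first `upto` events (all events if None)."""
        state: Dict[str, str] = {}
        for event in self._events[:upto]:
            state[event.subject] = event.value
        return state
```

Replaying with a smaller `upto` gives the state as it was at an earlier point, which is the "restore from historical record" use case.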
34. Cache-aside
When: deploying an app/service on a PaaS that does not support
caching.
What: implement the read-through/write-through caching
behavior in the application itself
How:
1. Determine whether the item is
currently held in the cache.
2. If the item is not currently in
the cache, read the item from
the data store.
3. Store a copy of the item in
the cache.
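The three steps above can be sketched as follows. This is an illustrative sketch only: the `CacheAside` class, the `load_from_store` callable and the dict-based cache are assumptions, standing in for a real cache and a real Cloud API query.

```python
class CacheAside:
    """Application-managed cache in front of a slower data store."""

    def __init__(self, load_from_store):
        # load_from_store: hypothetical callable key -> value (e.g. a Cloud API query)
        self._load = load_from_store
        self._cache = {}

    def get(self, key):
        # Step 1: is the item currently held in the cache?
        if key in self._cache:
            return self._cache[key]
        # Step 2: cache miss, so read the item from the data store.
        value = self._load(key)
        # Step 3: store a copy of the item in the cache.
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this when the underlying data changes (e.g. on a state-change event).
        self._cache.pop(key, None)
```

The `invalidate` method is an addition to the three steps: without it the cache would keep serving a stale VM state after the store changes.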
36. Health Endpoint Monitoring
When: complex systems deployed in a distributed environment,
including external services/agents
What: implement health monitoring to ensure they are available
and performing correctly.
How:
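One possible shape for such a check, as a sketch under assumptions: the probe names and the 200/503 status convention are illustrative, and each probe is a hypothetical zero-argument callable that returns True when its dependency is healthy.

```python
def check_health(probes):
    """Run every probe and aggregate the result.

    probes: mapping of name -> zero-argument callable returning True if healthy.
    Returns (status_code, report); 200 only if every probe passed.
    """
    report = {}
    for name, probe in probes.items():
        try:
            report[name] = "ok" if probe() else "fail"
        except Exception:
            # A probe that raises counts as an unhealthy dependency.
            report[name] = "fail"
    status = 200 if all(result == "ok" for result in report.values()) else 503
    return status, report
```

An external monitor would call the endpoint that wraps `check_health` on a schedule, which is where the request volume noted on the comparison slide comes from.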
37. Event Sourcing + Cache-aside vs Health
Endpoint Monitoring
Health Endpoint Monitoring
◎ Number of requests depends
on the monitoring solution
◎ Lower performance (passive
check)
◎ Data consistency
◎ Easier and more flexible to
integrate with the Throttling
pattern or Auto-Scaling.
ES + Cache
◎ Lower request rate to the
Cloud API and Agent
◎ Higher performance
(active change)
◎ Eventual consistency
◎ Provides only the current
state of data → to improve,
use the CQRS pattern.
38. CQRS
When: the traditional CRUD model cannot handle heavy read/write
load, scales poorly, and struggles to ensure data consistency.
What: segregate operations that read data from operations that
update data by using separate interfaces. Integrate with ES as
the write model.
How: (ITLC SA - CQRS)
39. Issues of Cache-aside
◎ Deciding which data to cache and where to store the
caches is sometimes very hard.
◉ What if I want to “cache” all virtual apps in virtual
machines to improve UX? → Atlantis Computing
Tech.
◉ In-memory cache or NoSQL? (reduce IOPS or
consume more memory)
40. #2 Problem: What if an error occurs in the VDI Flow (6 steps)?
◉ The CB forwards a ready session to the client but the VM
state is corrupt?
◉ The client deploys/restarts/shuts down a VDI VM but the
Cloud service is not available.
◉ The session is initializing but the VDI VM OS hits a
BSOD/Kernel Panic.
→ Data inconsistency.
41. #2 Solution:
1. Apply the Retry pattern: a fault tolerance mechanism that
repeats tasks which are expected to succeed.
2. Apply the Circuit Breaker pattern: a fault tolerance
mechanism that prevents the system from repeating tasks
which are likely to fail.
3. Apply the Compensating Transaction pattern: revert
data back to its old state.
42. Retry
When: deploying services/apps whose functions depend on
actions that are expected to succeed.
What: implement a mechanism to handle failed actions.
How:
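A minimal sketch of a retry helper, assuming a linearly growing delay between attempts. The function name, parameters and defaults are illustrative and not taken from any specific library.

```python
import time


def retry(operation, attempts=3, delay=0.5, retry_on=(Exception,)):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    operation: zero-argument callable expected to succeed eventually.
    delay: base wait in seconds; the wait grows linearly with each attempt.
    retry_on: exception types treated as transient failures.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure to the caller
            time.sleep(delay * attempt)
```

For example, a CB call to a VM agent could be wrapped as `retry(lambda: agent.create_session(...), attempts=5)`, where `agent.create_session` is a hypothetical call.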
43. Circuit Breaker
When: you need to prevent an application/service from performing
actions that are likely to fail.
What: simulate an electrical circuit breaker with 3 states for
handling failing actions.
How:
• Closed: route requests to
services/apps; track
failures with a counter and
trip to Open when it
reaches the threshold.
• Open: requests from the
application fail
immediately; an exception
is returned.
• Half-Open: a limited
number of requests are
allowed to pass through and
invoke the operation.
Change to the Closed state
once the success counter
reaches its threshold.
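The three states can be sketched as a small state machine. This is a simplified sketch under assumptions: the thresholds, the timeout that moves Open to Half-Open, and counting consecutive failures (rather than failures in a time window) are illustrative choices, and `clock` is injectable only to make the sketch testable.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, success_threshold=2,
                 reset_timeout=30.0, clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._success_threshold = success_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def call(self, operation):
        if self.state == "open":
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            # Timeout elapsed: let trial requests through.
            self.state = "half-open"
            self._successes = 0
        try:
            result = operation()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        if self.state == "half-open":
            self._trip()  # any failure during the trial reopens the circuit
            return
        self._failures += 1
        if self._failures >= self._failure_threshold:
            self._trip()

    def _on_success(self):
        if self.state == "half-open":
            self._successes += 1
            if self._successes >= self._success_threshold:
                self.state = "closed"
                self._failures = 0
        else:
            self._failures = 0  # closed state counts consecutive failures only

    def _trip(self):
        self.state = "open"
        self._opened_at = self._clock()
        self._failures = 0
```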
44. Issues when combining Retry with Circuit Breaker
◎ Define which tasks are expected to succeed and which are likely to fail.
E.g: all tasks that interact with Cloud services → likely to fail; all
tasks that interact with the VM agent → expected to succeed. (Scope of
interaction)
◎ Define the correct time-out for heavy tasks.
E.g: a deploy-VM task needs a longer time-out than a start-VM
task.
◎ Define correct thresholds for retry (retry counter) and circuit
breaker (success/failure counters)
45. Compensating Transaction
When: you need to trace the path of, or restore, the state of data in
services/apps that perform many operations against a data store.
What: use a workflow model to define each operation as a step, and
define a counter-operation for each step. (Ref: Mistral cloud
workflow engine)
E.g:
◎ Create - Delete
◎ Plus - Minus
◎ Multiply - Divide
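A minimal sketch of the step/counter-operation idea: each step is paired with its compensation, and on failure the compensations of the completed steps run in reverse order. The function name and the pair-of-callables representation are assumptions for illustration, not how Mistral models workflows.

```python
def run_with_compensation(steps):
    """Run steps in order; undo completed steps if a later one fails.

    steps: list of (action, compensation) pairs of zero-argument callables,
    e.g. (create_vm, delete_vm).
    """
    completed = []  # compensations for the steps that have succeeded so far
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        # Undo in reverse order: the last completed step is reverted first.
        for compensation in reversed(completed):
            compensation()
        raise
```

Note that a compensation can itself fail; a production workflow would need to retry or record such failures, which this sketch does not attempt.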
49. Runtime Reconfiguration
When: you need to minimize application downtime when updating
configurations. (ref plugin architecture)
What: implement a configuration-change event handler and keep
configuration outside of the deployed app.
How:
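A small sketch of a configuration-change event handler. The `ExternalConfig` class is hypothetical and keeps values in memory; in a real deployment the values would live in a store outside the app, and `update` would be triggered by that store.

```python
class ExternalConfig:
    """Holds settings and notifies subscribers when a value changes."""

    def __init__(self, initial):
        self._values = dict(initial)
        self._handlers = []

    def get(self, key, default=None):
        return self._values.get(key, default)

    def on_change(self, handler):
        # handler(key, old_value, new_value) is called after each real change.
        self._handlers.append(handler)

    def update(self, key, value):
        old = self._values.get(key)
        if old == value:
            return  # nothing changed, so no event is raised
        self._values[key] = value
        for handler in self._handlers:
            handler(key, old, value)
```

A component such as the CB can subscribe with `on_change` and apply the new value (for example a session quota) without being restarted.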
50. External Configuration Store
When: sharing configuration between multiple
apps/instances/services
What: implement a centralized configuration store; it can be
integrated with service discovery, health endpoint monitoring
and the retry pattern
How:
51. How to reconfigure the system at runtime?
◎ Use a plugin architecture → requires independent
plugins, hard to design.
◎ Write the VDI CB in an interpreted programming language: PHP,
Python.
52. #4 Problem: CB Server/VDI VM resources are overloaded.
◉ Upgrade the CB server HW?
◉ Increase VDI VM HW resources (requires downtime:
restart the VM)?
53. #4 Solution:
1. Virtualize CB Server !!!
2. Apply the Throttling pattern together with the Cloud's
Auto-Scaling feature.
or:
Apply design patterns for distributed processing of requests
(messages) → for reference:
◉ Competing Consumer
◉ Priority Queue
◉ Leader Election
54. Throttling
When: you need to avoid resource overload and optimize performance
for higher-priority services/apps.
What: disable features/services that have lower priority;
integrate with health endpoint monitoring.
How:
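As an illustrative sketch of "disable lower-priority features under load": the feature names, priorities and load limits below are hypothetical, and `load` is assumed to be a 0..1 utilization figure supplied by monitoring.

```python
# Hypothetical feature priorities: a lower number means more important.
FEATURES = {"remote_desktop": 0, "remote_app": 1, "usage_reports": 2}


def allowed_features(load, features=FEATURES, soft_limit=0.7, hard_limit=0.9):
    """Return the sorted names of features that stay enabled at this load."""
    if load >= hard_limit:
        max_priority = 0  # only the most important features survive
    elif load >= soft_limit:
        max_priority = 1  # drop the lowest-priority tier
    else:
        max_priority = max(features.values())  # everything stays enabled
    return sorted(name for name, prio in features.items() if prio <= max_priority)
```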
56. Auto Scaling
Server Overload:
◎ Increase the resources (CPU, RAM, Storage..) that the system
load is exhausting → Vertical Scaling (1)
◎ Buy new servers (systems?) and share the load between them
→ Horizontal Scaling (2)
(1)/(2) + Automation → Auto Scaling.
Some products:
Amazon CloudWatch + Auto Scaling; EXA TrueCloud.
Hyper-V (Dynamic Memory), VMware (Memory Overcommit)
Citrix NetScaler (Hardware)
58. Auto Scaling Monitoring
◎ Metrics (Counters): the amount of each resource you want
to check in real time. Used for measuring and
calculating based on the scaling policies (rules)
◎ Agentless: hypervisor based.
◉ E.g: libvirt API (KVM), XAPI RRD (Xen)
◉ Pros: fast, secure.
◉ Cons: the metrics are too simple (CPU, MEM,
Storage – FullVirt; Network RX/TX – ParaVirt)
◎ Agent:
◉ E.g: SNMP ..
◉ Pros: flexible and easy to manage
◉ Cons: slow, can sometimes break the user’s policies.
59. Auto Scaling DS
Decision Support (DS) Machine: takes the output from monitoring and, based on the user’s
policies (rules), calculates the most suitable actions.
Example rules:
• if CPU > 80% then scale up CPU to 4 cores 3.7GHz
• if Memory < 30% then scale in to <n-1> VMs
→ Why do we need DS?
Look at the following messy case:
• Input metrics: CPU, MEM, Concurrent Connections (CCC).
• Rules:
If CPU > 80% then scale out by 01 VM and load-balance (LB) between them.
If Mem > 85% then scale out by 01 VM and LB between them.
If CCC > 1000 then scale up CPU to 4 cores 3.5GHz.
If CPU < 20% then scale in 01 VM.
If Mem < 25% then scale in 01 VM.
• So:
What if 01 VM has 80% CPU load and 10% Mem?
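The conflict can be made concrete with a small rule evaluator. This is a sketch only: the tuple encoding of the five rules and the action labels are assumptions, and the sample metrics use 85% CPU (rather than exactly 80%) so that the strict `>` comparison fires.

```python
import operator

OPS = {">": operator.gt, "<": operator.lt}

# The five rules from the slide, encoded as (metric, comparison, limit, action).
RULES = [
    ("cpu", ">", 80, "scale-out"),
    ("mem", ">", 85, "scale-out"),
    ("ccc", ">", 1000, "scale-up"),
    ("cpu", "<", 20, "scale-in"),
    ("mem", "<", 25, "scale-in"),
]


def fired_actions(metrics, rules=RULES):
    """Return the action of every rule whose condition holds, in rule order."""
    return [action for metric, op, limit, action in rules
            if OPS[op](metrics[metric], limit)]


def has_conflict(actions):
    """Scale-out and scale-in requested at the same time contradict each other."""
    return "scale-out" in actions and "scale-in" in actions


actions = fired_actions({"cpu": 85, "mem": 10, "ccc": 200})
print(actions, has_conflict(actions))
```

With high CPU and low memory, the CPU rule asks for one more VM while the memory rule asks for one fewer, so a naive executor would act on contradictory requests.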
60. Auto Scaling DS
The DS needs a conflict resolver.
Approaches:
• Check for rule conflicts before applying auto-scaling: NetScaler, IBM Cloud.
• Use an algorithm for decision support:
• Neural Network
• FuzzyLogic
• Neuro-Fuzzy(ANFIS)
62. #5 Problem: Deploying a VDI solution across departments
whose identity/authorization systems differ.
◉ Migrate old identity data to VDI identity system and
abandon the old one ?
◉ Implement new module in VDI identity system to
interact with the old mechanism ?
◉ Implement some IdM solutions (SSO, OpenID, STS..)
for both VDI and old identity system ?
64. Federated Identity
When: deploying an app (multiple services) across multiple clouds (IaaS) or
on multiple platforms (SaaS).
What: implement an authentication mechanism that uses
federated identity. Separate user authentication from the
application code, and delegate authentication to a trusted
identity provider
How:
65. Federated Identity in VDI Env
1. The client authenticates with its OWN identity provider (e.g. AD/LDAP) and
receives an issued token.
2. The client forwards this token to the CB federation provider (e.g. Keystone)
and gets back a token valid for the VDI init phase.
3. The federation provider transforms the claims in the token into the form used
by the VDI CB authorization mechanism.
4. The client applies the authorization rules of VDI remote access with the new
token from the federation provider.
67. Benefits
◎ Reduce cost (IN THE FUTURE)
◉ HW maintenance
◉ Troubleshooting (network, OS..)
◉ Human resources
◎ Centralized management (network, security, resources,
session)
◎ Cloud benefits (HA, HS..)
68. Cost
The initial cost is often VERY HIGH
(depending on system design, application design and how big
your organization is)
“Cost saving/cost reduction” will only show up at least 1
year after deploying VDI
Which costs to reduce:
◎ HW maintenance
◎ PC maintenance
◎ Human resources (network, sysadmins)
◎ Time (troubleshooting time, maintenance time..)
69. VDI Report
‘The state of the VDI and SBC union’ report, running from Feb 12, 2015. About 519 participants
completed the full survey. Participants come from the US, the UK, the Netherlands, Germany and 20+ other
countries.
74. Ref
◎ Cloud Design Pattern - MS
◎ AWS Cloud Design Pattern [1]
◎ Pacific Asia Cloud report - 4th Meetup VietStack [2]
◎ VMWare Cloud index [3]
◎ Microservices vs Enterprise service bus by voxxed [4]
◎ Plugin Architecture in Wikipedia
◎ IdM in Wikipedia
◎ ANFIS in Wikipedia
[1]: http://en.clouddesignpattern.org/
[2]: http://vietopenstack.org/2015/05/09/tong-quan-thi-truong-dien-toan-dam-may-tai-chau-a-thai-binh-duong-va-viet-nam/
[3]: http://info.vmware.com/content/APAC_APJ_Enterprise_Cloud_Index_2013
[4]: https://www.voxxed.com/blog/2015/01/good-microservices-architectures-death-enterprise-service-bus-part-one/
77. Priority Queue
When: services/apps handle multiple kinds of messages that differ
in time/resource consumption.
What: mark each message with a priority and pick suitable
consumers for it.
How:
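A minimal sketch using Python's standard `heapq` module, assuming that a lower number means higher priority and that messages with equal priority should be served in arrival order. The class name and the sample messages are illustrative.

```python
import heapq
import itertools


class PriorityMessageQueue:
    """Messages come out by priority; ties are broken by arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # monotonically increasing tie-breaker

    def put(self, priority, message):
        # Lower `priority` value = served earlier (an assumption of this sketch).
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def get(self):
        # Raises IndexError if the queue is empty.
        _priority, _order, message = heapq.heappop(self._heap)
        return message

    def __len__(self):
        return len(self._heap)
```

In a CB, a short "start VM" request could be given a higher priority than a long-running "deploy VM" request so that quick tasks are not stuck behind heavy ones.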
78. Leader Election
When: multiple instances/services do the same task and cause
data inconsistency
What: select one instance as the leader, which commands the other
instances/services.
How:
1: An instance requests the mutex
from the BlobDistributedMutex
object and is elected the
leader.
2: Other instances request the
mutex to run the task and are
blocked.
3: The leader runs a task that
coordinates the work of the
subordinate instances.
4: The mutex in the leader
periodically renews the lease.
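The lease mechanics in steps 1, 2 and 4 can be sketched in-process. This is not the BlobDistributedMutex implementation: the `LeaseMutex` class is a hypothetical, single-process stand-in for a lease held in shared storage, it returns False instead of blocking, and `clock` is injectable so the sketch can be exercised without waiting.

```python
import time


class LeaseMutex:
    """A lease-based mutex: whoever holds a valid lease is the leader."""

    def __init__(self, lease_seconds=15.0, clock=time.monotonic):
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._holder = None
        self._expires_at = 0.0

    def try_acquire(self, instance_id):
        """Step 1/2: take the lease if it is free or expired."""
        now = self._clock()
        if self._holder is None or now >= self._expires_at:
            self._holder = instance_id
            self._expires_at = now + self._lease_seconds
            return True
        return self._holder == instance_id

    def renew(self, instance_id):
        """Step 4: the leader extends its lease before it expires."""
        now = self._clock()
        if self._holder == instance_id and now < self._expires_at:
            self._expires_at = now + self._lease_seconds
            return True
        return False

    @property
    def leader(self):
        """Current leader, or None if the lease has lapsed."""
        if self._holder is not None and self._clock() < self._expires_at:
            return self._holder
        return None
```

If the leader crashes and stops renewing, the lease expires and another instance's next `try_acquire` succeeds, which is how leadership fails over.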