This is a presentation delivered by Alec Leckey (Intel) at the 2nd Data Centre Symposium held in conjunction with the National Conference on Cloud Computing and Commerce (http://2018.nc4.ie/) on April 10, 2018 in Dublin, Ireland.
Learn more about the RECAP project: https://recap-project.eu/
Install the Intel Landscaper: https://github.com/IntelLabsEurope/landscaper
2. Data Centre challenges.
• Decrease Operational Costs,
• Maintain Consistent Performance,
• Increase Scale,
• Innovate, and deliver more value.
[Figure: data centre pressures and growth trends.
Pressure labels: TCO, Performance; Over-provisioning, Utilization; Energy, Allocation; Management, Availability; More Capacity, More Complexity; More Resources.
Trend labels: Application Growth (every 2 years); Data Volume (every 18 months); Operational Costs (every 8 years); Reduction in Compute Costs (every 2 years); 2x Increase in Management & Administration (1).
Multipliers shown: 50%, 2x, 2x, 8x.]
1 – IDC Directions ‘14 - 2014
Source: Worldwide and Regional Public IT Cloud Services 2013-2017 Forecast. IDC (August 2013) http://www.idc.com/getdoc.jsp?containerId=242464
3. Why we need to understand infrastructure
• The T-Nova* project demonstrates a 10X performance improvement when a Network Traffic Analyzer is landed on a machine that is SR-IOV enabled
• However, it’s not feasible to manually place workloads at scale.
• How can we automatically match workloads to suitable infrastructure?
[Chart: VNFC Performance, Bytes Per Second. Series: Total Traffic, Standard Deployment, Enhanced Deployment]
Matching workload types to hardware features can improve performance
* http://www.t-nova.eu/
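One way to picture automatic matching is as a set-cover check between what a workload needs and what a host offers. The sketch below is illustrative only: the `Host` and `Workload` classes, the host names, and the feature labels are hypothetical and are not taken from T-Nova or the Landscaper.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    features: set = field(default_factory=set)


@dataclass
class Workload:
    name: str
    required_features: set = field(default_factory=set)


def suitable_hosts(workload, hosts):
    """Return the hosts whose feature set covers everything the workload needs."""
    return [h for h in hosts if workload.required_features <= h.features]


# Hypothetical inventory: feature labels are made up for illustration.
hosts = [
    Host("node-a", {"sr-iov", "aes-ni"}),
    Host("node-b", {"aes-ni"}),
    Host("node-c", {"sr-iov", "10gb-nic"}),
]
analyzer = Workload("traffic-analyzer", {"sr-iov"})

matches = [h.name for h in suitable_hosts(analyzer, hosts)]
print(matches)  # ['node-a', 'node-c']
```

A real placement engine would also weigh current load and cost; this only shows the feature filter.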
4. Infrastructure Landscape
Goal: Support setup and run-time
orchestration for optimised service delivery by
defining and maintaining a layered landscape:
• Physical
• Virtual
• Service
Nodes in each layer are enriched by telemetry
5. Landscaper Overview
Graph representation of the physical, virtual, and service layers of the infrastructure landscape
• Landscape nodes have a category:
  • Compute, Storage, Network
• Landscape History
  • edges have a ‘from time’ and ‘to time’
• Landscape State
  • landscape nodes have state nodes
• Data gathered by collectors
• Data export via RESTful API
  • JSON string in networkx format
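A minimal sketch of the idea, assuming edges carry a validity interval and that the export is a node-link style JSON document. The node names, attribute keys, and the `landscape_at` helper are hypothetical; the layout is only loosely modelled on the node-link JSON that networkx uses, and the Landscaper's actual schema is not shown in the slides.

```python
import json

# Nodes: id -> attributes (layer and category follow the slide's vocabulary).
nodes = {
    "machine-1": {"layer": "physical", "category": "compute"},
    "machine-2": {"layer": "physical", "category": "compute"},
    "vm-1": {"layer": "virtual", "category": "compute"},
    "wordpress": {"layer": "service", "category": "compute"},
}

# Edges carry a validity interval; to_time=None means "still current".
# Here vm-1 ran on machine-1, then moved to machine-2 at time 200.
edges = [
    {"source": "vm-1", "target": "machine-1", "from_time": 100, "to_time": 200},
    {"source": "vm-1", "target": "machine-2", "from_time": 200, "to_time": None},
    {"source": "wordpress", "target": "vm-1", "from_time": 100, "to_time": None},
]


def landscape_at(timestamp):
    """Return the landscape as it looked at `timestamp`, in node-link form."""
    live = [
        e for e in edges
        if e["from_time"] <= timestamp
        and (e["to_time"] is None or timestamp < e["to_time"])
    ]
    return {
        "directed": True,
        "nodes": [{"id": node_id, **attrs} for node_id, attrs in nodes.items()],
        "links": live,
    }


snapshot = landscape_at(150)
print(json.dumps(snapshot, indent=2))
```

Keeping the interval on the edge rather than overwriting it is what makes historical queries possible: the same data answers "where is vm-1 now?" and "where was it at time 150?".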
[Figure: example landscape. Physical: Xeon E5, Xeon Phi, AES-NI, Atom, SSD, NVM, 10Gb. Virtual: Virtual Storage, Virtual Network, Virtual Machine. Service: Object store, Video transcode, Wordpress, ERP]
13. Enrichment Through Telemetry
Snap: a Lightweight modular programmable telemetry system
• Unified namespace, Configurable at run time, Dynamically derived metrics
• Integration of diverse data for analysis
• Calculation of generic node metrics across the stack (e.g. Utilization & Saturation)
[Figure: telemetry pipeline. Labels: Instrumentation, Logs, Capture, Store, Transform & Prepare, Access]
15. Snap - telemetry
Collect, Process, Publish
$ go get github.com/intelsdi-x/snap
http://snap-telemetry.io/
Plugin Catalogue (github)
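The collect, process, publish flow can be pictured as three small stages chained together. The sketch below is a generic Python illustration of that shape and is not Snap's plugin API; the function names, metric namespaces, and values are all hypothetical.

```python
import json
import time


def collect():
    """Collector stage: gather raw metrics (values here are made up)."""
    return {"/example/cpu/busy": 120.0, "/example/cpu/total": 400.0}


def process(metrics):
    """Processor stage: derive a new metric from the raw ones."""
    derived = dict(metrics)
    derived["/example/cpu/utilization"] = (
        metrics["/example/cpu/busy"] / metrics["/example/cpu/total"]
    )
    return derived


def publish(metrics, sink):
    """Publisher stage: hand the result to a sink (a list stands in for a store)."""
    sink.append(json.dumps({"timestamp": time.time(), "metrics": metrics}))


sink = []
publish(process(collect()), sink)
print(sink[0])
```

Because each stage only agrees on the metric dictionary, any one of them can be swapped without touching the others, which is the point of a plugin-based design.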
16. Adaptive telemetry – anomaly detection approach
Challenge: Sending all data all the time
• overflows the system with “redundant” information.
Goal: reduce data transfer while
preserving essence
Approach & Findings:
• Pluggable anomaly detection algorithm
• Increased transmission rate around
outliers only
• Transmissions typically reduced by
>10x
[Chart: CPU utilization (%) against elapsed time (seconds) for Machine 1, Machine 2, and Machine 3]
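The approach can be sketched as a filter that sends samples at a low base rate and switches to full rate for a short burst when a pluggable detector flags an outlier. This is an illustrative sketch only: the z-score detector, the parameter names, and the default values are assumptions, not the algorithm used in the presented work.

```python
from collections import deque
from statistics import mean, pstdev


def zscore_detector(window, value, threshold=3.0):
    """Flag `value` as an outlier if it is far from the recent window's mean."""
    if len(window) < 5:
        return False  # not enough history to judge
    spread = pstdev(window)
    if spread == 0:
        return value != window[0]
    return abs(value - mean(window)) / spread > threshold


def adaptive_filter(samples, detector=zscore_detector, base_every=10,
                    burst=3, history=20):
    """Yield the (index, value) pairs that would be transmitted.

    Normally only every `base_every`-th sample is sent. When the detector
    fires, `burst` consecutive samples are sent, starting with the outlier.
    """
    window = deque(maxlen=history)
    burst_left = 0
    for i, value in enumerate(samples):
        if detector(window, value):
            burst_left = burst
        if burst_left > 0 or i % base_every == 0:
            yield i, value
        burst_left = max(0, burst_left - 1)
        window.append(value)


# Flat CPU utilization with a single spike at index 55.
samples = [20.0] * 55 + [95.0] + [20.0] * 44
sent = [i for i, _ in adaptive_filter(samples)]
print(len(samples), len(sent))  # 100 13
```

The detector is passed in as a function, so a different anomaly detection algorithm can be plugged in without changing the transmission logic.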
17. Contextual Information
• Automatic application of USE methodology
• Ranking & Cost functions
• Supports comparison of service
configurations & generation of deployment
template for specific workloads
Representation of SDI sub-graph including performance
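As a rough illustration of applying USE (Utilization, Saturation, Errors) and then ranking candidates with a cost function, consider the sketch below. The `NodeMetrics` fields, the weighted-sum cost, and the weights are hypothetical choices made for the example, not the functions used in the presented work.

```python
from dataclasses import dataclass


@dataclass
class NodeMetrics:
    name: str
    busy_seconds: float      # time the resource was busy in the interval
    interval_seconds: float  # length of the measurement interval
    queue_length: float      # average queued work, used as saturation
    errors: int              # error events in the interval


def use_summary(m):
    """Reduce raw counters to the three USE figures."""
    return {
        "utilization": m.busy_seconds / m.interval_seconds,
        "saturation": m.queue_length,
        "errors": m.errors,
    }


def cost(m, w_util=1.0, w_sat=0.5, w_err=2.0):
    """Hypothetical cost function: a weighted sum, lower is better."""
    u = use_summary(m)
    return w_util * u["utilization"] + w_sat * u["saturation"] + w_err * u["errors"]


def rank(nodes):
    """Order candidate nodes from lowest to highest cost."""
    return sorted(nodes, key=cost)


candidates = [
    NodeMetrics("node-a", busy_seconds=54, interval_seconds=60, queue_length=2.0, errors=0),
    NodeMetrics("node-b", busy_seconds=18, interval_seconds=60, queue_length=0.0, errors=0),
    NodeMetrics("node-c", busy_seconds=30, interval_seconds=60, queue_length=0.5, errors=1),
]
ranking = [n.name for n in rank(candidates)]
print(ranking)  # ['node-b', 'node-a', 'node-c']
```

The same metrics apply to any node category (compute, storage, network), which is what makes a single ranking function usable across the stack.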
18. Application to large scale systems
Using the landscape data it is possible to develop models for:
• Optimization of initial placement
• Re-balancing actuations
• Troubleshooting
• Accounting
• Security
• Capacity planning
20. Network Model for vCDN deployments
Technical challenges:
• Performance of virtualisation technologies,
especially virtualised storage.
• Orchestration of a multi-tenant vCDN service
and infrastructure.
• Optimisation of placement and scaling of
vCDN system.
• Monitoring and repair of the vCDN system.
• Detection and mitigation of impact of “noisy
neighbours”.
21. Optimization approaches
1. Load to Capacity Requirement Mapping
2. Load to Telemetry Mapping
3. Infrastructure Configuration Optimization
[Figure: three panels, one per approach. Each shows a Workload (labelled BT Workload in the first panel) on an Infrastructure made up of Resource A and Resource B, with Telemetry for Resource A, Telemetry for Resource B, KPI 1, KPI 2, and Cost]
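The third approach, infrastructure configuration optimization, can be read as choosing the lowest-cost configuration that still meets the KPI targets. The sketch below assumes that reading; the `Configuration` fields, the direction of each KPI, and all numbers are hypothetical, and the presented work may use a different formulation.

```python
from dataclasses import dataclass


@dataclass
class Configuration:
    name: str
    kpi_1: float  # assumed: a throughput-style KPI, higher is better
    kpi_2: float  # assumed: a latency-style KPI, lower is better
    cost: float


def cheapest_feasible(configs, min_kpi_1, max_kpi_2):
    """Return the lowest-cost configuration meeting both KPI targets, or None."""
    feasible = [
        c for c in configs
        if c.kpi_1 >= min_kpi_1 and c.kpi_2 <= max_kpi_2
    ]
    return min(feasible, key=lambda c: c.cost) if feasible else None


# Made-up candidate configurations with predicted KPIs.
configs = [
    Configuration("small", kpi_1=80.0, kpi_2=30.0, cost=10.0),
    Configuration("medium", kpi_1=120.0, kpi_2=18.0, cost=16.0),
    Configuration("large", kpi_1=200.0, kpi_2=9.0, cost=30.0),
]
choice = cheapest_feasible(configs, min_kpi_1=100.0, max_kpi_2=20.0)
print(choice.name)  # medium
```

In practice the predicted KPIs would come from models built on telemetry, as in the first two approaches, rather than from fixed numbers.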
26. Success Criteria
Create a system to:
• model the performance of VNFs prior to deployment
• learn the configuration of existing networks and predict the impact of topology,
application and infrastructure changes
• improve the placement decisions of Orchestration systems to improve infrastructure
utilization whilst guaranteeing performance and availability SLAs
• put in place remediation rules a priori, before failures happen, ensuring rapid
protection using the minimum of additional resources
• automate the remediation of unexpected/unpredicted failures in a timely fashion
(several minutes).