vBridge are the creators of a multi-site IaaS platform that provides clients with fast, reliable data storage and cost-effective computing services. Their cloud infrastructure monitoring solution aims to provide the simplicity, flexibility and control required by their clients. vBridge’s solution lets customers generate ad hoc performance graphs of their virtual workloads. Their API stores metrics on every request (HTTP status code, response time, endpoint, etc.). Discover how vBridge uses InfluxDB and Telegraf to collect and store backend metrics from Pure Storage and 3PAR storage arrays.
In this webinar, Ben Young will dive into:
vBridge’s methodology for hosting infrastructure as a service
Their approach to delivering superior processing power, meeting uptime SLAs and providing disaster recovery
How vBridge uses a time series database to empower their clients with real-time monitoring of clients’ backend systems
3. ABOUT VBRIDGE
● New Zealand based
● 11 Years Old
● IaaS / Cloud Services
● Multi-site (Christchurch / Auckland)
● Self-service portal
vbridge.co.nz
4. AGENDA
THE VBRIDGE WAY
01 Our approach to
delivering cloud services.
02 CUSTOMER FACING
Delivering real-time
metrics to customers.
03 INTERNALS
How we maintain a rock
solid platform.
04 QUESTIONS
Let’s open up the floor.
6. VBRIDGE VALUES
● Market leading support
● Empowerment via tooling
CUSTOMER FOCUS
● Market leader
● Measure everything
PERFORMANCE
● ISO 27001 Certified
● Regular audit and penetration
testing
SECURE
● High level of reinvestment
● Best of breed products
RELIABLE
8. SECRET SAUCE
● Multi-site cloud management portal
● Automates the vBridge cloud stack
● Enables customers to self service
● Award winning
9. WHY TIME SERIES
● Fit for purpose, optimised for millions of data points
across several systems
● Scales predictably in performance (read and write)
● Plays nice with other tooling (Grafana etc)
● Easy to integrate with from almost anything
WHY DID WE CHOOSE INFLUXDB?
● Was the most mature product in 2015
● Had the most mature community at the time
● OSS driven
WHY ARE WE STILL USING INFLUXDB
● Has never let us down
● Community and product continue to grow/mature at a rapid rate
11. IAAS METRICS
What can customers do?
● View virtual machine resource usage
○ CPU / Memory utilisation
○ Disk throughput/IO
○ Network
● Generate for last 60 minutes by default
○ 6/12/24 hours & 2/7 days on demand
Problems solved
● Immediate access for customers to VM vitals,
on demand
● Ability to show granular, non-smoothed metrics
back further than VMware vSphere can provide
● Ability to stack multiple servers on single graph
● Easier to integrate with (InfluxDB/Grafana)
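The time windows above (60 minutes by default, longer ranges on demand) and the ability to stack several servers on one graph could be expressed as a single InfluxQL query. The sketch below is illustrative only: the measurement name `cpu_usage`, the tag key `vm` and the field key `value` are assumptions, not vBridge's actual schema. It selects raw points rather than aggregates, in keeping with the "non-smoothed" goal.

```python
# Windows offered on the slide: 60 minutes by default, then 6/12/24 hours and 2/7 days.
ALLOWED_WINDOWS = {"60m", "6h", "12h", "24h", "2d", "7d"}


def quote_string(value):
    """Quote an InfluxQL string literal, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def build_query(measurement, vm_names, window="60m"):
    """Build an InfluxQL query returning raw points for one or more VMs.

    `measurement`, the "vm" tag and the "value" field are hypothetical names.
    """
    if window not in ALLOWED_WINDOWS:
        raise ValueError(f"unsupported window: {window}")
    if not vm_names:
        raise ValueError("at least one VM name is required")
    safe_measurement = measurement.replace('"', '\\"')
    vm_filter = " OR ".join(f'"vm" = {quote_string(name)}' for name in vm_names)
    return (
        f'SELECT "value" FROM "{safe_measurement}" '
        f'WHERE ({vm_filter}) AND time > now() - {window} '
        f'GROUP BY "vm"'
    )


if __name__ == "__main__":
    # Two servers stacked on one graph over the last 24 hours.
    print(build_query("cpu_usage", ["web01", "db01"], window="24h"))
```

Restricting the window to a fixed set keeps user input out of the duration part of the query, and `GROUP BY "vm"` returns one series per server so a charting layer can overlay them.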
12. IAAS METRICS ARCHITECTURE
[Architecture diagrams, labelled CURRENT and FUTURES. Each shows DC 1 and DC 2, a RELAY, and two REVERSE PROXY (ARR) components.]
21. INTERNAL USAGE
Internal use cases
● Monitoring 3PAR performance and disk utilisation
● Capacity planning via vSphere metrics
● Multi-site Veeam cloud connect monitoring
● MyCloudSpace health and usage monitoring
22. 3PAR MONITORING
In a nutshell
● Script logs in via SSH to 3PAR
● Runs statpd and showsys commands, scrapes and sends data
to InfluxDB
● Visualise with Grafana
SSH
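The last step of that pipeline, sending scraped values to InfluxDB, comes down to encoding each sample in InfluxDB line protocol. The sketch below shows only that encoding step and assumes the SSH session and the parsing of `statpd`/`showsys` output have already produced plain Python values. The measurement, tag and field names in the example are hypothetical and do not reflect the real command output or vBridge's script.

```python
def escape_tag(value):
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace(",", "\\,")
        .replace("=", "\\=")
        .replace(" ", "\\ ")
    )


def escape_measurement(name):
    """Escape commas and spaces in a measurement name."""
    return name.replace("\\", "\\\\").replace(",", "\\,").replace(" ", "\\ ")


def format_field(value):
    """Render a field value: booleans, integers (with 'i' suffix), floats, or quoted strings."""
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Encode one point as a line-protocol string; tags are sorted by key."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    tag_part = "".join(
        f",{escape_tag(k)}={escape_tag(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_tag(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical sample for one physical disk on one array.
    print(to_line("3par_pd", {"array": "array1", "pd": "0"},
                  {"io_per_sec": 120.5, "kb_per_sec": 2048}, 1546300800000000000))
```

Lines produced this way can be joined with newlines and posted in one batch to the InfluxDB write endpoint.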
26. CAPACITY PLANNING
In a nutshell
● vSphere metrics collected by Telegraf
● Able to report long term on growth/performance allowing for
purchasing and capacity planning with ease
● Visualised with Grafana
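One simple way to turn long-term growth data into a purchasing signal is to fit a straight line to historical usage and estimate when it reaches the available capacity. The sketch below is an illustration of that idea using ordinary least squares. It is an assumption about how such a report could be derived, not a description of vBridge's method, and the sample figures are made up.

```python
def days_until_full(samples, capacity):
    """Estimate days from the last sample until a linear trend reaches `capacity`.

    `samples` is a list of (day_number, used_amount) pairs, e.g. daily datastore
    usage. Returns None when usage is flat or shrinking, and 0.0 when the fitted
    line has already reached the capacity.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to fit a trend")
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx                  # growth per day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None                    # no growth, so no projected fill date
    day_full = (capacity - intercept) / slope
    last_day = max(x for x, _ in samples)
    return max(0.0, day_full - last_day)


if __name__ == "__main__":
    # Made-up usage in TB on days 0..2, against a 20 TB datastore.
    print(days_until_full([(0, 10.0), (1, 12.0), (2, 14.0)], capacity=20.0))
```

A linear fit is a deliberately crude model. Seasonal or step-change growth would need a longer history or a different model.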
30. VEEAM CLOUD CONNECT
In a nutshell
● Query multiple Veeam environments with WMI
● Save data to InfluxDB
● Visualise with Grafana
● Helps with capacity planning/performance monitoring
WMI
33. MYCLOUDSPACE HEALTH
In a nutshell
● ASP.NET API MessageHandler sends details of every request to
InfluxDB (endpoint, response time)
● MyCloudSpace API sends other data to InfluxDB (e.g. login
successes/failures)
● Visualise with Grafana
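The per-request capture described above is done in an ASP.NET MessageHandler. As a rough analogue in Python, the sketch below wraps a WSGI application and records the endpoint, status code and elapsed time of each request. The class name, the `sink` callback and the point layout are all hypothetical. In a real deployment the sink would forward points to InfluxDB rather than collect them in a list.

```python
import time


class RequestMetricsMiddleware:
    """WSGI middleware that reports one metrics point per request to `sink`."""

    def __init__(self, app, sink):
        self.app = app
        self.sink = sink  # any callable accepting a dict

    def __call__(self, environ, start_response):
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            # Status arrives as e.g. "200 OK"; keep only the numeric code.
            captured["status"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        started = time.perf_counter()
        try:
            return self.app(environ, recording_start_response)
        finally:
            # Measures time until the app returns its response iterable,
            # not the time spent streaming the body to the client.
            elapsed_ms = (time.perf_counter() - started) * 1000.0
            self.sink({
                "endpoint": environ.get("PATH_INFO", ""),
                "method": environ.get("REQUEST_METHOD", ""),
                "status": captured.get("status", "unknown"),
                "response_ms": elapsed_ms,
            })


if __name__ == "__main__":
    points = []

    def demo_app(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]

    wrapped = RequestMetricsMiddleware(demo_app, points.append)
    wrapped({"PATH_INFO": "/api/vms", "REQUEST_METHOD": "GET"},
            lambda status, headers, exc_info=None: None)
    print(points)
```

Recording in a `finally` block means a point is still emitted when the wrapped application raises, with the status reported as `unknown` if no response was started.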
37. WHERE TO NEXT?
● Upgrade to InfluxDB 2.0
● POC background processing/alerting (per tenant)
● Internal alerting / advanced monitoring
● More data sources to build on existing capacity/performance
planning
● Machine learning POC