Nagios Conference 2013 - Eric Loyd - Dynamic AWS Server Usage Using Nagios Core
1. Dynamic AWS Server Usage
Using Nagios Core
or
How to pay only for what you need
Eric Loyd
eric@bitnetix.com
877.33.VOICE
@Bitnetix @SmartVox
3. 3
About Eric Loyd and Bitnetix
Founder and CEO of Bitnetix Incorporated
VoIP services and IT/network consulting
Over 25 Years in IT and management at places like
Eastman Kodak
Frontier Communications / Global Crossing
Rochester Institute of Technology
Bitnetix started its eighth year in July, 2013
Digital Rochester GREAT Award Finalist in:
2012 for Communications Technology
2013 for Rising Star
Using Nagios since 2004
© 2013 Bitnetix Incorporated
5. 5
History of SmartVox, our VoIP Platform
Pre-2012 – not yet called SmartVox
Bitnetix primarily focused on IT consulting
VoIP service was ~10% of business, with servers located primarily at client sites
Custom Asterisk-based servers running FreePBX
We ran the customer’s network, so we had control over VoIP
2012 – Focus switched to VoIP
Focused now on hosted VoIP solutions
Made use of Amazon Web Services EC2 VPS
One per customer with no proxies* or media servers
Network/bandwidth was the customer’s only responsibility
6. 6
History of SmartVox, our VoIP Platform
2013 – SmartVox name born
Copyright, trademark, domain name, biz cards, etc.
Third generation born with multiple proxies, registrars, configuration servers, and media servers
June – Started Mission Matrix program & sales
AWS architecture leveraged for geography
Each customer gets own EC2 server
Proxies to closest zone, secondary “to the west”
Media servers located in zones based on number of simultaneous calls, conferences, etc.
Voicemails (VMs) and call detail records (CDRs) stored in database
8. 8
AWS EC2 Concepts
AWS – Amazon Web Services
Collection of cloud-based services:
Storage (S3), DNS (Route 53), CDN, Server (EC2)
EC2 - Elastic Compute Cloud
Virtual servers in AWS datacenters (zones)
US (3 = VA, CA, OR), EU (1), Asia (3), SA (1)
Persistent storage & flexible IP address assignment
Pay by the hour the server is up, plus storage and bandwidth
Spot instances – “temporary” EC2 servers
Bring online as needed, terminated when shut down
9. 9
AWS EC2 Costs
LOTS of variables, but reasonable potential costs:
Reserved servers cost about $2.00 per day
Reserved instance pricing is contractual and static, based on size
Spot servers cost between $0.50-$2.50 per day
Spot instance pricing is dynamic, we assume ~$0.10 per hour
We quantize concurrent calls into 50-call blocks
One media server = 50 calls = 1 spot instance
Two media servers = 100 calls = 2 spot instances
Bandwidth and storage will add ~10%
Reducing AWS usage reduces cost
We keep these savings for ourselves. Shhhh!!!
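The quantization above lends itself to a quick back-of-the-envelope calculation. The following is a minimal sketch that uses only the figures on this slide (50 calls per media server, an assumed ~$0.10 per hour spot price, and ~10% overhead for bandwidth and storage); the function names are illustrative and not part of SmartVox.

```python
import math

CALLS_PER_SERVER = 50        # from the slide: one media server = 50 calls
SPOT_RATE_PER_HOUR = 0.10    # assumed average spot price, per the slide
OVERHEAD = 0.10              # bandwidth and storage add roughly 10%


def media_servers_needed(concurrent_calls: int) -> int:
    """Quantize concurrent calls into 50-call blocks, one spot instance each."""
    return math.ceil(concurrent_calls / CALLS_PER_SERVER)


def hourly_cost(concurrent_calls: int) -> float:
    """Estimated hourly spend in dollars, including the ~10% overhead."""
    servers = media_servers_needed(concurrent_calls)
    return round(servers * SPOT_RATE_PER_HOUR * (1 + OVERHEAD), 4)
```

For example, `media_servers_needed(120)` rounds up to three servers, because a partial block still needs a whole instance.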
11. 11
Why Nagios?
Extensive experience using it for clients
Bitnetix is a Nagios reseller
Needed centralized monitoring software
Integrate with Twitter for notifications
Integrate with Eventum via email for trouble tickets
Zero cost
Framework
Leverage SSH, HTTP, check_mk and livestatus!!
Custom checks and notifications (very important)
Ability to “cookie cutter” installs for AWS
12. 12
Initial Hurdles
Customer Premises Equipment (CPE)
No real control over CPE choices
Routers block some traffic, “help” other traffic incorrectly
Need to be able to remotely [re-]configure phones
Figure out how to “cookie-cutter” EC2 servers
Customer boxes and SIP endpoints
Proxies and media servers
Wanted to monitor upstream providers as well
How to separate apparent from actual failure
Something’s broken, but overall service functional
14. 14
SmartVox Network
DNS SRV records are key to redundant servers
[Diagram: Provider -> Border Proxies -> Customer Proxies]
Provider: sends incoming calls to one or more border proxies
Border Proxy: figures out which customer should receive the calls
Customer Proxy: sends the call on to the correct phone/media server (VM, etc.)
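Since DNS SRV records carry a priority and a weight, a primary/secondary proxy pair can be expressed directly in DNS. The sketch below shows how such records might be ordered and formatted; the hostnames and service name used in the usage note are hypothetical, and equal-priority records are simply sorted by weight rather than chosen by the weighted random selection that RFC 2782 describes.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int   # lower value is tried first (RFC 2782)
    weight: int     # relative share among records of equal priority
    port: int
    target: str


def failover_order(records: List[SrvRecord]) -> List[str]:
    """Return targets in the order a client would try them (simplified)."""
    ranked = sorted(records, key=lambda r: (r.priority, -r.weight))
    return [r.target for r in ranked]


def zone_line(service: str, ttl: int, rec: SrvRecord) -> str:
    """Format one SRV record in zone-file syntax."""
    return (f"{service} {ttl} IN SRV "
            f"{rec.priority} {rec.weight} {rec.port} {rec.target}")
```

A closest-zone proxy at priority 10 and a secondary "to the west" at priority 20 would be tried in that order, and a 59-minute TTL corresponds to `ttl=3540`.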
15. 15
Provisioning Process
SmartVox AWS EC2 Provisioning Database
Customer information
Account (location/division/etc) information
Number of phones*, VM boxes, etc.
Computes how many proxies customer needs
DNS SRV records created for batch updates
Media server/VM entries created automatically
Phone provisioning info created automatically
Automatically places order for phones* (+some)
Phones drop-shipped to customer in about 3 days
16. 16
AWS EC2 Automation: Spot Instance API
Create spot instance -> gives request ID
Instance created from the SmartVox-built base image
Wait a bit -> query request ID -> get instance ID
Query instance -> get IP address
Update DNS with server information and IP
Update Nagios with server information and IP
When spot instances shut down, they terminate
No more expense for “burstable resources”
This sounds like a Nagios event handler…
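The request, poll, and register sequence above can be sketched as a small polling loop. This is only an illustration: `client` is a hypothetical wrapper around the EC2 API, and its three method names are assumptions made for this example, not calls from a real AWS SDK. The DNS and Nagios updates are passed in as callables.

```python
import time
from typing import Callable, Optional


def wait_for(poll: Callable[[], Optional[str]], attempts: int = 30,
             delay_seconds: float = 20.0,
             sleep: Callable[[float], None] = time.sleep) -> str:
    """Call poll() until it returns a value, or give up after `attempts` tries."""
    for _ in range(attempts):
        value = poll()
        if value:
            return value
        sleep(delay_seconds)
    raise TimeoutError("resource did not become available in time")


def bring_up_media_server(client, update_dns, update_nagios,
                          sleep: Callable[[float], None] = time.sleep) -> str:
    """Request a spot instance and register it once it has an IP address."""
    # Hypothetical client methods; substitute whatever wraps the EC2 API.
    request_id = client.request_spot_instance()
    instance_id = wait_for(lambda: client.instance_id_for(request_id), sleep=sleep)
    ip_address = wait_for(lambda: client.public_ip_for(instance_id), sleep=sleep)
    update_dns(instance_id, ip_address)
    update_nagios(instance_id, ip_address)
    return ip_address
```

Passing `sleep` as a parameter keeps the loop testable without real delays.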
17. 17
AWS EC2 Automation: Our Custom Image
SmartVox media server image includes Asterisk
Asterisk told to exit after waiting for calls to terminate
Startup script shuts down system after Asterisk exits
Instant “spot instance”
Bring it online when needed, and terminate as required
Same basic idea for starting/stopping proxies
These tend to be more static than media servers
Platform can be adjusted automatically
COGS adjusts appropriately
Hey, let’s hook this up to Nagios!!
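The "shut down when Asterisk exits" idea can be sketched in a few lines. The deck does not show the actual startup script, so the commands here are assumptions: Asterisk is run in the foreground with `asterisk -f`, draining is requested with the Asterisk CLI command `core stop gracefully`, and the machine is halted with `shutdown -h now`.

```python
import subprocess
from typing import Callable, Sequence

# Assumed commands; the real SmartVox image may use different ones.
ASTERISK_CMD = ["asterisk", "-f"]                         # run in foreground
DRAIN_CMD = ["asterisk", "-rx", "core stop gracefully"]   # exit once calls end
SHUTDOWN_CMD = ["shutdown", "-h", "now"]

Runner = Callable[[Sequence[str]], int]


def run_until_asterisk_exits(run: Runner = subprocess.call) -> int:
    """Block while Asterisk runs, then power off the machine."""
    exit_code = run(ASTERISK_CMD)
    run(SHUTDOWN_CMD)   # on a spot instance, shutting down ends the billing
    return exit_code


def request_drain(run: Runner = subprocess.call) -> int:
    """Ask the running Asterisk to exit after active calls finish."""
    return run(DRAIN_CMD)
```

The `run` parameter defaults to `subprocess.call` and can be replaced with a stub, so the sequence can be exercised without touching a real system.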
18. 18
AWS EC2 Automation: More ideas
Quick aside about spot instances. Useful for:
Database dumps
Spot instance turned up to do MySQL copies
Run reports, dump, compress, purge, etc., then terminate
Distributing web server load
Pop up another server and add to DNS
Instant on-demand capacity
Anything that you only want to do repeatedly but not for a long time, and only when you want to (or maybe if you have to)
20. 20
Provisioning
Rather than create EC2 instances directly, we just update Nagios
Automatically regenerate the SIP proxy and media server dynamic_hosts.cfg file as part of the provisioning process
Nagios checks whether the host is up, doesn’t find it, and fires off the event handler
Event handler queries EC2 to see whether the instance is being turned up (~10 min) or just not running. If it’s not running, the handler starts it.
DNS is batch-updated every hour, with 59-minute TTLs
Phone provisioning is handled via an automatic extract from the database that creates HTTP-served configuration files
Master/slave “config servers” (also in AWS) send all of this to customers, with a URL to activate phones
Entire process from signature to functional < 1 week
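The event handler's decision can be sketched as a pure function. The state names follow the Nagios `$HOSTSTATE$` and `$HOSTSTATETYPE$` macros and the EC2 instance state names; acting only on HARD states and treating a "running" instance as still booting are assumptions of this sketch, not details from the talk.

```python
from typing import Optional


def handler_action(host_state: str, state_type: str,
                   instance_state: Optional[str]) -> str:
    """Decide what a host event handler should do.

    instance_state is the EC2 state name, or None when no instance exists.
    Returns "none", "wait", or "start".
    """
    if host_state != "DOWN" or state_type != "HARD":
        return "none"    # host is fine, or Nagios is still retrying
    if instance_state in ("pending", "running"):
        return "wait"    # instance exists and may still be turning up
    return "start"       # stopped, terminated, or never created
```

Keeping the decision separate from the EC2 calls makes it easy to check each case before wiring it into a real handler script.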
21. 21
Monitoring
Nagios looks for hosts (see previous slide)
Automatically creates them if needed
Note that SIP proxies are not spot instances
Dedicated to the lifespan of the customer/account, so they are only terminated as part of the de-provisioning process
Nagios looks at health of services
Determine if we have faults, outages, etc.
Can potentially reroute automatically (DNS SRV!)
Store performance info for capacity calculations
Notifications via Twitter and email
Come back tomorrow at 10:30 for how this works
22. 22
Capacity Planning
Quantize by 50 simultaneous calls per server
Perf data used to calculate historical usage
Can use cron to automatically add/remove servers
Nagios figures out “deltac” in current usage
If deltac = 0, we are just right (OK)
If deltac < 0, we have too much capacity (WARN)
If deltac > 0, we need more capacity (CRITICAL)
Event handler looks at state and either does nothing, tells the least-used box to stop Asterisk, or adds another box to the mix (see provisioning)
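The mapping from deltac to a Nagios state, and from that state to a handler action, can be sketched as follows. The exit codes 0, 1, and 2 are the standard Nagios plugin return codes; the message text, the perfdata label, and the action names are illustrative assumptions.

```python
from typing import Tuple

OK, WARNING, CRITICAL = 0, 1, 2   # standard Nagios plugin exit codes


def deltac_status(deltac: int) -> Tuple[int, str]:
    """Map a deltac value to (exit code, plugin output with perfdata)."""
    if deltac == 0:
        code, text = OK, "capacity matches usage"
    elif deltac < 0:
        code, text = WARNING, "too much capacity"
    else:
        code, text = CRITICAL, "need more capacity"
    label = ("OK", "WARNING", "CRITICAL")[code]
    return code, f"DELTAC {label} - {text} | deltac={deltac}"


def handler_step(code: int) -> str:
    """Pick the event handler action for a given state."""
    return {OK: "nothing", WARNING: "stop_least_used", CRITICAL: "add_server"}[code]
```

A real plugin would print the message and exit with the code, for example via `sys.exit(code)`.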
Capacity (and costs) dynamically adjust with usage
23. 23
Capacity Planning: DeltaC
deltac – Custom Nagios module
Looks at the last three times it ran on a particular host
Quantized by 50 calls: deltac is the change in the number of 50-call blocks
If deltac = 0 then we return an OK state
If deltac < 0 then call volumes are dropping, and we can SSH to a box and tell Asterisk to stop
This will then stop the spot instance and reduce cost
If deltac > 0 then call volumes are growing, and we trigger the provisioning process
This will start a spot instance and increase cost
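The deck does not spell out how deltac is derived from the last three runs. One plausible reading, shown here purely as an assumption, is to quantize the oldest and newest of the three call-count samples into 50-call blocks and take the difference.

```python
import math
from typing import Sequence

BLOCK_SIZE = 50   # calls per media server, per the slides


def blocks(calls: int, block: int = BLOCK_SIZE) -> int:
    """Number of 50-call blocks needed to carry `calls` concurrent calls."""
    return math.ceil(calls / block)


def deltac(samples: Sequence[int], block: int = BLOCK_SIZE) -> int:
    """Change in 50-call blocks across the last three samples.

    Assumption: compare the newest sample with the oldest of the last
    three. With fewer than three samples, report no change.
    """
    recent = list(samples)[-3:]
    if len(recent) < 3:
        return 0
    return blocks(recent[-1], block) - blocks(recent[0], block)
```

Under this reading, samples of 40, 45, and 48 calls give 0 (still one block), while 40, 60, and 120 give 2.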
25. 25
How DeltaC Works
Let’s assume we’re creating a new host
ec2-request-spot-instances ami-58296831 -p 0.04 --key "BTC EC2" --group Asterisk --instance-type m1.medium -n 1 --type one-time
Get back a “spotInstanceRequestId” (sir-722f4e34)
ec2-describe-spot-instance-requests sir-722f4e34
Get back an “instanceId” (i-6488e31f)
ec2-describe-instances i-6488e31f
Get back public IP address (ipAddress) of this machine
Now we have IP address and (internal) name
Populate DNS batch update queue
Regenerate /usr/local/nagios/etc/objects/dynamic_hosts.cfg
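Regenerating dynamic_hosts.cfg amounts to rendering one Nagios host object per server. The sketch below uses standard Nagios object syntax; the template name `generic-host` and the event handler command name `start-ec2-instance` are assumptions for illustration and would need to exist in the local Nagios configuration.

```python
from typing import Dict


def host_definition(host_name: str, address: str,
                    template: str = "generic-host",
                    event_handler: str = "start-ec2-instance") -> str:
    """Render one Nagios host object (template and handler are assumed names)."""
    return (
        "define host {\n"
        f"    use             {template}\n"
        f"    host_name       {host_name}\n"
        f"    address         {address}\n"
        f"    event_handler   {event_handler}\n"
        "}\n"
    )


def render_dynamic_hosts(hosts: Dict[str, str]) -> str:
    """Render all hosts, sorted by name so regenerated files diff cleanly."""
    return "\n".join(host_definition(name, ip)
                     for name, ip in sorted(hosts.items()))
```

The returned text would then be written to /usr/local/nagios/etc/objects/dynamic_hosts.cfg, followed by a Nagios configuration reload.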
26. 26
DeltaC Saves Money
Small percentage changes in usage result in large changes in Cost Of Goods
For example:
Calls   Boxes   Cost/hour   Cost/year
100     2       $0.20       ~$75
500     10      $1.00       ~$375
2000    20      $2.00       ~$750
5000    50      $5.00       ~$2000