Presentation given to the #lspe meetup (Large Systems Performance Engineering) on February 21, 2013 by Steve Shah. Topic for the night was Dynamic Scaling. This presentation is titled "Shock Absorbers and APIs" and covers features typical of ADCs (modern load balancers) that can help in managing scale as well as give a quick overview of what to expect from an API in an ADC.
2. Disclaimer
• I’m going to talk about a product.
ᵒIt’s kind of necessary in order to make this talk useful.
ᵒBut a lot of you have this product or know someone that does!
ᵒThe product is pretty cool…
ᵒIt can also sing and dance.
ᵒMaking coffee is on the roadmap.
• Sorry.
ᵒYes, I am marketing scum.
ᵒNo, I will not do a hard sell.
• My Competition
ᵒGoogle it. No really… It’s not hard to find them.
ᵒTheir product has various approaches too. I encourage you to ask them.
3. What is NetScaler?
Availability, Performance, Offload, Security
NetScaler powers some of the world’s largest infrastructures.
4. 1998 to 2012: From Load Balancing to Virtual Networking
1998 1999 2002 2003 2005 2006 2008 2009 2011
L4 SLB L7 SLB SSL SSLVPN AppFW ICA XML VPX SDX
GSLB CMP RHI SIP IPv6 nCore EdgeSight AppFlow
MUX DNS AAA-TM DataStream
Secret Decoder Ring:
SLB = Server Load Balancing
GSLB = Global Server Load Balancing
MUX = HTTP Multiplexing
SSL = SSL Acceleration
CMP = HTTP Compression
DNS = DNS Load Balancing / Proxy
RHI = Route Health Injection
ICA = App Proxy for ICA
IPv6 = IPv6 Routing, Switching, LB
XML = XML Security, Routing
VPX = Virtual NetScaler
nCore = multi-core scaling
SDX = Multi-tenant NetScaler
5. Agenda
• Things That Impact Scalability
• Shock Absorbers
• Scaling Out
• Your ADC has an API!
7. Load is Not Linear
• There are startup costs for enabling features in an ADC (memory and CPU)
• However, each incremental request takes a small fraction of resources
• As load increases, some global functions can take resources as well
ᵒE.g., flushing unused IP fragments, running timers, management overhead, etc.
8. Data Structures and Big O
• I/O, Data structures, and String processing are big factors
• The two that get you are data structures and string processing
ᵒACLs, VLANs, connection table, connection state, persistence table, etc.
ᵒHTTP request processing and policy execution
• Know your Big O – understand their impact
ᵒBig O notation is how programmers describe efficiency of algorithms
ᵒE.g., O(n) vs. O(log n) vs. O(1)
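To make the O(n) versus O(1) difference concrete, here is a minimal sketch of looking up a connection in a table of a given size. The key layout, function names, and sample addresses are assumptions for illustration only; this is not how any particular ADC stores its connection table.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical connection key: (client_ip, client_port). The value is connection state.
ConnKey = Tuple[str, int]


def lookup_linear(table: List[Tuple[ConnKey, str]], key: ConnKey) -> Tuple[Optional[str], int]:
    """O(n): scan every entry; also report how many entries were compared."""
    comparisons = 0
    for entry_key, state in table:
        comparisons += 1
        if entry_key == key:
            return state, comparisons
    return None, comparisons


def lookup_hashed(table: Dict[ConnKey, str], key: ConnKey) -> Optional[str]:
    """O(1) on average: a hash lookup does not grow with the table size."""
    return table.get(key)


def build_tables(n: int) -> Tuple[List[Tuple[ConnKey, str]], Dict[ConnKey, str]]:
    """Build the same n connections as a list and as a dict (n <= 65536)."""
    entries = [((f"10.0.{i // 256}.{i % 256}", 1024 + i), "ESTABLISHED") for i in range(n)]
    return entries, dict(entries)
```

With the list, finding the last of n connections costs n comparisons, so doubling the connection count doubles the work per lookup. With the dict, the cost per lookup stays roughly flat as the table grows.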
10. Launching v8: The Role of Data Structures
• Story time… launching a major service and what we learned
• Major new roll-out – expected to double the number of servers to handle
• Early testing revealed that large numbers of slow connections are meh
• Invest in your data structures! Clean up on several core structures
• Average connection lookup time driven to near constant time: O(1)
• Stir in a team that dreams in assembly language and can see cache misalignment by glancing at code and shave another 20% off connection lookup times (absolute times)
• Lesson: drive your apps to good data structures. Drive your vendors to do better.
11. MaxConns and SurgeQ
[Figure: Typical server performance curve, plotted against incoming load]
• Peak perf – we want to stay there
12. MaxConns and SurgeQ
[Figure: the server performance curve, annotated]
• Set max conns here (marked on the curve)
• Queue incoming requests in the ADC
• Server stays operating at maximum throughput
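The max-conns-plus-queue idea can be sketched in a few lines. This is a simplified model under stated assumptions: the class and method names are hypothetical, requests are plain strings, and the queue is unbounded FIFO. It is not the NetScaler SurgeQ implementation.

```python
from collections import deque
from typing import Deque, List


class SurgeProtectedServer:
    """Caps in-flight requests at max_conns and queues the overflow in arrival order."""

    def __init__(self, max_conns: int) -> None:
        self.max_conns = max_conns
        self.active: List[str] = []            # requests currently on the server
        self.surge_queue: Deque[str] = deque()  # requests held back in the ADC

    def on_request(self, request_id: str) -> str:
        """Forward the request if the server has a free slot, otherwise queue it."""
        if len(self.active) < self.max_conns:
            self.active.append(request_id)
            return "forwarded"
        self.surge_queue.append(request_id)
        return "queued"

    def on_response(self, request_id: str) -> None:
        """A request finished: free its slot and promote the oldest queued request."""
        self.active.remove(request_id)
        if self.surge_queue:
            self.active.append(self.surge_queue.popleft())
```

The server never sees more than `max_conns` requests at once, so it keeps working at the load it handles best while the burst waits in the queue.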
15. The SR-71 Approach: Go Faster
Treat a collection of NS devices like a grand unified “big” device
• Single System
ᵒconfigured and managed as a single logical system
• Scalable
ᵒscales with number of devices (distributes work)
• Fault Tolerant
ᵒHandles device failure, addition…
• Dynamic
The Sheet-metal Test
Steps:
• Take a cluster of NS, and an L2 switch.
• Configure the devices to your liking.
• Wrap the whole thing with sheet-metal, such that only the network ports remain exposed.
Test:
Must be able to configure and use this contraption as if it were just another NS box.
• connect wires into any visible port(s), create LAGs at will, enable L2 mode, MBF …
• point GUI to Cluster’s IP and configure away
16. Clustering
• Create a single system image out of a collection of instances
ᵒInstances = virtual machines, physical instances, or instances on multi-tenant boxes
• True shared management + data plane (the sheet metal test)
• Shared state for key data structures (persistence, health check, etc.)
• Linear scale by adding instances (up to 32)
• Ability to manage faults with proportional degradation
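As a back-of-the-envelope illustration of linear scale and proportional degradation, the sketch below treats every instance as contributing the same capacity. That is an idealized assumption, and the function name and units are made up for the example; only the limit of 32 instances comes from the slide.

```python
MAX_CLUSTER_NODES = 32  # upper bound stated on the slide


def cluster_capacity(per_node_capacity: float, nodes: int, failed: int = 0) -> float:
    """Idealized capacity: each surviving node contributes the same amount."""
    if not 1 <= nodes <= MAX_CLUSTER_NODES:
        raise ValueError(f"nodes must be between 1 and {MAX_CLUSTER_NODES}")
    if not 0 <= failed <= nodes:
        raise ValueError("failed must be between 0 and nodes")
    return per_node_capacity * (nodes - failed)
```

Under this model an 8-node cluster that loses 2 nodes keeps 6/8 of its capacity, rather than failing outright.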
17. Real-time Analytics and Policy Based Actions
• Real-time Analytics
ᵒBandwidth
ᵒConnections
ᵒTop ‘N Requests
ᵒResponse Time
ᵒFrequency
• Policy Based Actions
ᵒCompress
ᵒCache
ᵒLog
ᵒDrop
ᵒRespond
• Traffic Selection, then Policy Based Decision, with a Feedback loop
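One way to picture the feedback loop is a function that maps measured statistics to one of the listed actions. Everything specific here is an assumption for illustration: the field names, the thresholds, and the order in which the rules are checked are invented, and a real ADC expresses this in its own policy language.

```python
from dataclasses import dataclass


@dataclass
class TrafficStats:
    """Measurements for the selected traffic (hypothetical field names)."""
    requests_per_sec: float  # request frequency
    avg_response_ms: float   # response time
    bandwidth_mbps: float    # bandwidth


def choose_action(stats: TrafficStats) -> str:
    """Pick an action from the analytics; all thresholds are made up."""
    if stats.requests_per_sec > 1000:
        return "drop"      # abusive request rate
    if stats.avg_response_ms > 500:
        return "cache"     # slow backend, serve from cache
    if stats.bandwidth_mbps > 800:
        return "compress"  # link is close to full
    return "log"           # nothing unusual, just record it
```

Running this on every measurement interval closes the loop: the action changes the traffic, which changes the next set of statistics.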
18. Scaling Globally
[Diagram: Active Site and Mirror Site]
• Global Server Load Balancing (GSLB)
ᵒNetScaler uses DNS to send users to the closest site based on administrator defined metrics (geography, topology, site performance, availability)
• Route Health Injection (RHI)
ᵒNetScaler dynamically updates routing tables to direct clients to the active site based on real-time health monitoring of backend infrastructure.
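The GSLB decision can be sketched as "pick the closest site that is available". The `Site` fields and the tie-breaking rule below are assumptions chosen for the example, not the metrics or algorithm of any specific product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """A candidate site with illustrative, hypothetical metrics."""
    name: str
    healthy: bool       # availability
    distance_km: float  # stand-in for geography/topology
    load: float         # stand-in for site performance, 0.0 (idle) to 1.0 (full)


def pick_site(sites: List[Site]) -> Optional[Site]:
    """Return the nearest healthy site; break distance ties by lower load."""
    candidates = [s for s in sites if s.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (s.distance_km, s.load))
```

A DNS-based GSLB would answer the client's query with the address of the chosen site, so an unhealthy site simply stops appearing in responses.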
20. API in a Nutshell: Your ADC Has This
API
• Interfaces
ᵒSOAP
ᵒRESTful
• Client Toolkits
ᵒScripting: Perl/PHP/Python/PowerShell
ᵒOOP: Java/C#/ASP/.NET based
• Policy
ᵒReverse JSON/XML Call-Out
• Statistics
ᵒBulk Reporting
ᵒGranular Reporting
21. More RESTful - HTTP Status Code
Success Case:
REQUEST
GET http://<nsip>/nitro/v1/config/lbvserver/lbv1
RESPONSE
HTTP 200 OK
Failure Case:
REQUEST
POST http://<nsip>/nitro/v1/config/lbvserver
Content-Type:application/vnd.com.citrix.netscaler.lbvserver+json
{
"lbvserver": {"name":"lbv111", "servicetype":"HTTP"}
}
RESPONSE
HTTP/1.0 409 Conflict
{
"errorcode": 273,
"message": "Resource already exists",
"severity": "ERROR"
}
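A client can lean on the HTTP status code first and only parse the body on failure. The sketch below builds the POST from the slide with Python's standard library and interprets a status/body pair; it does not send anything. The helper names are hypothetical, and only the URL path, content type, and JSON fields shown on the slide are taken from the source.

```python
import json
import urllib.request
from typing import Optional


def build_add_lbvserver_request(nsip: str, name: str, servicetype: str) -> urllib.request.Request:
    """Build (but do not send) the POST that creates an lbvserver."""
    body = json.dumps({"lbvserver": {"name": name, "servicetype": servicetype}}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{nsip}/nitro/v1/config/lbvserver",
        data=body,
        headers={"Content-Type": "application/vnd.com.citrix.netscaler.lbvserver+json"},
        method="POST",
    )


def describe_result(status: int, body: Optional[str]) -> str:
    """Summarize a response: any 2xx is success, otherwise read the error JSON."""
    if 200 <= status < 300:
        return "ok"
    error = json.loads(body) if body else {}
    message = error.get("message", "unknown error")
    return f"HTTP {status}: {message} (errorcode {error.get('errorcode')})"
```

For the failure case on the slide, `describe_result` turns the 409 response into a one-line message that a script can log or act on.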
22. Example: Using Java
[Code screenshot not captured; slide annotations follow]
• Indicate we want “rollback on failure” in this session
• Prepare 3 lbvservers to be added in one bulk operation
• Print results
• Output: No attempt to add “lb3” because of Rollback behavior
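The Java code from the slide is not available here, so the following is a stand-in that simulates the described behavior against an in-memory dict. It is not the NITRO Java SDK: the function name, the result strings, and the exact rollback semantics (undo earlier adds, skip later ones) are assumptions based on the slide's annotations.

```python
from typing import Dict, List


def bulk_add_with_rollback(store: Dict[str, str], items: List[Dict[str, str]]) -> List[str]:
    """Add items in order; on the first failure undo earlier adds and skip the rest."""
    added: List[str] = []
    for index, item in enumerate(items):
        name = item["name"]
        if name in store:
            # Failure: remove what this bulk call already added, newest first.
            for done in reversed(added):
                del store[done]
            results = [f"{n}: rolled back" for n in added]
            results.append(f"{name}: failed (already exists)")
            results.extend(f"{rest['name']}: not attempted" for rest in items[index + 1:])
            return results
        store[name] = item["servicetype"]
        added.append(name)
    return [f"{n}: added" for n in added]
```

If "lb2" already exists when adding "lb1", "lb2", and "lb3", the call leaves the store unchanged and reports that "lb3" was never attempted, which matches the output described on the slide.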
23. AutoSense and AutoScale
• NetScaler monitoring engine auto-detects abnormal behavior in servers (CPU, Memory, Latency, Throughput …)
• NetScaler triggers AutoScale capability on CloudStack, based on AutoScale policy
• CloudStack “auto-provisions” new server instances
• On successful AutoScale, CloudStack provides new service resources and descriptions to NetScaler
• NetScaler is auto-provisioned with new service bindings
• NetScaler automatically monitors the new servers
• Traffic is automatically scaled by NetScaler for the newly added services
[Diagram: Internet, monitored (M) server instances, CloudStack]
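The trigger side of AutoScale can be sketched as a threshold check over monitored server metrics such as CPU, memory, and latency. The data classes, default thresholds, and the choice to average across servers are all assumptions for illustration; the actual NetScaler and CloudStack policy model is not shown in the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServerMetrics:
    """One monitoring sample per server (hypothetical fields)."""
    cpu_pct: float
    memory_pct: float
    latency_ms: float


@dataclass
class AutoScalePolicy:
    """Made-up thresholds; a real policy would be set by the administrator."""
    max_cpu_pct: float = 80.0
    max_memory_pct: float = 85.0
    max_latency_ms: float = 300.0


def should_scale_out(servers: List[ServerMetrics], policy: AutoScalePolicy) -> bool:
    """True when the average across servers breaches any policy threshold."""
    if not servers:
        return False
    n = len(servers)
    avg_cpu = sum(s.cpu_pct for s in servers) / n
    avg_mem = sum(s.memory_pct for s in servers) / n
    avg_latency = sum(s.latency_ms for s in servers) / n
    return (
        avg_cpu > policy.max_cpu_pct
        or avg_mem > policy.max_memory_pct
        or avg_latency > policy.max_latency_ms
    )
```

When the check returns true, the ADC would ask the cloud platform for more server instances and then start balancing traffic across them once they are bound.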
NetScaler enhances the delivery of your customers' web applications across four principal dimensions (click). These include Availability, Performance, Offload, and Security, all built on a common IT interface and providing an excellent ROI. Within each feature category there are numerous techniques (CLICK) delivered by NetScaler and I will elaborate on each.
Customers gain:
• 100% application availability via our world-class L4-L7 load balancing capabilities and intelligent service health monitoring features
• Accelerated application performance by 5x through static and dynamic content caching and compression
• An average of 60% in application infrastructure savings through connection pooling and offloading SSL processing from servers; this is especially important for Web 2.0 applications
• End-to-end application security with integrated Access Gateway Enterprise for secure remote access and an application firewall for protection against application layer attacks