Topics covered in this webinar, which took place on July 24, 2015:
* Track blocked event loops and capture the function calls causing the trouble
* Trace 100% of Node.js transactions, 100% of the time
* Detect anomalies in the system and application behavior
* Go to a historical time slot and inspect calls and call stacks
* Flamegraphs and a code breakdown of each Node.js function, down to nanosecond resolution
Node.js Transaction Tracing & Root Cause Analysis with StrongLoop Arc
1. TRANSACTION TRACING & ROOT
CAUSE ANALYSIS WITH
STRONGLOOP ARC
Jordan Kasper | Developer Evangelist
2. STEP ONE
The first step in monitoring, profiling, and tracing your
Node application is to run it in a process manager!
3. BUILD YOUR APP WITH SLC
~$ npm install -g strongloop
~/myapp$ slc build
...
~/myapp$ ls
... ... myapp-0.1.0.tgz
4. INSTALL AND RUN STRONG PM
On your deployment machine...
~$ npm install -g strong-pm
~$ sl-pm-install
5. DEPLOY TO STRONG PM
From our development machine (or staging, etc)...
~/myapp$ slc deploy http+ssh://myserver.com:8701
6. RUNNING LOCALLY
If you need to profile things locally (your machine or a
staging/testing server), run slc start from your app directory:
~/myapp$ slc start
Process Manager is attempting to run app `.`.
To confirm it is started: slc ctl status tracing-example-app
To view the last logs: slc ctl log-dump tracing-example-app
...
Then start the Arc UI:
~/myapp$ slc arc
11. WHAT DO I LOOK FOR?
CPU Usage is pretty obvious: just watch for high points!
With Heap Memory Usage you want to see a "sawtooth"
chart; each drop indicates garbage collection. No drop is
bad!
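To watch for the sawtooth outside of Arc, you can log the
same numbers from inside your app. A minimal sketch (the
one-second interval here is an arbitrary choice):
// Log RSS and V8 heap usage once per second; with healthy
// garbage collection, heapUsed rises and falls in a sawtooth.
setInterval(function () {
  var mem = process.memoryUsage();
  console.log('rss: ' + (mem.rss / 1048576).toFixed(1) + ' MB, ' +
    'heapUsed: ' + (mem.heapUsed / 1048576).toFixed(1) + ' MB');
}, 1000);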
13. WHAT DO I LOOK FOR?
The two Event Loop metrics are opposed. You want the
loop count to remain high under normal load (more ticks
per metrics cycle is good). Any dips may be bad.
The Loop timing, on the other hand, indicates how long
event loop ticks are taking. Any spikes here are bad!
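You can approximate the Loop timing metric yourself by
scheduling a timer and measuring how late it fires. A rough
sketch (the interval and the 20ms threshold are arbitrary):
var INTERVAL = 1000; // ms between checks
var last = Date.now();
setInterval(function () {
  var now = Date.now();
  var lag = now - last - INTERVAL; // how late the timer fired
  if (lag > 20) {
    console.log('event loop was blocked for ~' + lag + 'ms');
  }
  last = now;
}, INTERVAL);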
14. SETUP METRICS COLLECTION
On our production machine, with strong-pm installed,
simply set the collection location:
~$ export STRONGLOOP_METRICS="log:/path/to/api-metrics.log"
~$ export STRONGLOOP_METRICS="syslog"
~$ export STRONGLOOP_METRICS="statsd://mylogserver.com:1234"
~$ export STRONGLOOP_METRICS="graphite://mylogserver.com:1234"
~$ export STRONGLOOP_METRICS="splunk://mylogserver.com:1234"
15. SETUP METRICS COLLECTION
Alternatively, on the production machine you can run:
~$ sl-pm-install --metrics <url>
Or during runtime:
~$ slc ctl env-set myapp STRONGLOOP_METRICS=<url>
17. PROFILING
We can spot issues using the metrics being monitored, but
now we need to find the cause of those issues.
Profiling CPU usage and memory is the way to do this.
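Arc drives the profiling for you, but for reference, a CPU
profile can also be captured programmatically. A sketch using
the third-party v8-profiler module (an assumption here, not
part of the slc workflow); the output loads into Chrome
DevTools:
var fs = require('fs');
var profiler = require('v8-profiler'); // npm install v8-profiler
profiler.startProfiling('sample');
setTimeout(function () {
  var profile = profiler.stopProfiling('sample');
  profile.export(function (err, result) {
    fs.writeFileSync('sample.cpuprofile', result);
    profile.delete(); // free the profile's memory in V8
  });
}, 5000); // capture 5 seconds of activity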
21. PROGRAMMATIC MEMORY MONITORING
If we have memory issues, it may be helpful to monitor
memory usage dynamically.
~$ npm install heapdump --save
var heapdump = require('heapdump');
var THRESHOLD = 500; // RSS limit, in MB

// Every 5 minutes, write a heap snapshot if RSS exceeds the threshold
setInterval(function () {
  var memMB = process.memoryUsage().rss / 1048576; // bytes to MB
  if (memMB > THRESHOLD) {
    process.chdir('/path/to/writeable/dir');
    heapdump.writeSnapshot();
  }
}, 60000 * 5);
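Each snapshot is written as a .heapsnapshot file, which can be
loaded into the Memory tab of Chrome DevTools and compared
against earlier snapshots to see what is growing.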
22. MEMORY MONITORING
Caution: Taking a heap snapshot is not trivial; it costs
significant CPU and memory.
If you already have a memory problem, this could kill your
process!
Unfortunately, sometimes you have no alternative.
23. SMART PROFILING
How can we use the monitoring data to drive profiling?
"smart profiling" based on event loop blockage
~$ slc ctl cpu-start 1.1.49408 20
1. Monitors a specific worker (1.1.49408)
2. Starts a CPU profile when the event loop has been blocked for more than 20ms
3. Stops profiling once the event loop resumes
24. FINDING THE WORKER ID
~$ slc ctl status
Service ID: 1
Service Name: myapp
Environment variables:
No environment variables defined
Instances:
Version Agent version Cluster size
4.1.0 1.5.1 4
Processes:
ID         PID    WID  Listening Ports  Tracking objects?  CPU profiling
1.1.49401  49401  0
1.1.49408  49408  1    0.0.0.0:3001
1.1.49409  49409  2    0.0.0.0:3001
1.1.49410  49410  3    0.0.0.0:3001
1.1.49411  49411  4    0.0.0.0:3001
28. ANOMALY INSPECTION
See something off?
Click on that point in the resource usage chart.
(The orange triangles at the bottom identify anomalies
beyond three-sigma deviations.)
32. FLAME CHARTS
The flame chart identifies each function in the call stack,
organized in color by module.
The size of the bar represents the total resource
consumption for that function and all of its function calls.
Clicking on a function shows that function's resource usage.
33. LOOKING FOR MORE?
Check out our blog post on Transaction Tracing and
identifying a DoS attack!
http://bit.ly/arc-tracing