Slides from my presentation at Better Software 2015.
Code generates business value when it runs, not when we write it.
We must know how our code performs at runtime, how it interacts with the infrastructure, and how our customers are using it, so that we can give them the opportunity to give us more money.
Set SLAs, measure them, reach them and move forward to increase the business value of our software.
What the hell is your software doing at runtime?
1.
Firenze, November 17th 2015
Roberto “FRANK” Franchini
@robfrankie
Increase business value, measure it!
What the hell is your
software doing at runtime?
2. whoami(1)
More than 15 years of experience, proud to be a programmer
Member of the OrientDB team, tech lead for the full-text, spatial, JDBC and Docker images
Wrote software for NLP and opinion mining (@scale)
Played with servers, then bought a sysadmin
JUG-Torino co-lead
5. Business value
Our code generates business value
when it runs, not when we write it.
We need to know what our code does when it
runs.
We can’t do this unless we measure it.
(Codahale)
6. SLA driven
Have an SLA for your service
Measure and report performance against the
SLA
(Ben Treynor, Google Inc.)
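One possible reading of "measure and report performance against the SLA" is a percentile check over recorded latencies. The sketch below is only an illustration: the class name `SlaCheck`, the nearest-rank percentile method, the choice of the 95th percentile and the 200 ms target are all assumptions, not something stated in the talk.

```java
import java.util.Arrays;

/** Hypothetical helper: checks recorded latencies against a latency target. */
final class SlaCheck {

    /** Nearest-rank percentile (0 < p <= 100) of the samples, in the samples' own unit. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True when the p-th percentile latency is within the target. */
    static boolean meetsSla(long[] latenciesMs, double p, long targetMs) {
        return percentile(latenciesMs, p) <= targetMs;
    }

    public static void main(String[] args) {
        // Made-up latencies in milliseconds, with one slow outlier.
        long[] latenciesMs = {120, 95, 180, 150, 110, 400, 130, 90, 105, 160};
        long targetMs = 200; // assumed SLA target
        System.out.println("p95 = " + percentile(latenciesMs, 95) + " ms");
        System.out.println("SLA met: " + meetsSla(latenciesMs, 95, targetMs));
    }
}
```

A percentile is used here instead of the mean because a single slow request can hide behind a healthy average; whether that fits a given SLA depends on how the SLA is worded.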
20. The day after deployment
How to monitor our service status?
How to measure it?
How does it behave?
How does it interact with other parts of the system?
Multiply by the number of µ-services
21. Monitorability
Design software to be monitorable
Expose metrics (JMX)
Expose status (REST API)
Send metrics to monitoring tools
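As an illustration of the "expose status" point, here is a minimal sketch that serves a JSON status document using only the HTTP server bundled with the JDK. The class name `StatusEndpoint`, the `/status` path and the JSON fields are assumptions made for the example; a real service would more likely use whatever web framework it already runs on.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical status endpoint built on the JDK's built-in HTTP server. */
final class StatusEndpoint {

    /** Incremented by the application whenever a unit of work completes. */
    static final AtomicLong PROCESSED = new AtomicLong();

    /** Builds the status document by hand to avoid a JSON dependency. */
    static String statusJson(String state, long uptimeMs, long processed) {
        return "{\"status\":\"" + state + "\",\"uptimeMs\":" + uptimeMs
                + ",\"processed\":" + processed + "}";
    }

    /** Starts a server answering GET /status; port 0 lets the OS pick a free port. */
    static HttpServer start(int port) throws IOException {
        long startedAt = System.currentTimeMillis();
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = statusJson("UP", System.currentTimeMillis() - startedAt, PROCESSED.get())
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(0);
        try {
            PROCESSED.incrementAndGet(); // pretend one document was processed
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/status";
            try (InputStream in = URI.create(url).toURL().openStream()) {
                System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        } finally {
            server.stop(0); // stop immediately so the JVM can exit
        }
    }
}
```

The `main` method queries its own endpoint once and shuts down, which keeps the example self-terminating; a deployed service would leave the server running for the monitoring tools to poll.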
22. We need application monitoring
“Application monitoring? WHAT?”
“Ok, let me explain:
What is the app doing right now?
How is the app performing right now?
And then graph it!”
“Ok, I got it!”
“Let me see”
23. 5 minutes later
public class PoorManJavaMetrics {

    // Plain fields: not thread-safe, values are lost on restart
    int called;
    long totalTime;

    public void doThings() {
        final long start = System.currentTimeMillis();
        // heavy business logic
        called++;
        final long end = System.currentTimeMillis();
        final long duration = end - start;
        totalTime += duration;
    }

    public void logStats() {
        System.out.println("---stats---");
        // Here be DRAGONS
    }
}
25. Use the right tool
Use a library (e.g. Dropwizard Metrics)
Count events, measure duration
Log metric values
Send application metrics
to the same backend as system metrics
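To make "count events, measure duration" concrete, the sketch below shows the kind of thread-safe counter and timer a metrics library provides, written against the JDK only. The class `SimpleMetrics` and its methods are hypothetical and are not the Dropwizard Metrics API; the real library adds rates, histograms and reporters on top of the same idea.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical, stdlib-only registry of named counters and timers. */
final class SimpleMetrics {

    private final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, LongAdder> elapsedNanos = new ConcurrentHashMap<>();

    /** Counts one occurrence of the named event; safe to call from many threads. */
    void count(String name) {
        counters.computeIfAbsent(name, k -> new LongAdder()).increment();
    }

    /** Runs the work, records its duration and counts the call, even if it throws. */
    <T> T time(String name, Supplier<T> work) {
        long start = System.nanoTime(); // monotonic clock, unlike currentTimeMillis
        try {
            return work.get();
        } finally {
            elapsedNanos.computeIfAbsent(name, k -> new LongAdder()).add(System.nanoTime() - start);
            count(name);
        }
    }

    /** Number of recorded events for the name, 0 if never seen. */
    long calls(String name) {
        LongAdder adder = counters.get(name);
        return adder == null ? 0 : adder.sum();
    }

    /** Total time spent in timed calls for the name, in nanoseconds. */
    long totalNanos(String name) {
        LongAdder adder = elapsedNanos.get(name);
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        SimpleMetrics metrics = new SimpleMetrics();
        int result = metrics.time("documents.processed", () -> 21 * 2);
        System.out.println("result = " + result);
        System.out.println("calls = " + metrics.calls("documents.processed"));
        System.out.println("total ns = " + metrics.totalNanos("documents.processed"));
    }
}
```

Compared with the hand-rolled fields on the previous slide, the difference is mainly that updates are safe under concurrency and metrics are addressed by name, which is what makes them easy to ship to a backend later.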
26. Don’t forget naming!
A naming pattern
<namespace>.<instrumented section>
.<target (noun)>.<action (past tense verb)>
Such as
accounts.authentication.password.failed
Use an environment prefix
prod, test, dev, local
prod.accounts.authentication.password.failed
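The naming pattern above can be enforced with a small helper so that every metric name is built the same way. This is a sketch only: the class `MetricNames` and its validation rules (no empty segments, no dots or spaces inside a segment) are assumptions added for illustration.

```java
/** Hypothetical builder for names of the form prefix.namespace.section.target.action. */
final class MetricNames {

    static String of(String prefix, String namespace, String section, String target, String action) {
        String[] parts = {prefix, namespace, section, target, action};
        for (String part : parts) {
            // A dot or space inside a segment would silently change the hierarchy.
            if (part == null || part.isEmpty() || part.contains(".") || part.contains(" ")) {
                throw new IllegalArgumentException("invalid metric name segment: " + part);
            }
        }
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        System.out.println(of("prod", "accounts", "authentication", "password", "failed"));
    }
}
```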
27. Which metrics?
Rate of documents processed
Latency
Transactions per second (€€€€)
Total number of errors
Mean user interaction time
29. Code on systems
Don’t cross the streams
Enabling code metrics means
sysadmins and devs in the same room
talking to each other
to improve business value
35. To do what?
Discover bottlenecks
Post-mortem analysis
SLA monitoring
IO impact
Network traffic
Memory utilization
36. To do what?
Why is it performing better on a dev laptop?
Why does it take 24h on customer infrastructure
(our old test server takes 1h)?
Mechanical sympathy at large: the new service
is fucking up the I/O
37. Implement THE User Story
Given the application running
when the manager comes
then I want to show a big green number
45. Infrastructure
45 bare metal servers
Nginx, Jetty, PostgreSQL
GlusterFS, Queues,
Redis, Jenkins (cron on steroids)
46. Software
Java shop
Deploy with Docker
More than 120 webapps
More than 100 batch jobs
NRT stream processing jobs running 24x7
47. Monitoring
collectd, Graphite, Grafana for system
monitoring
Dropwizard Metrics inside code for application
monitoring
Application metrics reported to Graphite too
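For reference, Graphite's plaintext protocol accepts one metric per line in the form `<path> <value> <timestamp>`, with the timestamp in seconds since the epoch, conventionally over TCP port 2003. The sketch below formats and sends such a line by hand; the class `GraphiteLine` and the host name are hypothetical, and in a setup like the one described a reporter from the metrics library would normally do this instead.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for Graphite's plaintext line protocol. */
final class GraphiteLine {

    /** Formats one metric as "<path> <value> <epochSeconds>\n". */
    static String format(String path, long value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }

    /** Opens a TCP connection, writes the line and closes the connection. */
    static void send(String host, int port, String line) throws IOException {
        try (Socket socket = new Socket(host, port);
             Writer out = new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8)) {
            out.write(line);
            out.flush();
        }
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L; // Graphite expects seconds
        String line = format("prod.accounts.authentication.password.failed", 3, now);
        System.out.print(line);
        // To actually ship it (host name is an assumption):
        // send("graphite.example.org", 2003, line);
    }
}
```

Because application metrics end up in the same store as the collectd data, they can be plotted on the same Grafana dashboards as CPU, memory and I/O.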
48. Feedback and decisions
WTF happened last night?
How is it going this morning?
Do you think we can survive the message
flood?
Hey boss, it’s time to buy a new server, we are
running out of resources.
50. Shopping list
Define your SLAs/targets
Code and deploy with good practices
Code with monitorability in mind
Monitor your app/service
Correlate system and application metrics
Get feedback
Take decisions