A look at the problems users face when running serverless applications in production, the solutions available, and what is going on inside the Lambda black box.
Presented by Erica Windisch, CTO of IOpipe, Inc. IOpipe offers Application Performance Monitoring for serverless apps. Erica is ex-Docker, ex-Cloudscaling, builder of clouds, and destroyer of monoliths.
Register for IOpipe at www.iopipe.com!
4. EVOLUTION CREATES CHALLENGES
➤ Fear, uncertainty, and doubt for new users:
➤ What problems will I run into with this new platform?
➤ What will I do when those problems happen?
➤ Will I know about those problems when they happen?
➤ Is it secure?
➤ What tools should I use?
6. SERVERLESS DEVELOPER PROFILES
➤ Frameworks: SLS, Zappa, Apex, DIY, others.
➤ Event sources: API Gateway, SNS, S3, Kinesis, others. (Alexa and AWS IoT sources are relatively infrequent)
➤ Languages: Node, Python, Java, Go, C, Ruby.
➤ Regions: all of them (us-east, us-west, etc.); several users are moving to new international regions (Sydney, etc.)
➤ Events: 0 to 100M+ events per day
➤ Stage: dev/test through production
8. CLOUDWATCH
➤ Basic “super-outside” metrics:
➤ Errors
➤ Logs
➤ Invocations/time
➤ Duration
➤ Memory
➤ This is what Datadog, Sumo Logic, etc. ingest.
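One way to relate the "outside" numbers to individual invocations is to read the REPORT line that Lambda writes to the function's log after each invocation. The sketch below parses that line into duration and memory figures. The field layout in the pattern reflects the format as commonly seen in CloudWatch Logs and should be treated as an assumption; the function name parse_report_line is made up for this example.

```python
import re

# Assumed layout of the per-invocation REPORT line (fields are tab-separated
# in practice; \s+ accepts tabs or spaces).
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_mb>\d+) MB"
)


def parse_report_line(line):
    """Return a dict of metrics from a REPORT line, or None if it does not match."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    return {
        "request_id": match.group("request_id"),
        "duration_ms": float(match.group("duration_ms")),
        "billed_ms": float(match.group("billed_ms")),
        "memory_mb": int(match.group("memory_mb")),
        "max_memory_mb": int(match.group("max_memory_mb")),
    }
```

Because the request ID appears in both the REPORT line and the log lines of the same invocation, it can serve as the join key between metrics and logs.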
9. HARD PROBLEMS
➤ Cold-starts
➤ Especially painful for Java users.
➤ Relationship of metrics vs logs.
➤ Lack or difficulty of profiling & tracing tools. When do GCs happen?
➤ Retries - why/when & in relation to event sources
➤ AWS account level limits (& when to bump them up)
➤ Difficulty of managing unsupported languages: C, C++, Go, Ruby, etc.
➤ Debugging of & visibility into distributed systems
➤ Are failures at event-source or Lambda function?
➤ Kinesis!!!
➤ Cross-invocation leaks
➤ Memory leaks
➤ File descriptor leaks
➤ Backend process visibility
➤ Thread/callback leaks.
➤ etc.
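Cross-invocation leaks show up because module-level state, open file descriptors, and memory all survive when a container is reused. A minimal sketch of tracking this from inside the function is below. It assumes a Linux runtime where /proc/self/fd is readable, and the names snapshot, growth, and handler are chosen for illustration only.

```python
import os
import resource

# Module-level state persists across invocations in a reused process,
# which is exactly what makes cross-invocation leaks possible.
_baseline = None


def snapshot():
    """Capture the open file descriptor count and peak resident memory."""
    try:
        open_fds = len(os.listdir("/proc/self/fd"))
    except OSError:
        open_fds = None  # /proc is not available on this platform
    # ru_maxrss is reported in kilobytes on Linux.
    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return {"open_fds": open_fds, "max_rss": max_rss}


def growth(baseline, current):
    """Per-metric growth since the baseline; None where a metric is missing."""
    result = {}
    for key in baseline:
        if baseline[key] is None or current[key] is None:
            result[key] = None
        else:
            result[key] = current[key] - baseline[key]
    return result


def handler(event, context):
    global _baseline
    if _baseline is None:
        _baseline = snapshot()  # first invocation in this process
    # ... the function's real work would go here ...
    return growth(_baseline, snapshot())
```

A steadily rising open_fds or max_rss value across invocations of the same process suggests a leak rather than normal per-request usage.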
10. INSIDE THE PROCESS
➤ We install into your process, around your functions.
➤ Import a library, use a decorator (or low-level reporting API)
➤ Gets info via NodeJS process var, Python sys, etc.
➤ Timing information for wrapped function(s).
➤ Stacktrace reporting.
➤ Extra logging / events pushed by developers.
➤ & looks outside…
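The decorator approach described on this slide can be sketched in a few lines. This is not IOpipe's actual library or API; the names monitored and report are hypothetical, and the reporter here only prints, where a real agent would ship the record to a collector.

```python
import functools
import time
import traceback


def report(record):
    """Hypothetical sink for illustration; prints instead of sending anywhere."""
    print(record)


def monitored(reporter=report):
    """Wrap a handler to record its duration and any stack trace it raises."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(event, context):
            start = time.perf_counter()
            record = {"function": fn.__name__, "error": None}
            try:
                return fn(event, context)
            except Exception:
                # Capture the stack trace, then let the error propagate so
                # the platform still sees the invocation as failed.
                record["error"] = traceback.format_exc()
                raise
            finally:
                record["duration_ms"] = (time.perf_counter() - start) * 1000.0
                reporter(record)
        return wrapper
    return decorate


@monitored()
def handler(event, context):
    return {"ok": True}
```

Re-raising the exception after recording it keeps retry behaviour unchanged, since the event source still observes the failure.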
14. OUTSIDE THE FUNCTION - INSIDE THE BLACK BOX
➤ Reuse of containers and VMs
➤ Cold-starts by VM, container, and app process.
➤ Tenancy of VMs (how many containers)
➤ Host VM processes(!!) & processes in other containers(!!!)
➤ Limited & very likely to go away… probably per-tenant VMs anyway
➤ Spawned processes
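Of the three cold-start levels listed above, the app-process level is the easiest to observe from inside the function, because module-level code runs once per process. The sketch below covers only that level; detecting VM or container reuse would need other signals. The names invocation_info and handler are illustrative.

```python
import time

# Evaluated once, when the process first imports this module.
_PROCESS_STARTED_AT = time.time()
_invocations = 0


def invocation_info():
    """Describe where this invocation falls in the life of the process."""
    global _invocations
    _invocations += 1
    return {
        "cold_start": _invocations == 1,  # first call in a fresh process
        "invocation_count": _invocations,
        "process_age_s": time.time() - _PROCESS_STARTED_AT,
    }


def handler(event, context):
    info = invocation_info()
    # ... the function's real work would go here ...
    return info
```

An invocation_count above 1 confirms that the process, and therefore its container, was reused.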
15. SECURITY
➤ I founded the Docker Security Team…
➤ FYI - Lambda’s not Docker!
➤ Lambda’s not perfect! (Security never is!)
➤ Amazon did a good job.
➤ Re-inventing the wheel means repeating some mistakes solved elsewhere…
➤ Still… AWS did a pretty good job.
➤ Don’t worry about it.
➤ Some questions can only be answered by AWS or with more data! TBD!
17. APP MANAGEMENT
➤ Actionable metrics from inside & outside the function.
➤ Ingest CloudTrail for context-aware intelligence.
➤ Where events originate, retries, etc.
➤ Alarms -> Lambda invocation
➤ triggers AWS services, PagerDuty, IFTTT, Zapier, etc.
➤ Real-time visibility. Daily, Weekly, Monthly reporting.
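The "Alarms -> Lambda invocation" pattern can be illustrated with a handler that receives an alarm notification and decides what to forward. The sketch assumes the alarm arrives through SNS as a JSON message carrying AlarmName, NewStateValue, and NewStateReason, as CloudWatch alarm notifications do; the function names are made up, and the forwarding step to PagerDuty, IFTTT, or Zapier is left out.

```python
import json


def parse_alarms(event):
    """Extract alarm name, state, and reason from an SNS-delivered event."""
    alarms = []
    for record in event.get("Records", []):
        sns = record.get("Sns")
        if sns is None:
            continue  # not an SNS record
        message = json.loads(sns["Message"])
        alarms.append({
            "name": message.get("AlarmName"),
            "state": message.get("NewStateValue"),
            "reason": message.get("NewStateReason"),
        })
    return alarms


def handler(event, context):
    # Keep only alarms that are currently firing; a real handler would
    # forward these to a paging or automation service.
    return [alarm for alarm in parse_alarms(event) if alarm["state"] == "ALARM"]
```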