Teach your application
eloquence.
Logs, metrics, traces.
Dmytro Shapovalov
Infrastructure Engineer @ Cossack Labs
Who are we?
• UK-based data security products and services
company

• Building security tools to prevent sensitive data
leakage and to comply with data security
regulations

• Cryptographic tools, security consulting, training

• We are cryptographers, system engineers,
applied engineers, infrastructure engineers

• We support community, speak, teach, open
source a lot
What we are going to talk about
• Why do we need telemetry?
• What are the different kinds of telemetry?
• Borders of applicability of various types of
telemetry
• Approaches and mistakes
• Implementation
What is telemetry?
«Gathering data on the use of applications and
application components, measurements of start-up
time and processing time, hardware, application
crashes, and general usage statistics.»
Why do we need telemetry at all?
Who are the consumers?

− developers

− devops/sysadmins

− analysts

− security staff
What purposes?

− debug

− monitor state and health

− measure and tune performance

− business analysis

− intrusion detection
It is worthwhile, indeed
• speed up developing process
• increase overall stability
• reduce the reaction time on crashes and intrusions
• adequate business planning
• COST of development
• COST of use
What data do we have to export?
… we can ask any specialist.
— ALL!… will be their answer.
Classification of information
technical:

− state

− health

− errors

− performance

− debug

− events
business:

− SLI

− user actions
developers
devops/sysadmins
analysts
security staff
SIEM — security staff’s main instrument
Complex analysis:

− correlation

− threats

− patterns

− compliance
Applications
Network devices
Servers
Environment
Telemetry evolution
Logs
• each application has
an individual log file
• syslog:

− message standard
(RFC 3164, 2001)

− aggregation
• ELK (agents,
collectors)
• HTTP, JSON,
protobuf
Metrics
• reports into logs
• agents, collectors,
stores with
proprietary protocols
• SNMP
• HTTP, protobuf
• custom
implementations
Traces
• reports into logs
• agents, collectors,
stores with
proprietary protocols
• custom
implementations
Telemetry applicability
Logs
• simplest
• no external tools
required
• human readable
• UNIX-style
• compatible with
tons of tools
• queries
• alerts
Metrics
• minimal store size
• low performance
impact
• performance
measuring
• health and state
observing
• special structures
• queries
• alerts
Traces
• minimal store size
• low performance
impact
• per-query metrics
• low-level
information
• precise debugging
and performance
tuning
+ SIEM systems
Telemetry flow
creation
transport
aggregation
normalization
storage
analysis + alerting
visualization
archiving
Logs
Logs : kinds of data
• initial information about the application
• state changes (start/ready/…/stop)
• health changes
• audit trail (security-relevant list of activities: financial
operations, health care data transactions, changing keys,
changing configuration)
• user sessions (sign-in attempts, sign-out, actions)
• unexpected actions (wrong URLs, sign-in failures, etc.)
• various information in string format
Logs : on start
• new state: starting
• application name
• component name
• commit hash / build number
• configuration in use
• deprecation warnings
• running mode
Logs : on ready
• new state: ready
• listen interfaces, ports and sockets
• health
Logs : on state or health change
• new state
• reason
• URL to documentation
Use a traffic-light highlight system for health states:

● red: completely unhealthy

● yellow: partially healthy, reduced functionality

● green: completely healthy
Logs : on shutdown
• reason
• status of shutdown preparation
• new state: stopped (final goodbye)
Logs : each line
• timestamps (ISO8601, TZ, reasonable precision)
• PID
• application/component short name
• application version (JSON, CEF, protobuf)
• severity (CEF: 0→10, rfc5427: 7→0)
• event code (HTTP style)
• human-readable message
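A minimal sketch of how the per-line fields above could be emitted as JSON with Ruby's standard Logger. The application name, version, severity mapping and the `code`/`message` hash convention are assumptions made for illustration, not something the slides prescribe.

```ruby
require "logger"
require "json"
require "time"
require "stringio"

# Hypothetical application metadata; a real service would take these from its build.
APP_NAME    = "billing-api"
APP_VERSION = "1.4.2"

# Assumed mapping from Ruby Logger severities to syslog-style numbers (7 = debug, 0 = emergency).
SYSLOG_SEVERITY = {
  "DEBUG" => 7, "INFO" => 6, "WARN" => 4, "ERROR" => 3, "FATAL" => 2, "ANY" => 5
}.freeze

# Builds a Logger that writes one JSON object per line with the per-line fields.
def build_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, progname, msg|
    # The message is either a plain string or a hash with :code and :message.
    payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
    JSON.generate(
      timestamp: time.utc.iso8601(3),   # ISO 8601, UTC, millisecond precision
      pid: Process.pid,
      app: progname || APP_NAME,
      version: APP_VERSION,
      severity: SYSLOG_SEVERITY.fetch(severity, 5),
      code: payload[:code],
      message: payload[:message]
    ) + "\n"
  end
  logger
end

io = StringIO.new
log = build_logger(io)
log.warn({ code: 401, message: "sign-in failed: wrong password" })
puts io.string
```

Keeping the formatter in one place means every line carries the same fields, which is what makes the logs easy to parse downstream.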
Logs : do not export!
• passwords, tokens, any sensitive data — security risks
• private data — legal risks
Use:
− masking
− anonymisation / pseudonymisation
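Masking and pseudonymisation can be sketched in a few lines of Ruby. The list of sensitive keys, the mask string and the keyed-hash approach are assumptions for illustration; a real deployment would choose them to match its own data and legal requirements.

```ruby
require "openssl"

# Hypothetical list of field names that must never reach the logs in clear text.
SENSITIVE_KEYS = %w[password token secret authorization].freeze

# Returns a copy of the hash with sensitive values replaced by a fixed mask,
# recursing into nested hashes.
def mask_sensitive(data)
  data.each_with_object({}) do |(key, value), out|
    out[key] =
      if SENSITIVE_KEYS.include?(key.to_s.downcase)
        "***"
      elsif value.is_a?(Hash)
        mask_sensitive(value)
      else
        value
      end
  end
end

# Pseudonymises an identifier with a keyed hash: the same input always yields
# the same opaque token, so log lines can still be correlated.
def pseudonymise(value, key)
  OpenSSL::HMAC.hexdigest("SHA256", key, value.to_s)[0, 16]
end

event = { "user" => "alice@example.com", "password" => "hunter2", "meta" => { token: "abc" } }
safe  = mask_sensitive(event).merge("user" => pseudonymise(event["user"], "log-key"))
puts safe
```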
Logs : consumers
• Console
• Files
• General purpose collector/store/alert/search
system.
• SIEM
Logs : consumers and formats
Consumers (columns): console/STDERR | file | syslog | ELK | SIEM | socket, HTTP, custom
plain: ✓
syslog (RFC3164): ✓ ✓ ✓ ✓ ✓ ✓
JSON: ✓ ✓ ✓ ✓ ✓ ✓
CEF: ✓ ✓ ✓ ✓ ✓ ✓
protobuf: ✓ ✓
Logs : CEF
• old (2009), but widely used standard
• simple: easy to generate, easy to parse (supported
even by devices without powerful CPUs)
• well documented:

− field name dictionaries

− field types
CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension
Sep 19 08:26:10 host CEF:0|security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232
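Generating a CEF line is mostly string joining plus escaping. The sketch below assumes the usual CEF escaping rules (backslash and pipe in header fields; backslash, equals sign and newline in extension values); the method names are made up for illustration, and the syslog prefix is left to the transport.

```ruby
# Escapes a CEF header field: backslash and pipe are prefixed with a backslash.
def cef_header_escape(value)
  value.to_s.gsub("\\") { "\\\\" }.gsub("|") { "\\|" }
end

# Escapes a CEF extension value: backslash, equals sign and newline.
def cef_extension_escape(value)
  value.to_s.gsub("\\") { "\\\\" }.gsub("=") { "\\=" }.gsub("\n") { "\\n" }
end

# Builds one CEF:0 message from the header fields and an extension hash.
def cef_line(vendor:, product:, version:, signature_id:, name:, severity:, extension: {})
  header = [vendor, product, version, signature_id, name, severity]
           .map { |field| cef_header_escape(field) }
  ext = extension.map { |key, value| "#{key}=#{cef_extension_escape(value)}" }.join(" ")
  "CEF:0|#{header.join('|')}|#{ext}"
end

puts cef_line(vendor: "security", product: "threatmanager", version: "1.0",
              signature_id: 100, name: "worm successfully stopped", severity: 10,
              extension: { src: "10.0.0.1", dst: "2.1.2.2", spt: 1232 })
```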
CEF naming and data formats + JSON/protobuf/… transport = painless logging
Logs : bear in mind [1/3]
• Logs will be read by humans, often when a failure
happens and there is little time to react. Be brief and
eloquent. Give information that may help to solve the
problem.
• Logs will be searched. Don’t be a poet, be a technical
specialist. Use expected words.
• Logs will be parsed automatically; indeed, they will.
Too many different systems want telemetry
from your application.
• Carefully classify the severity of events. Reporting
non-critical situations as errors instead of warnings
teaches readers to ignore the logs.
Logs : bear in mind [2/3]
• Whenever possible, build on existing standards.
Grouping event codes according to the HTTP
status code table is not a bad idea.
• Logs are the first resource for analyzing security
incidents.
• Logs will be archived and stored for a long period
of time. It will be almost impossible to cut
pieces of data out of them later.
• Logging should be configurable: formats, transport
protocols, paths, severity.
Logs : bear in mind [3/3]
• Your application may run in many different
environments with different logging standards (VM,
Docker). The application should be able to direct all logs
into one channel. Splitting may be an option.
• Do not implement log file rotation yourself. Provide a way to
tell your application to gracefully
reopen the log file after an external service has rotated
it.
• When big trouble occurs and nothing works, your
application should still be able to print readable logs in the
simplest manner: to stderr/stdout.
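One possible shape for the "let an external service rotate, then reopen" advice, using only Ruby's standard Logger. The class name, the choice of SIGHUP and the stderr fallback are assumptions for illustration.

```ruby
require "logger"

# Wraps a Logger so an external rotator (for example logrotate) can ask the
# application to reopen its log file. All names here are illustrative.
class ReopenableLog
  def initialize(path)
    @logger = Logger.new(path)   # shift_age defaults to 0: Logger itself never rotates
    @reopen_requested = false
  end

  # Safe to call from a signal handler: it only flips a flag.
  def request_reopen
    @reopen_requested = true
  end

  def info(message)
    reopen_if_requested
    @logger.info(message)
  rescue StandardError => e
    # Last resort when the log file is unusable: plain text on stderr.
    warn "log failure (#{e.class}: #{e.message}); original message: #{message}"
  end

  private

  def reopen_if_requested
    return unless @reopen_requested
    @reopen_requested = false
    @logger.reopen               # closes the old handle and opens the path again
  end
end

# Wires SIGHUP to the reopen request. The handler does no I/O itself because
# Ruby does not allow taking locks (which Logger does) in a trap context.
def install_hup_handler(log)
  Signal.trap("HUP") { log.request_reopen }
end
```

The rotator renames the file and sends the signal; the next log call notices the flag and continues in a fresh file under the original path.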
Logs : implementation
• native Ruby methods
• semantic_logger

https://github.com/rocketjob/semantic_logger

(a lot of destinations: DBs, HTTP, UDP, syslog)
• ougai

https://github.com/tilfin/ougai

(JSON)
• httplog

https://github.com/trusche/httplog

(HTTP logging, JSON support)
Metrics
Metrics : approaches
• USE method

Utilization, Saturation, Errors
• Google SRE book

Latency, Traffic, Errors, Saturation
• RED method

Rate, Errors, Duration
Metrics : utilization
• Hardware resources: CPU, disk system, network
interfaces
• File system: capacity, usage
• Memory: capacity, cache, heap, queue
• Resources: file descriptors, threads, sockets, connections
The average time that the resource was busy
servicing work.
Usage of resource.
Metrics : traffic, rate
• normal operations: 

− requests

− queries

− transactions

− sending network packets

− processing flow bytes
A measure of how much demand is being placed
on your system. (Google SRE book)
The number of requests, per second, your services
are serving. (RED Method)
Metrics : latency, duration
The time it takes to service a request. (Google SRE
book)
• latency of operations: 

− requests

− queries

− transactions

− sending network packets

− processing flow bytes
Metrics : errors
• error events:

− hardware errors

− software exceptions

− invalid requests / input

− authentication failures

− invalid URLs
The count of error events. (USE Method)
The rate of requests that fail, either explicitly,
implicitly, or by policy. (Google SRE book)
Metrics : saturation
• calculated value, measure of current load
The degree to which the resource has extra work
which it can't service, often queued. (USE Method)
How "full" your service is. A measure of your
system fraction, emphasizing the resources that are
most constrained. (Google SRE book)
Metrics : saturation
• can be calculated internally or measured
externally
• high utilization is a problem
• high saturation is a problem
• low utilization level does not guarantee that
everything is OK
• low saturation (in the case of a correct calculation)
most likely indicates that everything is OK
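A small sketch of why utilization alone is incomplete. The worker-pool model and the definition of saturation as "queued jobs per worker" are assumptions chosen for illustration; the right calculation depends on the resource being measured.

```ruby
# Snapshot of a hypothetical worker pool; all names are illustrative.
PoolSnapshot = Struct.new(:workers, :busy, :queued, keyword_init: true) do
  # Share of workers that are busy right now (0.0..1.0).
  def utilization
    busy.fdiv(workers)
  end

  # Work that cannot be serviced yet, expressed as queued jobs per worker.
  def saturation
    queued.fdiv(workers)
  end
end

no_backlog = PoolSnapshot.new(workers: 8, busy: 8, queued: 0)
backlogged = PoolSnapshot.new(workers: 8, busy: 8, queued: 40)

# Both pools report the same utilization, but only one has work piling up.
puts "utilization: #{no_backlog.utilization} vs #{backlogged.utilization}"
puts "saturation:  #{no_backlog.saturation} vs #{backlogged.saturation}"
```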
OpenMetrics : based on Prometheus metric types
• Gauge

single numerical value

− memory used

− fan speed

− connections count
• Counter

single monotonically increasing counter

− operations done

− errors occurred

− requests processed
• Histogram

counters incremented per bucket

− requests count per latency buckets

− CPU load values count per range buckets
• Summary

similar to the Histogram, but φ-quantiles are calculated on the client side;
other quantiles cannot be calculated afterwards
https://openmetrics.io/
https://prometheus.io/docs/concepts/metric_types/
OpenMetrics : Average vs Percentile
[Figure: chart comparing the average with the 99th percentile]
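The difference between an average and a high percentile is easy to reproduce with made-up numbers. The sample latencies below are illustrative, and the percentile uses the simple nearest-rank method rather than any particular monitoring system's algorithm.

```ruby
# Arithmetic mean of the samples.
def mean(samples)
  samples.sum.fdiv(samples.size)
end

# Nearest-rank percentile: the smallest sample such that at least p percent
# of all samples are less than or equal to it. p is an integer from 1 to 100.
def percentile(samples, p)
  sorted = samples.sort
  rank = (p * sorted.size).fdiv(100).ceil
  sorted[rank - 1]
end

# 98 fast requests and 2 very slow ones (illustrative numbers, in milliseconds).
latencies_ms = [12] * 98 + [900, 1500]

puts mean(latencies_ms)            # the average blurs the outliers
puts percentile(latencies_ms, 50)  # the median request is fast
puts percentile(latencies_ms, 99)  # the slow tail shows up here
```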
Metrics : buckets
[Figure: observations counted into buckets <10, <20, <30, <40, <50, <60, <70, <80, <90, <100, with the 50th and 90th percentiles marked]
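Percentiles can be estimated from bucket counts alone, without keeping the raw samples. The sketch below is a hand-rolled histogram, not the Prometheus client: it assumes inclusive upper bounds, a lowest bucket starting at 0, and linear interpolation inside the bucket that contains the target rank.

```ruby
# Minimal bucketed histogram; class and method names are illustrative.
class BucketHistogram
  attr_reader :count

  def initialize(upper_bounds)
    @bounds = upper_bounds.sort
    @counts = Array.new(@bounds.size + 1, 0)  # last slot catches values above all bounds
    @count = 0
  end

  # Increments the first bucket whose upper bound is >= value.
  def observe(value)
    index = @bounds.index { |bound| value <= bound } || @bounds.size
    @counts[index] += 1
    @count += 1
  end

  # Estimates the q-quantile (0 < q <= 1) by interpolating within a bucket.
  def quantile(q)
    raise ArgumentError, "q must be in (0, 1]" unless q > 0 && q <= 1
    return nil if @count.zero?

    target = q * @count
    running = 0
    @counts.each_with_index do |in_bucket, i|
      previous = running
      running += in_bucket
      next if running < target
      return @bounds.last if i >= @bounds.size   # overflow bucket: report the highest bound

      lower = i.zero? ? 0.0 : @bounds[i - 1]
      return lower + (@bounds[i] - lower) * (target - previous) / in_bucket
    end
  end
end

histogram = BucketHistogram.new([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
[5, 15, 25, 35, 45, 55, 65, 75, 85, 95].each { |v| histogram.observe(v) }
puts histogram.quantile(0.5)
puts histogram.quantile(0.9)
```

The estimate is only as fine as the bucket boundaries, which is one reason to keep bucket sizes configurable.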
Metrics : export data
• current state
• current health
• event counters:

− AAA events

− unexpected actions (wrong URLs, sign-in failures)

− errors during normal operations
• performance metrics

− normal operations

− queues

− utilization, saturation

− query latency
• application info:

− version

− warnings/notifications gauge
Metrics : formats
• suggest using Prometheus format

− native for Prometheus

− OpenMetrics — open source specification

− simple and clear

− HTTP-based

− can be easily converted

− libraries exist
• Influx or a similar format if you really need to implement
a push model
• protobuf / gRPC

− custom

− high load

Metrics : implementation
• Prometheus Ruby client

https://github.com/prometheus/client_ruby
• native Ruby methods
Metrics : bear in mind [1/2]
• Split statistics by type. For example, aggregating
the durations of successful (relatively long) and failed
(relatively short) operations may create the illusion of a
performance increase when many failures occur.
• Whenever possible, use saturation to determine
system load. Utilization alone is incomplete
information.
• Be sure to export the metrics of the component
closest to the user. This makes it possible to evaluate the SLI.
• Make bucket sizes configurable.
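The "illusion of performance increase" is easy to demonstrate with invented numbers. In this sketch the `Request` struct and the durations are hypothetical; the point is only that one combined mean drops when fast failures appear, while the per-status means do not.

```ruby
# One finished request: its outcome and how long it took (illustrative type).
Request = Struct.new(:status, :duration_ms)

# Mean duration of a list of requests, or nil for an empty list.
def mean_duration(requests)
  return nil if requests.empty?
  requests.sum(&:duration_ms).fdiv(requests.size)
end

# Mean duration kept separately for each status.
def mean_duration_by_status(requests)
  requests.group_by(&:status).transform_values { |group| mean_duration(group) }
end

healthy  = Array.new(10) { Request.new(:ok, 200) }
degraded = Array.new(5) { Request.new(:ok, 200) } + Array.new(5) { Request.new(:error, 20) }

puts mean_duration(healthy)             # combined mean while everything works
puts mean_duration(degraded)            # combined mean looks "faster" during failures
puts mean_duration_by_status(degraded)  # split by type, successes are as slow as before
```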
Metrics : bear in mind [2/2]
• Export appropriate metrics as buckets. This lowers
the required polling rate and makes it possible to get statistics
in percentiles.
• Add units to metric names.
• Whenever possible, use SI units.
• Follow a naming standard. The Prometheus
“Metric and label naming” document is a good
base.
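To make the naming advice concrete, here is a hand-written renderer for the Prometheus text exposition format. It is a sketch for illustration, not the prometheus/client_ruby API; the metric names and labels are examples that follow the "unit in the name" convention (`_bytes`, and `_total` for counters).

```ruby
# Escapes a label value: backslash, double quote and newline.
def escape_label_value(value)
  value.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }.gsub("\n") { "\\n" }
end

# Renders one metric family. samples maps a labels hash to a numeric value.
def render_metric(name, type, help, samples)
  lines = ["# HELP #{name} #{help}", "# TYPE #{name} #{type}"]
  samples.each do |labels, value|
    label_text = labels.map { |key, val| %(#{key}="#{escape_label_value(val)}") }.join(",")
    lines << (label_text.empty? ? "#{name} #{value}" : "#{name}{#{label_text}} #{value}")
  end
  lines.join("\n") + "\n"
end

puts render_metric("http_requests_total", "counter", "Total HTTP requests processed.",
                   { { method: "get", code: "200" } => 1027 })
puts render_metric("process_resident_memory_bytes", "gauge", "Resident memory size in bytes.",
                   { {} => 52_428_800 })
```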
Traces
Traces : definition
In software engineering, tracing involves a
specialized use of logging to record information
about a program's execution.
…
There is not always a clear distinction between
tracing and other forms of logging, except that the
term tracing is almost never applied to logging that is
a functional requirement of a program.
— Wikipedia
Traces : use cases
• Debugging during development
• Measuring and tuning performance
• Analyzing failures and security incidents
https://www.cossacklabs.com/blog/how-to-implement-distributed-tracing.html
• Approaches
• Library comparison
• Implementation example
• Use cases
Traces : principles
• Low overhead
• Application-level transparency
• Scalability
Traces : spans in trace tree
https://static.googleusercontent.com/media/research.google.com/uk/pubs/archive/36356.pdf
Traces : kinds of data
Per request/query tracking:
• trace id
• span id
• parent span id
• application info (product, component)
• module name
• method name
• context data (session/request id, user id, …)
• operation name and code
• start time
• end time
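The trace fields listed on this slide map naturally onto a small data structure. The sketch below is a toy, single-threaded tracer written for illustration; it is not the OpenCensus, OpenTracing or Jaeger client API, and the ID lengths are an assumption.

```ruby
require "securerandom"

# One unit of work inside a trace; fields follow the list on the slide.
Span = Struct.new(:trace_id, :span_id, :parent_span_id, :operation, :context,
                  :start_time, :end_time, keyword_init: true) do
  # Elapsed time in seconds.
  def duration
    end_time - start_time
  end
end

# Toy tracer: nested in_span calls share a trace id and link to their parent.
# Not thread-safe; a real tracer would keep the active span per thread or request.
class Tracer
  attr_reader :finished

  def initialize
    @finished = []
    @stack = []
  end

  def in_span(operation, context = {})
    parent = @stack.last
    span = Span.new(
      trace_id: parent ? parent.trace_id : SecureRandom.hex(16),
      span_id: SecureRandom.hex(8),
      parent_span_id: parent&.span_id,
      operation: operation,
      context: context,
      start_time: Time.now
    )
    @stack.push(span)
    yield span
  ensure
    if span
      span.end_time = Time.now
      @stack.pop
      @finished << span
    end
  end
end

tracer = Tracer.new
tracer.in_span("http_request", { request_id: "r-1" }) do
  tracer.in_span("db_query") { sleep 0.01 }
end
tracer.finished.each do |s|
  puts "#{s.operation} trace=#{s.trace_id} span=#{s.span_id} parent=#{s.parent_span_id.inspect}"
end
```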
Traces : what it looks like
Traces : consumers
• General purpose collectors:

− Jaeger

− Zipkin
• Cloud collectors:

− Google StackDriver

− AWS X-Ray

− Azure Application Insights
• SIEM
Traces : formats
• Proprietary protocols:

− Jaeger

− Zipkin

− Google StackDriver

− AWS X-Ray

− Azure Application Insights
• JSON:

− SIEM
• protobuf/gRPC:

− custom
Traces : implementation
• OpenCensus

https://www.rubydoc.info/gems/opencensus

(Zipkin, GC Stackdriver, JSON)
• OpenTracing

https://opentracing.io/guides/ruby/
• Jaeger client

https://github.com/salemove/jaeger-client-ruby
Checklists
Checklist : Logs
□ Each line:

□ timestamps (ISO8601, TZ, reasonable precision)

□ PID

□ component name

□ severity

□ event code

□ human-readable message
□ Events to log:

□ state changes (start/ready/pause/stop)

□ health changes (new state, reason, doc URL)

□ user sign-in attempts (including failed with reasons), actions, sign-out

□ audit trail

□ errors
□ On start:

□ product name, component name

□ version (+build, +commit hash)

□ running mode (debug/normal, daemon/)

□ deprecation warnings

□ which configuration in use (ENV, file, configuration service)
□ On ready: communication sockets and ports
□ On exit: reason
□ Do not log:

□ passwords, tokens

□ personal data
Checklist : Metrics
□ Data to export:

□ application (version, warning/notification)

□ utilization (resources, capacities, usage)

□ saturation (internally calculated or appropriate metrics)

□ rate (operations)

□ errors

□ latencies
□ Split metrics by types
□ Export as buckets when reasonable
□ Configure size of buckets
□ Export metrics for SLI
□ Determine required resolution
□ Normalize, use SI units, add units to names
□ Prefer poll model if possible
□ Clear counters on restart
Links [1/2]
• Dapper, a Large-Scale Distributed Systems Tracing Infrastructure

https://static.googleusercontent.com/media/research.google.com/uk//pubs/archive/36356.pdf
• How to Implement Tracing in a Modern Distributed Application

https://www.cossacklabs.com/blog/how-to-implement-distributed-tracing.html
• OpenTracing

https://opentracing.io/
• OpenMetrics

https://github.com/RichiH/OpenMetrics
• OpenCensus

https://opencensus.io
Links [2/2]
• CEF

https://kc.mcafee.com/resources/sites/MCAFEE/content/live/CORP_KNOWLEDGEBASE/78000/KB78712/en_US/CEF_White_Paper_20100722.pdf
• Metrics : USE method

http://www.brendangregg.com/usemethod.html
• Google SRE book

https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/
• Metrics : RED method

https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/
• MS Azure : monitoring and diagnostic

https://docs.microsoft.com/en-us/azure/architecture/best-practices/monitoring
• Prometheus : Metrics and label names

https://prometheus.io/docs/practices/naming/
Dmytro Shapovalov
Infrastructure Engineer @ Cossack Labs
Thank you!
shadinua
shad.in.ua

More Related Content

What's hot

CNIT 152: 4 Starting the Investigation & 5 Leads
CNIT 152: 4 Starting the Investigation & 5 LeadsCNIT 152: 4 Starting the Investigation & 5 Leads
CNIT 152: 4 Starting the Investigation & 5 LeadsSam Bowne
 
CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)
CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)
CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)Sam Bowne
 
CNIT 121: 13 Investigating Mac OS X Systems
CNIT 121: 13 Investigating Mac OS X SystemsCNIT 121: 13 Investigating Mac OS X Systems
CNIT 121: 13 Investigating Mac OS X SystemsSam Bowne
 
CNIT 152: 13 Investigating Mac OS X Systems
CNIT 152: 13 Investigating Mac OS X SystemsCNIT 152: 13 Investigating Mac OS X Systems
CNIT 152: 13 Investigating Mac OS X SystemsSam Bowne
 
CNIT 121: 12 Investigating Windows Systems (Part 3)
CNIT 121: 12 Investigating Windows Systems (Part 3)CNIT 121: 12 Investigating Windows Systems (Part 3)
CNIT 121: 12 Investigating Windows Systems (Part 3)Sam Bowne
 
CNIT 152 12 Investigating Windows Systems (Part 2)
CNIT 152 12 Investigating Windows Systems (Part 2)CNIT 152 12 Investigating Windows Systems (Part 2)
CNIT 152 12 Investigating Windows Systems (Part 2)Sam Bowne
 
CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence Sam Bowne
 
CNIT 152: 9 Network Evidence
CNIT 152: 9 Network EvidenceCNIT 152: 9 Network Evidence
CNIT 152: 9 Network EvidenceSam Bowne
 
CNIT 125 Ch 4. Security Engineering (Part 1)
CNIT 125 Ch 4. Security Engineering (Part 1)CNIT 125 Ch 4. Security Engineering (Part 1)
CNIT 125 Ch 4. Security Engineering (Part 1)Sam Bowne
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsDamir Delija
 
CNIT 121: 10 Enterprise Services
CNIT 121: 10 Enterprise ServicesCNIT 121: 10 Enterprise Services
CNIT 121: 10 Enterprise ServicesSam Bowne
 
CNIT 125 Ch 8. Security Operations
CNIT 125 Ch 8. Security OperationsCNIT 125 Ch 8. Security Operations
CNIT 125 Ch 8. Security OperationsSam Bowne
 
CNIT 152: 10 Enterprise Services
CNIT 152: 10 Enterprise ServicesCNIT 152: 10 Enterprise Services
CNIT 152: 10 Enterprise ServicesSam Bowne
 
CISSP Prep: Ch 9. Software Development Security
CISSP Prep: Ch 9. Software Development SecurityCISSP Prep: Ch 9. Software Development Security
CISSP Prep: Ch 9. Software Development SecuritySam Bowne
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Damir Delija
 
8. Software Development Security
8. Software Development Security8. Software Development Security
8. Software Development SecuritySam Bowne
 
Laboratory Automation Systems- Scinomix
Laboratory Automation Systems- Scinomix Laboratory Automation Systems- Scinomix
Laboratory Automation Systems- Scinomix Scinomix
 
Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...
Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...
Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...Sam Bowne
 
CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence Sam Bowne
 

What's hot (20)

CNIT 152: 4 Starting the Investigation & 5 Leads
CNIT 152: 4 Starting the Investigation & 5 LeadsCNIT 152: 4 Starting the Investigation & 5 Leads
CNIT 152: 4 Starting the Investigation & 5 Leads
 
CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)
CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)
CNIT 121: 12 Investigating Windows Systems (Part 1 of 3)
 
CNIT 121: 13 Investigating Mac OS X Systems
CNIT 121: 13 Investigating Mac OS X SystemsCNIT 121: 13 Investigating Mac OS X Systems
CNIT 121: 13 Investigating Mac OS X Systems
 
CNIT 152: 13 Investigating Mac OS X Systems
CNIT 152: 13 Investigating Mac OS X SystemsCNIT 152: 13 Investigating Mac OS X Systems
CNIT 152: 13 Investigating Mac OS X Systems
 
CNIT 121: 12 Investigating Windows Systems (Part 3)
CNIT 121: 12 Investigating Windows Systems (Part 3)CNIT 121: 12 Investigating Windows Systems (Part 3)
CNIT 121: 12 Investigating Windows Systems (Part 3)
 
CNIT 152 12 Investigating Windows Systems (Part 2)
CNIT 152 12 Investigating Windows Systems (Part 2)CNIT 152 12 Investigating Windows Systems (Part 2)
CNIT 152 12 Investigating Windows Systems (Part 2)
 
CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence
 
CNIT 152: 9 Network Evidence
CNIT 152: 9 Network EvidenceCNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence
 
CNIT 125 Ch 4. Security Engineering (Part 1)
CNIT 125 Ch 4. Security Engineering (Part 1)CNIT 125 Ch 4. Security Engineering (Part 1)
CNIT 125 Ch 4. Security Engineering (Part 1)
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics tools
 
CNIT 121: 10 Enterprise Services
CNIT 121: 10 Enterprise ServicesCNIT 121: 10 Enterprise Services
CNIT 121: 10 Enterprise Services
 
CNIT 125 Ch 8. Security Operations
CNIT 125 Ch 8. Security OperationsCNIT 125 Ch 8. Security Operations
CNIT 125 Ch 8. Security Operations
 
CNIT 152: 10 Enterprise Services
CNIT 152: 10 Enterprise ServicesCNIT 152: 10 Enterprise Services
CNIT 152: 10 Enterprise Services
 
CISSP Prep: Ch 9. Software Development Security
CISSP Prep: Ch 9. Software Development SecurityCISSP Prep: Ch 9. Software Development Security
CISSP Prep: Ch 9. Software Development Security
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2
 
8. Software Development Security
8. Software Development Security8. Software Development Security
8. Software Development Security
 
Laboratory Automation Systems- Scinomix
Laboratory Automation Systems- Scinomix Laboratory Automation Systems- Scinomix
Laboratory Automation Systems- Scinomix
 
Timelines
TimelinesTimelines
Timelines
 
Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...
Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...
Practical Malware Analysis: Ch 0: Malware Analysis Primer & 1: Basic Static T...
 
CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence CNIT 152: 9 Network Evidence
CNIT 152: 9 Network Evidence
 

Similar to Teach Application Telemetry

I pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekendI pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekendNicolas Carlier
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
OpenTelemetry 101 FTW
OpenTelemetry 101 FTWOpenTelemetry 101 FTW
OpenTelemetry 101 FTWNGINX, Inc.
 
Cashing in on logging and exception data
Cashing in on logging and exception dataCashing in on logging and exception data
Cashing in on logging and exception dataStackify
 
Enterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected EnvironmentsEnterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected EnvironmentsPrecisely
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applicationsAmit Kejriwal
 
RuSIEM overview (english version)
RuSIEM overview (english version)RuSIEM overview (english version)
RuSIEM overview (english version)Olesya Shelestova
 
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...
Using AWS to Build a Scalable Big Data Management & Processing Service (BDT40...Amazon Web Services
 
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16
Monitoring and Instrumentation Strategies: Tips and Best Practices - AppSphere16AppDynamics
 
All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!All Your Security Events Are Belong to ... You!
All Your Security Events Are Belong to ... You!Xavier Mertens
 
Multi Layer Monitoring V1
Multi Layer Monitoring V1Multi Layer Monitoring V1
Multi Layer Monitoring V1Lahav Savir
 
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!
SCRIMPS-STD: Test Automation Design Principles - and asking the right questions!Richard Robinson
 
The differing ways to monitor and instrument
The differing ways to monitor and instrumentThe differing ways to monitor and instrument
The differing ways to monitor and instrumentJonah Kowall
 
(ATS6-PLAT07) Managing AEP in an enterprise environment
(ATS6-PLAT07) Managing AEP in an enterprise environment(ATS6-PLAT07) Managing AEP in an enterprise environment
(ATS6-PLAT07) Managing AEP in an enterprise environmentBIOVIA
 
Log management &amp; SIEM
Log management &amp; SIEMLog management &amp; SIEM
Log management &amp; SIEMBarakatAbweh
 
Information Security Lesson 4 - Baselines - Eric Vanderburg
Information Security Lesson 4 - Baselines - Eric VanderburgInformation Security Lesson 4 - Baselines - Eric Vanderburg
Information Security Lesson 4 - Baselines - Eric VanderburgEric Vanderburg
 
Security Practices - Logging.pptx
Security Practices - Logging.pptxSecurity Practices - Logging.pptx
Security Practices - Logging.pptxAlireza Vafi
 

Similar to Teach Application Telemetry (20)

I pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekendI pushed in production :). Have a nice weekend
I pushed in production :). Have a nice weekend
 
Application Security Logging with Splunk using Java
Application Security Logging with Splunk using JavaApplication Security Logging with Splunk using Java
Application Security Logging with Splunk using Java
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
OpenTelemetry 101 FTW
OpenTelemetry 101 FTWOpenTelemetry 101 FTW
OpenTelemetry 101 FTW
 
Cashing in on logging and exception data
Cashing in on logging and exception dataCashing in on logging and exception data
Cashing in on logging and exception data
 
Enterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected EnvironmentsEnterprise Security in Mainframe-Connected Environments
Enterprise Security in Mainframe-Connected Environments
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
RuSIEM overview (english version)
RuSIEM overview (english version)RuSIEM overview (english version)
RuSIEM overview (english version)
 
Teach Application Telemetry

  • 1. Teach your application eloquence. Logs, metrics, traces. Dmytro Shapovalov Infrastructure Engineer @ Cossack Labs
  • 2. Who are we? • UK-based data security products and services company
 • Building security tools to prevent sensitive data leakage and to comply with data security regulations
 • Cryptographic tools, security consulting, training
 • We are cryptographers, system engineers, applied engineers, infrastructure engineers
 • We support community, speak, teach, open source a lot
  • 3. What we are going to talk about • Why do we need telemetry? • What are the different kinds of telemetry? • Borders of applicability of various types of telemetry • Approaches and mistakes • Implementation
  • 4. What is telemetry? «Gathering data on the use of applications and application components, measurements of start-up time and processing time, hardware, application crashes, and general usage statistics.»
  • 5. Why do we need telemetry at all? Who are the consumers?
 − developers
 − devops/sysadmins
 − analysts
 − security staff
 What purposes?
 − debug
 − monitor state and health
 − measure and tune performance
 − business analysis
 − intrusion detection
  • 7. It is worthwhile, indeed • speed up the development process • increase overall stability • reduce reaction time to crashes and intrusions • adequate business planning • COST of development • COST of use
  • 9. What data do we have to export? … we can ask any specialist. — ALL!… will be their answer.
  • 14. Classification of information
 technical:
 − state
 − health
 − errors
 − performance
 − debug
 − events
 business:
 − SLI
 − user actions
 consumers: developers, devops/sysadmins, analysts, security staff
  • 15. SIEM: the security staff’s main instrument
 Complex analysis:
 − correlation
 − threats
 − patterns
 − compliance
 Sources: applications, network devices, servers, environment
  • 18. Telemetry evolution
 Logs
 • each application has an individual log file
 • syslog:
 − message standard (RFC 3164, 2001)
 − aggregation
 • ELK (agents, collectors)
 • HTTP, JSON, protobuf
 Metrics
 • reports into logs
 • agents, collectors, stores with proprietary protocols
 • SNMP
 • HTTP, protobuf
 • custom implementations
 Traces
 • reports into logs
 • agents, collectors, stores with proprietary protocols
 • custom implementations
  • 22. Telemetry applicability
 Logs
 • simplest
 • no external tools required
 • human readable
 • UNIX-style
 • compatible with tons of tools
 • queries
 • alerts
 Metrics
 • minimal store size
 • low performance impact
 • performance measuring
 • health and state observing
 • special structures
 • queries
 • alerts
 Traces
 • minimal store size
 • low performance impact
 • per-query metrics
 • low-level information
 • precise debugging and performance tuning
 + SIEM systems
  • 25. Logs
  • 26. Logs : kinds of data • initial information about the application • state changes (start/ready/…/stop) • health changes • audit trail (security-relevant list of activities: financial operations, health care data transactions, changing keys, changing configuration) • user sessions (sign-in attempts, sign-out, actions) • unexpected actions (wrong URLs, failed sign-ins, etc.) • various information in string format
  • 27. Logs : on start • new state: starting • application name • component name • commit hash / build number • configuration in use • deprecation warnings • running mode
  • 28. Logs : on ready • new state: ready • listen interfaces, ports and sockets • health
  • 30. Logs : on state or health change • new state • reason • URL to documentation Use a traffic-light highlight system for health states:
 ● red: completely unhealthy
 ● yellow: partially healthy, reduced functionality
 ● green: completely healthy
  • 31. Logs : on shutdown • reason • status of preparing to shutdown • new state: stopped (final goodbye)
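The start, ready, health-change and shutdown slides can be sketched as one small helper that logs every transition together with its reason and, where available, a documentation URL. This is only an illustration using Ruby's standard `Logger`; the `Lifecycle` class, the `HEALTH` and `HEALTH_LEVEL` tables, the application name and the example.com URL are all hypothetical.

```ruby
require "logger"

# Hypothetical traffic-light health levels and the severity used to log them.
HEALTH = {
  red:    "completely unhealthy",
  yellow: "partially healthy, reduced functionality",
  green:  "completely healthy"
}.freeze
HEALTH_LEVEL = { red: Logger::ERROR, yellow: Logger::WARN, green: Logger::INFO }.freeze

# Logs every state or health change with its reason (hypothetical helper).
class Lifecycle
  attr_reader :state, :health

  def initialize(logger, name:, build:)
    @logger = logger
    @name = name
    @build = build
    @state = nil
    @health = nil
  end

  def change_state(new_state, reason:, doc_url: nil)
    @state = new_state
    @logger.info(line("state", new_state, reason, doc_url))
  end

  def change_health(color, reason:, doc_url: nil)
    raise ArgumentError, "unknown health #{color}" unless HEALTH.key?(color)

    @health = color
    @logger.add(HEALTH_LEVEL[color], line("health", "#{color} (#{HEALTH[color]})", reason, doc_url))
  end

  private

  # One line carries the application name, build, new value and reason.
  def line(kind, value, reason, doc_url)
    text = "#{@name} build=#{@build} #{kind}=#{value} reason=#{reason.inspect}"
    doc_url ? "#{text} doc=#{doc_url}" : text
  end
end

app = Lifecycle.new(Logger.new($stderr), name: "billing-api", build: "3f9c2d1")
app.change_state(:starting, reason: "process launched")
app.change_state(:ready, reason: "listening on 0.0.0.0:8080")
app.change_health(:yellow, reason: "cache unreachable", doc_url: "https://example.com/docs/cache")
app.change_state(:stopped, reason: "SIGTERM received")
```

Mapping red to ERROR and yellow to WARN is one possible choice; the point is that severity follows the health level instead of being picked ad hoc at each call site.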
  • 32. Logs : each line • timestamps (ISO 8601, time zone, reasonable precision) • PID • application/component short name • application version (JSON, CEF, protobuf) • severity (CEF: 0→10, RFC 5427: 7→0) • event code (HTTP style) • human-readable message
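As a rough illustration of the per-line fields above, the standard `Logger` can be given a formatter that emits one JSON object per line. The application name, version string and the `build_json_logger` helper are assumptions made for this sketch, not something the talk prescribes.

```ruby
require "json"
require "logger"
require "time"

# Hypothetical application identity; a real service would take these
# from its build system or configuration.
APP_NAME    = "billing-api"
APP_VERSION = "1.4.2+3f9c2d1"

# Builds a Logger whose every line is a single JSON object with the
# fields listed on the slide.
def build_json_logger(io)
  logger = Logger.new(io)
  logger.formatter = proc do |severity, time, progname, msg|
    # The message is either a plain string or a hash with an event code.
    payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
    JSON.generate({
      timestamp: time.utc.iso8601(3),   # ISO 8601, UTC, millisecond precision
      pid:       Process.pid,
      app:       progname || APP_NAME,
      version:   APP_VERSION,
      severity:  severity,
      code:      payload[:code],        # HTTP-style event code, may be nil
      message:   payload[:message]
    }) + "\n"
  end
  logger
end

logger = build_json_logger($stdout)
logger.info(code: 200, message: "state changed: ready")
logger.warn(code: 401, message: "sign-in failed")
```

Keeping the human-readable message as a separate field means the same line stays greppable for people and parseable for collectors.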
  • 33. Logs : do not export!
 • passwords, tokens, any sensitive data: security risks
 • private data: legal risks
 Use:
 − masking
 − anonymisation / pseudonymisation
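Masking and pseudonymisation could look roughly like the following filter, applied to an event before it is logged. The key lists, the `sanitize` helper and the salt handling are assumptions for the sake of the example; a real deployment would choose them according to its own data and legal requirements.

```ruby
require "openssl"

# Hypothetical keys whose values must never reach the log.
SENSITIVE_KEYS = %w[password token secret authorization].freeze
# Hypothetical keys holding private data that is pseudonymised instead.
PRIVATE_KEYS   = %w[email phone].freeze

# Returns a copy of the event that is safe to log.
# The salt is assumed to be a per-deployment secret kept out of the logs.
def sanitize(event, salt:)
  event.each_with_object({}) do |(key, value), safe|
    name = key.to_s.downcase
    safe[key] =
      if SENSITIVE_KEYS.include?(name)
        "[MASKED]"                          # masking: the value is dropped
      elsif value.is_a?(Hash)
        sanitize(value, salt: salt)         # recurse into nested data
      elsif PRIVATE_KEYS.include?(name)
        # pseudonymisation: a stable token, equal inputs give equal outputs
        "anon-" + OpenSSL::HMAC.hexdigest("SHA256", salt, value.to_s)[0, 12]
      else
        value
      end
  end
end

event = { user: "bob", password: "hunter2", email: "bob@example.com",
          headers: { "Authorization" => "Bearer abc" } }
puts sanitize(event, salt: "per-deployment-secret").inspect
```

A keyed hash keeps records for the same person correlatable without exposing the address itself; whether that is sufficient for a given regulation is outside the scope of this sketch.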
  • 34. Logs : consumers • Console • Files • General purpose collector/store/alert/search system. • SIEM
  • 35. Logs : consumers and formats
 Consumers (columns): console/STDERR, file, syslog, ELK, SIEM, socket/HTTP/custom
 Formats (rows):
 − plain: ✓
 − syslog (RFC3164): ✓ ✓ ✓ ✓ ✓ ✓
 − JSON: ✓ ✓ ✓ ✓ ✓ ✓
 − CEF: ✓ ✓ ✓ ✓ ✓ ✓
 − protobuf: ✓ ✓
  • 36. Logs : CEF • old (2009), but widely used standard • simple: easy to generate, easy to parse (supported even by devices without powerful CPUs) • well documented:
 − field name dictionaries
 − field types
 Format: CEF:Version|Device Vendor|Device Product|Device Version|Signature ID|Name|Severity|Extension
 Example: Sep 19 08:26:10 host CEF:0|security|threatmanager|1.0|100|worm successfully stopped|10|src=10.0.0.1 dst=2.1.2.2 spt=1232
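Generating such a line takes only a few lines of code. The sketch below builds the CEF part of the message (without the syslog prefix) and applies the escaping the format expects: backslash and pipe in header fields, backslash and equals sign in extension values. The helper names are hypothetical.

```ruby
# Escapes a CEF header field: backslash and pipe get a backslash prefix.
def cef_header(value)
  value.to_s.gsub(/[\\|]/) { |ch| "\\#{ch}" }
end

# Escapes a CEF extension value: backslash and equals sign get a backslash
# prefix, line breaks become a literal \n.
def cef_extension(value)
  value.to_s.gsub(/[\\=]/) { |ch| "\\#{ch}" }.gsub(/\r?\n/) { "\\n" }
end

# Builds one CEF record from header fields and an extension hash.
def cef_line(vendor:, product:, version:, signature_id:, name:, severity:, extension: {})
  header = ["CEF:0", vendor, product, version, signature_id, name, severity]
           .map { |field| cef_header(field) }
  ext = extension.map { |key, value| "#{key}=#{cef_extension(value)}" }.join(" ")
  (header + [ext]).join("|")
end

puts cef_line(vendor: "security", product: "threatmanager", version: "1.0",
              signature_id: 100, name: "worm successfully stopped", severity: 10,
              extension: { src: "10.0.0.1", dst: "2.1.2.2", spt: 1232 })
```

With these arguments the output reproduces the CEF part of the example line on the slide.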
  • 37. CEF naming, data formats + JSON/protobuf/… transport = painless logging
  • 38. Logs : bear in mind [1/3] • Logs will be read by humans, often when a failure happens and with limited time to react. Be brief and eloquent. Give information that may help to solve the problem. • Logs will be searched. Don’t be a poet, be a technical specialist. Use expected words. • Logs will be parsed automatically; indeed, they will. There are too many different systems that want telemetry from your application. • Carefully classify the severity of events. Reporting non-critical situations as errors instead of warnings teaches people to ignore the logs.
  • 39. Logs : bear in mind [2/3] • Whenever possible, build on existing standards. Grouping event codes according to the HTTP status code table is not a bad idea. • Logs are the first resource for analyzing security incidents. • Logs will be archived and stored for a long period of time. It will be almost impossible to cut pieces of data out of them later. • Logging should be configurable: formats, transport protocols, paths, severity.
  • 40. Logs : bear in mind [3/3] • Your application may run in many different environments with different logging standards (VM, Docker). The application should be able to direct all logs into one channel. Splitting may be an option. • Do not implement log file rotation. Provide a way to tell your application to gracefully reopen the log file after an external service has rotated it. • When big trouble occurs and nothing works, your application should be able to print readable logs in the simplest manner: to stderr/stdout.
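One common way to cooperate with an external rotator on a POSIX system is to reopen the log file when a signal arrives. The sketch below assumes SIGHUP as that signal and uses a hypothetical `ReopenableLog` class; the signal handler only sets a flag, and the actual reopen happens on the next write. If the file cannot be written, the line goes to stderr as a last resort.

```ruby
require "tmpdir"

# A minimal file log that an external rotator can ask to reopen.
# Class and method names are hypothetical.
class ReopenableLog
  def initialize(path)
    @path = path
    @reopen_requested = false
    @file = open_file
  end

  # Safe to call from a signal handler: it only sets a flag.
  def request_reopen!
    @reopen_requested = true
  end

  def write(line)
    reopen if @reopen_requested
    @file.write(line.end_with?("\n") ? line : line + "\n")
  rescue SystemCallError => e
    # Last resort: fall back to stderr when the file is unavailable.
    $stderr.puts("log file unavailable (#{e.class}): #{line}")
  end

  private

  def open_file
    file = File.open(@path, "a")
    file.sync = true
    file
  end

  # Closes the rotated file and creates a fresh one at the original path.
  def reopen
    @reopen_requested = false
    @file.close
    @file = open_file
  end
end

if $PROGRAM_NAME == __FILE__
  log = ReopenableLog.new(File.join(Dir.tmpdir, "app-example.log"))
  Signal.trap("HUP") { log.request_reopen! }   # the rotator sends SIGHUP after renaming the file
  log.write("state changed: ready")
end
```

Deferring the reopen to the next write keeps the signal handler trivial, at the cost that an idle process keeps the old file open until it logs again.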
  • 41. Logs : implementation
 • native Ruby methods
 • semantic_logger: https://github.com/rocketjob/semantic_logger (a lot of destinations: DBs, HTTP, UDP, syslog)
 • ougai: https://github.com/tilfin/ougai (JSON)
 • httplog: https://github.com/trusche/httplog (HTTP logging, JSON support)
  • 43. Metrics : approaches
 • USE method: Utilization, Saturation, Errors
 • Google SRE book: Latency, Traffic, Errors, Saturation
 • RED method: Rate, Errors, Duration
  • 44. Metrics : utilization • Hardware resources: CPU, disk system, network interfaces • File system: capacity, usage • Memory: capacity, cache, heap, queue • Resources: file descriptors, threads, sockets, connections The average time that the resource was busy servicing work. Usage of resource.
  • 45. Metrics : traffic, rate • normal operations: 
 − requests
 − queries
 − transactions
 − sending network packets
 − processing flow bytes
 A measure of how much demand is being placed on your system. (Google SRE book)
 The number of requests, per second, your services are serving. (RED Method)
  • 46. Metrics : latency, duration The time it takes to service a request. (Google SRE book) • latency of operations: 
 − requests
 − queries
 − transactions
 − sending network packets
 − processing flow bytes
  • 47. Metrics : errors • error events:
 − hardware errors
 − software exceptions
 − invalid requests / input
 − authentication failures
 − invalid URLs
 The count of error events. (USE Method)
 The rate of requests that fail, either explicitly, implicitly, or by policy. (Google SRE book)
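Traffic, latency and errors can be collected at the same place: a wrapper around each operation. The following is a minimal, single-threaded sketch with a hypothetical `RedStats` class; it keeps raw counters and leaves the per-second rate to the monitoring system, which can derive it from counter deltas.

```ruby
# Collects request count, error count and total duration for one operation.
# The class is hypothetical and not thread-safe.
class RedStats
  attr_reader :requests, :errors, :total_duration

  def initialize
    @requests = 0
    @errors = 0
    @total_duration = 0.0
  end

  # Runs the block, counts the request, times it and counts failures.
  def observe
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  rescue StandardError
    @errors += 1
    raise                     # the caller still sees the original error
  ensure
    @requests += 1
    @total_duration += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  def error_ratio
    @requests.zero? ? 0.0 : @errors.fdiv(@requests)
  end

  def mean_duration
    @requests.zero? ? 0.0 : @total_duration / @requests
  end
end

stats = RedStats.new
stats.observe { :ok }
begin
  stats.observe { raise ArgumentError, "invalid input" }
rescue ArgumentError
  # handled by the application as usual
end
puts "requests=#{stats.requests} errors=#{stats.errors} error_ratio=#{stats.error_ratio}"
```

The monotonic clock is used so that wall-clock adjustments do not distort the measured durations.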
  • 48. Metrics : saturation • calculated value, measure of current load The degree to which the resource has extra work which it can't service, often queued. (USE Method) How "full" your service is. A measure of your system fraction, emphasizing the resources that are most constrained. (Google SRE book)
  • 49. Metrics : saturation • can be calculated internally or measured externally • high utilization is a problem • high saturation is a problem • low utilization level does not guarantee that everything is OK • low saturation (in the case of a correct calculation) most likely indicates that everything is OK
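The difference between the two values can be sketched for a fixed-size worker pool. The PoolSnapshot struct and the definition of saturation as queued jobs divided by pool size are assumptions for this example; a real service may need a different calculation.

```ruby
# Hypothetical snapshot of a fixed-size worker pool.
PoolSnapshot = Struct.new(:workers, :busy, :queued, keyword_init: true) do
  # Share of workers currently doing work, between 0.0 and 1.0.
  def utilization
    busy.to_f / workers
  end

  # Waiting jobs relative to pool size; above 0 means work is queuing.
  def saturation
    queued.to_f / workers
  end
end

# Fully busy pool with a backlog: both values are high.
overloaded = PoolSnapshot.new(workers: 8, busy: 8, queued: 24)

# Mostly idle workers but a growing backlog, e.g. jobs blocked on a lock:
# utilization looks fine, saturation reveals the problem.
stalled = PoolSnapshot.new(workers: 8, busy: 1, queued: 40)
```

Here `stalled.utilization` is 0.125 while `stalled.saturation` is 5.0, which is the case where low utilization does not mean that everything is OK.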
  • 50. OpenMetrics : based on Prometheus metric types • Gauge
 single numerical value
 − memory used
 − fan speed
 − connections count • Counter
 single monotonically increasing counter
 − operations done
 − errors occurred
 − requests processed • Histogram
 increment counter per buckets
 − requests count per latency buckets
 − CPU load values count per range buckets • Summary
 similar to the Histogram, but φ-quantiles are calculated on the client side; other quantiles cannot be calculated later https://openmetrics.io/ https://prometheus.io/docs/concepts/metric_types/
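The behaviour of the first three metric types can be sketched in plain Ruby. These classes are a toy illustration under our own naming, not the API of the Prometheus Ruby client. The histogram keeps cumulative bucket counts, which is how Prometheus histograms are exposed.

```ruby
# Gauge: a single value that can go up and down.
class Gauge
  attr_reader :value

  def initialize
    @value = 0.0
  end

  def set(new_value)
    @value = new_value.to_f
  end
end

# Counter: a single value that only ever increases.
class Counter
  attr_reader :value

  def initialize
    @value = 0
  end

  def increment(by = 1)
    raise ArgumentError, "counters cannot decrease" if by.negative?

    @value += by
  end
end

# Histogram: cumulative observation counts per upper bound, plus sum and count.
class Histogram
  attr_reader :buckets, :sum, :count

  def initialize(upper_bounds)
    @buckets = upper_bounds.sort.to_h { |bound| [bound, 0] }
    @buckets[Float::INFINITY] = 0 # catch-all bucket
    @sum = 0.0
    @count = 0
  end

  def observe(value)
    # Every bucket whose upper bound covers the value is incremented.
    @buckets.each_key { |bound| @buckets[bound] += 1 if value <= bound }
    @sum += value
    @count += 1
  end
end
```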
  • 51. OpenMetrics : Average vs Percentile Average
  • 52. OpenMetrics : Average vs Percentile Average
  • 53. OpenMetrics : Average vs Percentile Average 99 percentile
  • 54. OpenMetrics : Average vs Percentile Average 99 percentile
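Why an average can mislead is easy to show with a small calculation. The latency values below are invented for the example, and the percentile helper uses the nearest-rank method, which is only one of several common definitions.

```ruby
def mean(values)
  values.sum.to_f / values.size
end

# Nearest-rank percentile; percent is an integer from 1 to 100.
def percentile(values, percent)
  sorted = values.sort
  rank = (percent * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# 95 fast requests and 5 very slow ones, in milliseconds (made-up data).
latencies_ms = [10] * 95 + [1000] * 5
```

For this data `mean(latencies_ms)` is 59.5, `percentile(latencies_ms, 50)` is 10 and `percentile(latencies_ms, 99)` is 1000: the average describes neither the typical request nor the slow tail.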
  • 55. Metrics : buckets <10 < 20 < 30 < 40 < 50 < 60 < 70 < 80 < 90 < 100
  • 56. Metrics : buckets <10 < 20 < 30 < 40 < 50 < 60 < 70 < 80 < 90 < 100 1 1 1 1 1 1 1 1 1 1
  • 57. Metrics : buckets <10 < 20 < 30 < 40 < 50 < 60 < 70 < 80 < 90 < 100 1 1 1 1 1 1 1 1 1 1 50 percentile, 90 percentile
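The bucket picture above can be turned into a small calculation: given per-bucket counts, find the bucket that contains a percentile. The function name percentile_bucket is hypothetical, and the result is only the upper bound of the matching bucket, so its precision is limited by the bucket width.

```ruby
# Returns the upper bound of the bucket that contains the requested
# percentile. counts holds per-bucket (non-cumulative) observation counts.
def percentile_bucket(upper_bounds, counts, percent)
  total = counts.sum
  return nil if total.zero?

  rank = (percent * total / 100.0).ceil
  cumulative = 0
  upper_bounds.zip(counts).each do |bound, count|
    cumulative += count
    return bound if cumulative >= rank
  end
  nil
end

# Ten buckets with one observation each, as on the slide.
bounds = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
counts = [1] * 10
```

With one observation per bucket, `percentile_bucket(bounds, counts, 50)` returns 50 and `percentile_bucket(bounds, counts, 90)` returns 90.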
  • 58. Metrics : export data • current state • current health • event counters:
 − AAA events
 − not expected actions (wrong URLs, sign-in fails)
 − errors during normal operations • performance metrics
 − normal operations
 − queues
 − utilization, saturation
 − query latency • application info:
 − version
 − warnings/notifications gauge
  • 59. Metrics : formats • suggest using Prometheus format
 − native for Prometheus
 − OpenMetrics — open source specification
 − simple and clear
 − HTTP-based
 − can be easily converted
 − libraries exist • Influx or similar format if you really need to implement push model • protobuf / gRPC
 − custom
 − high load
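To show how simple the Prometheus text format is, here is a minimal renderer for one metric family. It is a sketch, not a replacement for a client library: the function names render_metric and escape_label are our own, and only HELP, TYPE and plain sample lines are covered.

```ruby
# Escapes backslash, double quote and newline in a label value.
def escape_label(value)
  value.to_s.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }.gsub("\n") { "\\n" }
end

# Renders one metric family in the Prometheus text exposition format.
# samples maps a label Hash to a numeric value.
def render_metric(name, type, help, samples)
  lines = ["# HELP #{name} #{help}", "# TYPE #{name} #{type}"]
  samples.each do |labels, value|
    label_text = labels.map { |key, val| %(#{key}="#{escape_label(val)}") }.join(",")
    selector = label_text.empty? ? "" : "{#{label_text}}"
    lines << "#{name}#{selector} #{value}"
  end
  lines.join("\n") + "\n"
end
```

A call such as `render_metric("http_requests_total", "counter", "Total HTTP requests.", { { method: "get" } => 42 })` produces three lines of text that can be served over HTTP for polling.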

  • 60. Metrics : implementation • Prometheus Ruby client
 https://github.com/prometheus/client_ruby • native Ruby methods
  • 61. Metrics : bear in mind [1/2] • Split statistics by type. For example, aggregating successful (relatively long) and failed (relatively short) durations may create the illusion of a performance increase when many failures occur. • Whenever possible, use saturation to determine system load. Utilization alone is incomplete information. • Be sure to export the metrics of the component closest to the user. This makes it possible to evaluate the SLI. • Implement configurable bucket sizes.
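The first point can be illustrated with invented numbers: successful requests take 0.2 s, failed ones return after 0.005 s. The helper names and the sample data are assumptions for the example.

```ruby
def average(values)
  values.empty? ? nil : values.sum / values.size.to_f
end

# Each sample is [outcome, duration_in_seconds].
def averages_by_outcome(samples)
  samples.group_by(&:first).transform_values do |group|
    average(group.map(&:last))
  end
end

# Made-up data: a healthy period, then an incident where half the requests fail fast.
healthy  = [[:ok, 0.2]] * 100
incident = [[:ok, 0.2]] * 50 + [[:failed, 0.005]] * 50

mixed_healthy  = average(healthy.map(&:last))
mixed_incident = average(incident.map(&:last))
```

The mixed average drops from about 0.2 s to about 0.1 s during the incident, which looks like a speed-up, while `averages_by_outcome(incident)` shows that successful requests are exactly as slow as before.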
  • 62. Metrics : bear in mind [2/2] • Export appropriate metrics as buckets. This lowers the required polling rate and makes it possible to get statistics in percentiles. • Add units to metric names. • Whenever possible, use SI units. • Follow the naming standard. The Prometheus “Metric and label naming” document is a good base.
  • 64. Traces : definition In software engineering, tracing involves a specialized use of logging to record information about a program's execution. … There is not always a clear distinction between tracing and other forms of logging, except that the term tracing is almost never applied to logging that is a functional requirement of a program. — Wikipedia
  • 65. Traces : use cases • Debugging during development • Measuring and tuning performance • Analyzing failures and security incidents https://www.cossacklabs.com/blog/how-to-implement-distributed-tracing.html • Approaches • Library comparison • Implementation example • Use cases
  • 66. Traces : principles • Low overhead • Application-level transparency • Scalability
  • 67. Traces : spans in trace tree https://static.googleusercontent.com/media/research.google.com/uk/pubs/archive/36356.pdf
  • 68. Traces : kinds of data Per request/query tracking: • trace id • span id • parent span id • application info (product, component) • module name • method name • context data (session/request id, user id, …) • operation name and code • start time • end time
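A subset of these fields can be sketched as a plain Ruby structure. This is not the data model of any particular tracing library: the Span struct, its field names and the ID lengths are assumptions for illustration.

```ruby
require "securerandom"

# Hypothetical span record holding a subset of the fields listed above.
Span = Struct.new(:trace_id, :span_id, :parent_span_id, :operation,
                  :context, :start_time, :end_time, keyword_init: true) do
  # Starts a root span, or a child span when a parent is given.
  def self.start(operation, parent: nil, context: {})
    new(
      trace_id: parent ? parent.trace_id : SecureRandom.hex(16),
      span_id: SecureRandom.hex(8),
      parent_span_id: parent&.span_id,
      operation: operation,
      # A child inherits the parent's context (request id, user id, ...).
      context: parent ? parent.context.merge(context) : context,
      start_time: Time.now
    )
  end

  def finish
    self.end_time = Time.now
    self
  end

  # Duration in seconds, or nil while the span is still open.
  def duration
    end_time && (end_time - start_time)
  end
end
```

A request handler could call `root = Span.start("GET /orders", context: { request_id: "r-1" })` and pass `root` as the parent of each database or RPC span, which yields the trace tree shown on the previous slide.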
  • 69. Traces : what it looks like
  • 70. Traces : consumers • General purpose collectors:
 − Jaeger
 − Zipkin • Cloud collectors:
 − Google StackDriver
 − AWS X-Ray
 − Azure Application Insights • SIEM
  • 71. Traces : formats • Proprietary protocols:
 − Jaeger
 − Zipkin
 − Google StackDriver
 − AWS X-Ray
 − Azure Application Insights • JSON:
 − SIEM • protobuf/gRPC:
 − custom
  • 72. Traces : implementation • OpenCensus
 https://www.rubydoc.info/gems/opencensus
 (Zipkin, GC Stackdriver, JSON) • OpenTracing
 https://opentracing.io/guides/ruby/ • Jaeger client
 https://github.com/salemove/jaeger-client-ruby
  • 74. Checklist : Logs □ Each line:
 □ timestamps (ISO8601, TZ, reasonable precision)
 □ PID
 □ component name
 □ severity
 □ event code
 □ human-readable message □ Events to log:
 □ state changes (start/ready/pause/stop)
 □ health changes (new state, reason, doc URL)
 □ user sign-in attempts (including failed with reasons), actions, sign-out
 □ audit trail
 □ errors □ On start:
 □ product name, component name
 □ version (+build, +commit hash)
 □ running mode (debug/normal, daemon/)
 □ deprecation warnings
 □ which configuration in use (ENV, file, configuration service) □ On ready: communication sockets and ports □ On exit: reason □ Do not log:
 □ passwords, tokens
 □ personal data
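The "Do not log" items can be enforced in code by masking sensitive fields before a structured payload reaches the logger. The key list and the function name redact below are assumptions for this sketch. Matching on key names is a heuristic and will not catch secrets embedded in free-text messages.

```ruby
# Substrings of key names treated as sensitive (illustrative list).
SENSITIVE_KEYS = %w[password passwd token secret authorization].freeze

# Returns a copy of the payload with sensitive values masked, recursing
# into nested hashes and arrays. The input is not modified.
def redact(payload)
  case payload
  when Hash
    payload.each_with_object({}) do |(key, value), result|
      sensitive = SENSITIVE_KEYS.any? { |word| key.to_s.downcase.include?(word) }
      result[key] = sensitive ? "[REDACTED]" : redact(value)
    end
  when Array
    payload.map { |item| redact(item) }
  else
    payload
  end
end
```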
  • 75. Checklist : Metrics □ Data to export:
 □ application (version, warning/notification)
 □ utilization (resources, capacities, usage)
 □ saturation (internally calculated or appropriate metrics)
 □ rate (operations)
 □ errors
 □ latencies □ Split metrics by types □ Export as buckets when reasonable □ Configure size of buckets □ Export metrics for SLI □ Determine required resolution □ Normalize, use SI units, add units to names □ Prefer the poll model if possible □ Clear counters on restart
  • 76. Links [1/2] • Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
 https://static.googleusercontent.com/media/research.google.com/uk/pubs/archive/36356.pdf • How to Implement Tracing in a Modern Distributed Application
 https://www.cossacklabs.com/blog/how-to-implement-distributed-tracing.html • OpenTracing
 https://opentracing.io/ • OpenMetrics
 https://github.com/RichiH/OpenMetrics • OpenCensus
 https://opencensus.io
  • 77. Links [2/2] • CEF
 https://kc.mcafee.com/resources/sites/MCAFEE/content/live/CORP_KNOWLEDGEBASE/78000/KB78712/en_US/CEF_White_Paper_20100722.pdf • Metrics : USE method
 http://www.brendangregg.com/usemethod.html • Google SRE book
 https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/ • Metrics : RED method
 https://www.weave.works/blog/the-red-method-key-metrics-for-microservices-architecture/ • MS Azure : monitoring and diagnostic
 https://docs.microsoft.com/en-us/azure/architecture/best-practices/monitoring • Prometheus : Metrics and label names
 https://prometheus.io/docs/practices/naming/
  • 78. Dmytro Shapovalov Infrastructure Engineer @ Cossack Labs Thank you! shadinua shad.in.ua shad.in.ua