Abstract: We present a novel emulation system for creating
high-fidelity digital twins of IT infrastructures. The digital twins
replicate key functionality of the corresponding infrastructures
and make it possible to play out security scenarios in a safe environment.
We show that this capability can be used to automate the process
of finding effective security policies for a target infrastructure. In
our approach, a digital twin of the target infrastructure is used
to run security scenarios and collect data. The collected data is
then used to instantiate simulations of Markov decision processes
and to learn effective policies through reinforcement learning, whose
performance is validated in the digital twin. This closed-loop
learning process executes iteratively and provides continuously
evolving and improving security policies. We apply our approach
to an intrusion response scenario. Our results show that the
digital twin provides the necessary evaluative feedback to learn
near-optimal intrusion response policies.
1. 1/19
Digital Twins for Security Automation
IEEE/IFIP Network Operations and Management Symposium
8-12 May 2023, Miami FL USA
Kim Hammar & Rolf Stadler
2. 2/19
Use Case: Intrusion Response
- A defender owns an infrastructure
  - Consists of connected components
  - Components run network services
- The defender defends the infrastructure by monitoring and active defense
  - Has partial observability
- An attacker seeks to intrude on the infrastructure
  - Has a partial view of the infrastructure
  - Wants to compromise specific components
  - Attacks by reconnaissance, exploitation, and pivoting
[Figure: infrastructure topology. Clients and the attacker connect through a public gateway; the defender monitors IPS alerts; components are numbered 1-31.]
3. 3/19
Automated Intrusion Response: Current Landscape
Levels of security automation:
- 1980s, no automation: manual detection and prevention; no alerts, no automatic responses, lack of tools.
- 1990s, operator assistance: manual detection and prevention; audit logs and security tools.
- 2000s-now, partial automation: the system has automated functions for detection/prevention but requires manual updating and configuration; intrusion detection systems and intrusion prevention systems.
- Research, high automation: the system automatically updates itself; automated attack detection and automated attack mitigation.
4. 4/19
Can we use decision theory and learning-based methods to
automatically find effective security strategies?1
[Figure: feedback control loop. A security policy π maps feedback from the target system (security indicators) to control inputs that realize the security objective, subject to disturbances.]
1. Kim Hammar and Rolf Stadler. "Finding Effective Security Strategies through Reinforcement Learning and Self-Play". In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, 2020; Kim Hammar and Rolf Stadler. "Learning Intrusion Prevention Policies through Optimal Stopping". In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021; Kim Hammar and Rolf Stadler. "Intrusion Prevention Through Optimal Stopping". In: IEEE Transactions on Network and Service Management 19.3 (2022), pp. 2333-2348. doi: 10.1109/TNSM.2022.3176781; Kim Hammar and Rolf Stadler. Learning Near-Optimal Intrusion Responses Against Dynamic Attackers. 2023. doi: 10.48550/ARXIV.2301.06085. url: https://arxiv.org/abs/2301.06085.
5. 5/19
Our Framework for Automated Network Security
[Figure: framework overview. The target infrastructure is selectively replicated in a digital twin. Data from the digital twin drives model creation and system identification for a simulation system, where strategies π are learned through reinforcement learning and generalization. Learned strategies are mapped back to the digital twin for strategy evaluation and model estimation, and finally implemented in the target infrastructure, yielding automation and self-learning systems.]
13. 6/19
Creating a Digital Twin of the Target Infrastructure
[Figure: framework overview (same diagram as on slide 5).]
14. 6/19
Theoretical Analysis and Learning of Defender Strategies
[Figure: framework overview (same diagram as on slide 5).]
15. 7/19
Creating a Digital Twin of the Target Infrastructure
- An infrastructure is defined by its configuration.
- The set of configurations supported by our framework can be seen as a configuration space.
- The configuration space defines the class of infrastructures for which we can create digital twins.

[Figure: configuration space, where marked points (*) are configurations instantiated as digital twins.]
16. 8/19
The Target Infrastructure
- 33 components
- Topology shown to the right
- Components run network services, e.g., IDPS, SSH, Web, etc.
- A subset of the components have vulnerabilities
  - CVE-2017-7494, CVE-2015-3306, CVE-2015-5602
  - CVE-2014-6271, CVE-2016-10033, CVE-2015-1427, etc.
- Clients and the attacker access the infrastructure through the public gateway
[Figure: infrastructure topology (same diagram as on slide 2).]
17. 9/19
Emulating Physical Components
- We emulate physical components with Docker containers
  - Focus on Linux-based systems
- The containers include everything needed to emulate the host: a runtime system, code, system tools, system libraries, and configurations.
- Examples of containers: IDPS container, client container, attacker container, CVE-2015-1427 container, Open vSwitch containers, etc.

[Figure: container stack. CSLE containers run on a Docker engine on top of the operating system of a physical server.]
18. 10/19
Emulating Network Connectivity
[Figure: management nodes 1..n, each running an emulated IT infrastructure, connected over an IP network through VXLAN tunnels.]

- We emulate network connectivity on the same host using network namespaces.
- Connectivity across physical hosts is achieved using VXLAN tunnels with Docker swarm.
19. 11/19
Emulating Network Conditions
- We do traffic shaping with NetEm in the Linux kernel.
- Internal connections are emulated as full-duplex and loss-less, with bit capacities of 1000 Mbit/s.
- External connections are emulated as full-duplex, with bit capacities of 100 Mbit/s, 0.1% packet loss in normal operation, and random bursts of 1% packet loss.

[Figure: Linux networking stack. Application processes in user space send data through sockets into the kernel TCP/IP stack; the NetEm configuration (latency, jitter, etc.) applies at the queueing discipline between the stack and the device-driver FIFO queue of the NIC.]
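The traffic shaping above is configured through the `tc` command. The following Python snippet is a minimal sketch, not part of the framework: it only builds the command strings, following the documented pattern of a `tbf` rate limiter attached under a `netem` root qdisc. The interface names and `tbf` burst/latency parameters are hypothetical.

```python
def netem_cmds(dev: str, rate_mbit: int, loss_pct: float = 0.0) -> list:
    """Build tc commands that shape one link: netem for loss, tbf for rate."""
    loss = f" loss {loss_pct}%" if loss_pct else ""
    return [
        # Root netem qdisc (adds loss if requested)
        f"tc qdisc add dev {dev} root handle 1:0 netem{loss}",
        # tbf child qdisc limits the bit capacity of the link
        f"tc qdisc add dev {dev} parent 1:1 handle 10: "
        f"tbf rate {rate_mbit}mbit burst 32kbit latency 400ms",
    ]

# External connection: 100 Mbit/s with 0.1% loss; internal: loss-less 1000 Mbit/s
for cmd in netem_cmds("eth0", 100, 0.1) + netem_cmds("eth1", 1000):
    print(cmd)
```

Running the printed commands requires root privileges on the emulation host.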
20. 12/19
Emulating Actors
- We emulate client arrivals with Poisson processes.
- We emulate client interactions with load generators.
- Attackers are emulated by automated programs that select actions from a pre-defined set.
- Defender actions are emulated through a custom gRPC API.

[Figure: closed loop between the IT infrastructure, its digital twin (virtual network, virtual devices, emulated services, emulated actors), and a Markov decision process. Configuration and change events flow into the digital twin, system traces into the decision process, and optimized and verified security policies flow back.]
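Poisson client arrivals, as mentioned above, can be sampled by drawing exponential inter-arrival times. A minimal sketch (illustrative only, not the framework's actual load generator):

```python
import random

def poisson_arrivals(rate: float, horizon: float, seed: int = 1) -> list:
    """Sample client arrival times on [0, horizon) from a Poisson process.

    Inter-arrival times of a Poisson process with intensity `rate`
    (clients per time unit) are i.i.d. Exponential(rate).
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate)  # exponential inter-arrival time
        if t >= horizon:
            return arrivals
        arrivals.append(t)

arrivals = poisson_arrivals(rate=2.0, horizon=100.0)
print(len(arrivals))  # on average about rate * horizon = 200 clients
```

Each arrival time would then trigger a load-generator session against the emulated services.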
21. 13/19
System Identification
[Figure: framework overview (same diagram as on slide 5).]
22. 14/19
Monitoring and Telemetry
[Figure: monitoring architecture. Devices publish events to an event bus; data pipelines feed storage systems; a security policy reads the processed data and issues control actions back to the devices.]

- Emulated devices run monitoring agents that periodically push metrics to a Kafka event bus.
- The data in the event bus is consumed by data pipelines that process the data and write to storage systems.
- The processed data is used by an automated security policy to decide on control actions to execute in the digital twin.
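The push/consume pattern above can be illustrated with a toy pipeline. In this sketch a stdlib queue stands in for the Kafka event bus and a list for the storage system; the agent name and the processing step are hypothetical, not the framework's actual telemetry code.

```python
import queue
import threading

event_bus = queue.Queue()   # stands in for the Kafka event bus
storage = []                # stands in for the storage systems

def monitoring_agent(host: str, n_events: int) -> None:
    """Push a batch of host metrics to the event bus."""
    for i in range(n_events):
        event_bus.put({"host": host, "alerts": i})

def pipeline() -> None:
    """Consume events, process them, and write to storage."""
    while True:
        event = event_bus.get()
        if event is None:  # sentinel: shut down
            return
        event["weighted_alerts"] = 2 * event["alerts"]  # toy processing step
        storage.append(event)

consumer = threading.Thread(target=pipeline)
consumer.start()
monitoring_agent("node-1", 5)
event_bus.put(None)
consumer.join()
print(len(storage))  # → 5
```

In the real system the consumer side would be a Kafka consumer group feeding the storage backends.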
23. 15/19
Estimating Metric Distributions
[Figure: estimated observation distributions f̂_O(o_t | s_t = 0) and f̂_O(o_t | s_t = 1) of the number of IPS alerts weighted by priority, o_t ∈ [0, 9000]; fitted models overlaid on the empirical distributions.]

- We use the collected data to estimate metric distributions.
- We use the estimated distributions to instantiate Markov games and Markov decision processes.
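Estimating a conditional metric distribution from collected traces amounts to computing relative frequencies of the observed metric per system state. A minimal sketch with synthetic data (the sampling ranges are made up for illustration, not measurements from the testbed):

```python
import random
from collections import Counter

def estimate_obs_dist(samples):
    """Estimate f̂_O(o | s) as normalized relative frequencies.

    `samples` is an iterable of (state, observation) pairs.
    Returns {state: {observation: probability}}.
    """
    counts = {}
    for s, o in samples:
        counts.setdefault(s, Counter())[o] += 1
    return {s: {o: c / sum(cnt.values()) for o, c in cnt.items()}
            for s, cnt in counts.items()}

rng = random.Random(0)
# Toy data: fewer alerts without intrusion (s=0), more during intrusion (s=1).
samples = [(0, rng.randint(0, 3)) for _ in range(1000)] + \
          [(1, rng.randint(2, 8)) for _ in range(1000)]
f_hat = estimate_obs_dist(samples)
print(round(sum(f_hat[0].values()), 6))  # → 1.0 (valid distribution)
```

The resulting distributions can then parameterize the observation model of the POMDP or Markov game.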
24. 16/19
Learning Security Strategies
- We model the evolution of the system with a discrete-time dynamical system.
- We assume a Markovian system with stochastic dynamics and partial observability.
- A Partially Observed Markov Decision Process (POMDP), if the attacker is static.
- A Partially Observed Stochastic Game (POSG), if the attacker is dynamic.

[Figure: a stochastic Markovian system in state s_t receives the attacker and defender actions a_t^(1) and a_t^(2); a noisy sensor emits observation o_t, from which an optimal filter computes the belief b_t used by the controller.]
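The belief b_t in the diagram above is computed with a Bayesian filter. A minimal sketch for a two-state example (healthy vs. compromised); the transition and observation matrices here are made up for illustration and are not estimates from the testbed:

```python
def belief_update(b, a, o, T, Z):
    """One step of the Bayesian filter:
    b'(s') ∝ Z[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(b)
    unnorm = [Z[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    norm = sum(unnorm)
    return [u / norm for u in unnorm]

# Toy model: states 0 = healthy, 1 = compromised; one (defender) action.
T = [[[0.8, 0.2],   # T[a][s][s']: compromise happens w.p. 0.2 ...
      [0.0, 1.0]]]  # ... and is absorbing.
Z = [[[0.7, 0.3],   # Z[a][s][o]: high alerts (o=1) are more likely ...
      [0.1, 0.9]]]  # ... when compromised.
b0 = [1.0, 0.0]                      # start certain the system is healthy
b1 = belief_update(b0, a=0, o=1, T=T, Z=Z)  # then observe high alerts
print([round(x, 3) for x in b1])     # → [0.571, 0.429]
```

In the POMDP the defender's policy is then a function of this belief rather than of the hidden state.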
25. 17/19
Learning Security Strategies
[Figure: learning curves showing reward per episode, episode length (steps), and P[intrusion stopped] for T-SPSA in simulation and in the digital twin, compared against an o_t > 0 baseline, the Snort IDPS, and an upper bound.]

- T-SPSA is our reinforcement learning algorithm.
- T-SPSA outperforms Snort and converges to near-optimal strategies.
- While the performance is slightly better in simulation than in the digital twin, it is clear that the performances in the two environments are correlated.
26. 18/19
For more details about the theory
- Finding Effective Security Strategies through Reinforcement Learning and Self-Play [2]
- Learning Intrusion Prevention Policies through Optimal Stopping [3]
- A System for Interactive Examination of Learned Security Policies [4]
- Intrusion Prevention Through Optimal Stopping [5]
- Learning Security Strategies through Game Play and Optimal Stopping [6]
- An Online Framework for Adapting Security Policies in Dynamic IT Environments [7]
- Learning Near-Optimal Intrusion Responses Against Dynamic Attackers [8]

2. Kim Hammar and Rolf Stadler. "Finding Effective Security Strategies through Reinforcement Learning and Self-Play". In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, 2020.
3. Kim Hammar and Rolf Stadler. "Learning Intrusion Prevention Policies through Optimal Stopping". In: International Conference on Network and Service Management (CNSM 2021). http://dl.ifip.org/db/conf/cnsm/cnsm2021/1570732932.pdf. Izmir, Turkey, 2021.
4. Kim Hammar and Rolf Stadler. "A System for Interactive Examination of Learned Security Policies". In: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium. 2022, pp. 1-3. doi: 10.1109/NOMS54207.2022.9789707.
5. Kim Hammar and Rolf Stadler. "Intrusion Prevention Through Optimal Stopping". In: IEEE Transactions on Network and Service Management 19.3 (2022), pp. 2333-2348. doi: 10.1109/TNSM.2022.3176781.
6. Kim Hammar and Rolf Stadler. "Learning Security Strategies through Game Play and Optimal Stopping". In: Proceedings of the ML4Cyber workshop, ICML 2022, Baltimore, USA, July 17-23, 2022. PMLR, 2022.
7. Kim Hammar and Rolf Stadler. "An Online Framework for Adapting Security Policies in Dynamic IT Environments". In: International Conference on Network and Service Management (CNSM 2022). Thessaloniki, Greece, 2022.
8. Kim Hammar and Rolf Stadler. Learning Near-Optimal Intrusion Responses Against Dynamic Attackers. 2023. doi: 10.48550/ARXIV.2301.06085. url: https://arxiv.org/abs/2301.06085.
27. 19/19
Conclusions
- We develop a framework for automated security.
- Our framework centers around a digital twin.
- We use the digital twin to optimize security strategies through reinforcement learning, game theory, and control theory.
- Documentation of our framework: limmen.dev/csle.

[Figure: closed loop between the IT infrastructure, its digital twin, and a Markov decision process (same diagram as on slide 20).]