Self-Learning Systems for Cyber Security
NSE Seminar
Kim Hammar & Rolf Stadler
kimham@kth.se & stadler@kth.se
Division of Network and Systems Engineering
KTH Royal Institute of Technology
April 9, 2021
Approach: Game Model & Reinforcement Learning

- Challenges:
  - Evolving & automated attacks
  - Complex infrastructures
- Our Goal:
  - Automate security tasks
  - Adapt to changing attack methods
- Our Approach:
  - Model network attack and defense as games.
  - Use reinforcement learning to learn policies.
  - Incorporate learned policies in self-learning systems.

[Figure: network topology with an attacker, a defender, a gateway router R1, and clients 1-3.]
State of the Art

- Game-Learning Programs:
  - TD-Gammon, AlphaGo Zero [1], OpenAI Five, etc.
  - ⇒ Impressive empirical results of RL and self-play
- Attack Simulations:
  - Automated threat modeling [2], automated intrusion detection, etc.
  - ⇒ Need for automation and better security tooling
- Mathematical Modeling:
  - Game theory [3]
  - Markov decision theory
  - ⇒ Many security operations involve strategic decision making

[1] David Silver et al. "Mastering the game of Go without human knowledge". In: Nature 550 (Oct. 2017), pp. 354-359. URL: http://dx.doi.org/10.1038/nature24270.
[2] Pontus Johnson, Robert Lagerström, and Mathias Ekstedt. "A Meta Language for Threat Modeling and Attack Simulations". In: Proceedings of the 13th International Conference on Availability, Reliability and Security (ARES 2018). Hamburg, Germany: Association for Computing Machinery, 2018. ISBN: 9781450364485. DOI: 10.1145/3230833.3232799. URL: https://doi.org/10.1145/3230833.3232799.
[3] Tansu Alpcan and Tamer Basar. Network Security: A Decision and Game-Theoretic Approach. 1st ed. USA: Cambridge University Press, 2010. ISBN: 0521119324.
Our Work

- Use Case: Intrusion Prevention
- Our Method:
  - Emulating computer infrastructures
  - System identification and model creation
  - Reinforcement learning and generalization
- Results:
  - Learning to Capture The Flag
  - Learning to Detect Network Intrusions
- Conclusions and Future Work
Use Case: Intrusion Prevention

- A Defender owns an infrastructure
  - Consists of connected components
  - Components run network services
  - Defender defends the infrastructure by monitoring and patching
- An Attacker seeks to intrude on the infrastructure
  - Has a partial view of the infrastructure
  - Wants to compromise specific components
  - Attacks by reconnaissance, exploitation, and pivoting

[Figure: network topology with an attacker, a defender, a gateway router R1, and clients 1-3.]
Our Method for Finding Effective Security Strategies

[Figure: closed-loop pipeline. A real-world infrastructure is selectively replicated in an emulation system; the emulation supports policy evaluation & model estimation and feeds model creation & system identification; the resulting model drives a simulation system where reinforcement learning & generalization produce a policy π; the learned policy is mapped back and implemented in the emulation and, ultimately, deployed for automation & self-learning systems on the target infrastructure.]
Emulation System

[Figure: the configuration space Σ and sample configurations σi, instantiated as emulated infrastructures on the subnets 172.18.4.0/24, 172.18.19.0/24, and 172.18.61.0/24, each behind a router R1.]

Emulation: a cluster of machines that runs a virtualized infrastructure which replicates important functionality of target systems.

- The set of virtualized configurations defines a configuration space Σ = ⟨A, O, S, U, T, V⟩.
- A specific emulation is based on a configuration σi ∈ Σ (a sketch of one such configuration object follows below).
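To make the configuration space concrete, here is a minimal Python sketch of a configuration object σi. The field names, and the mapping of Σ's components ⟨A, O, S, U, T, V⟩ onto them, are illustrative assumptions, not the authors' schema.

```python
# A minimal sketch of a configuration sigma_i in the configuration space
# Sigma = <A, O, S, U, T, V>. All field names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Configuration:
    subnet: str                                  # e.g. "172.18.4.0/24"
    topology: List[Tuple[str, str]]              # links between node IPs
    operating_systems: Dict[str, str]            # node IP -> OS
    services: Dict[str, List[str]]               # node IP -> running services
    users: Dict[str, List[str]]                  # node IP -> user accounts
    traffic: List[str] = field(default_factory=list)   # client traffic generators
    vulnerabilities: Dict[str, List[str]] = field(default_factory=dict)  # node IP -> CVEs
```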
Emulation: Execution Times of Replicated Operations

[Figure: histograms of action execution times (costs, in seconds) in the emulation, with normalized frequencies on a log scale, for infrastructures of size |N| = 25, 50, 75, and 100.]

- Fundamental issue: computational methods for policy learning typically require samples on the order of 100k-10M.
- ⇒ Infeasible to optimize in the emulation system
From Emulation to Simulation: System Identification

[Figure: the emulated network (router R1 and nodes m1-m7 with feature vectors m_{i,1}, ..., m_{i,k}) is abstracted into a model that defines a POMDP ⟨S, A, P, R, γ, O, Z⟩ with actions a_t, states s_t, and observations o_t.]

- Abstract Model Based on Domain Knowledge: models the set of controls, the objective function, and the features of the emulated network.
  - Defines the static parts of the POMDP model.
- Dynamics Model (P, Z) Identified using System Identification: an algorithm based on random walks and maximum-likelihood estimation:

  M̂(b' | b, a) ≜ n(b, a, b') / Σ_{j'} n(b, a, j')

  where n(b, a, b') is the number of observed transitions from b to b' under action a (a sketch of this estimator follows below).
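The maximum-likelihood estimate above is simply a normalized transition count. The following Python sketch illustrates how such an estimator can be built from random-walk samples; the `env` interface (`reset`/`actions`/`step`) is a hypothetical stand-in for the emulation API, and states are assumed hashable.

```python
# A minimal sketch (not the authors' implementation) of the count-based
# maximum-likelihood estimator M^(b' | b, a) = n(b,a,b') / sum_j' n(b,a,j'),
# with transitions collected by a random walk in the emulation.
from collections import defaultdict
import random

class DynamicsEstimator:
    def __init__(self):
        # counts[(b, a)][b'] = number of observed transitions (b, a) -> b'
        self.counts = defaultdict(lambda: defaultdict(int))

    def record(self, b, a, b_next):
        """Record one observed transition from the emulation."""
        self.counts[(b, a)][b_next] += 1

    def prob(self, b, a, b_next):
        """Maximum-likelihood estimate of P(b' | b, a)."""
        total = sum(self.counts[(b, a)].values())
        return self.counts[(b, a)][b_next] / total if total else 0.0

def random_walk(env, estimator, steps):
    """Collect samples by executing uniformly random actions (a random walk);
    'env' is a hypothetical emulation interface."""
    b = env.reset()
    for _ in range(steps):
        a = random.choice(env.actions(b))
        b_next = env.step(a)
        estimator.record(b, a, b_next)
        b = b_next
```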
System Identification: Estimated Dynamics Model

[Figure: estimated emulation dynamics per node (172.18.4.2, 172.18.4.3, 172.18.4.10, 172.18.4.21, 172.18.4.79) over the metrics connections, failed logins, accounts, online users, logins, and processes, and per the IDS over alerts, alert priorities, severe alerts, and warning alerts.]
System Identification: Estimated Dynamics Model

[Figure: estimated transition distributions P[· | (b_i, a_i)] for node 172.18.4.2: new TCP/UDP connections, new failed login attempts, created user accounts, new logged-in users, new login events, and created processes.]
System Identification: Estimated Dynamics Model

[Figure: estimated IDS dynamics P[· | (b_i, a_i)]: IDS alerts, severe IDS alerts, warning IDS alerts, and IDS alert priorities.]
Policy Optimization in the Simulation System using Reinforcement Learning

- Goal:
  - Approximate π* = arg max_π E[Σ_{t=0}^T γ^t r_{t+1}]
- Learning Algorithm:
  - Represent π by a parameterized policy πθ
  - Define the objective J(θ) = E_{o∼ρπθ, a∼πθ}[R]
  - Maximize J(θ) by stochastic gradient ascent with gradient
    ∇θ J(θ) = E_{o∼ρπθ, a∼πθ}[∇θ log πθ(a|o) A^{πθ}(o, a)]
    (a minimal sketch of this update follows below)
- Domain-Specific Challenges:
  - Partial observability
  - Large state space |S| = (w + 1)^{|N|·m·(m+1)}
  - Large action space |A| = |N|·(m + 1)
  - Non-stationary environment due to the presence of an adversary
  - Generalization
- Finding Effective Security Strategies through Reinforcement Learning and Self-Play [4]

[4] Kim Hammar and Rolf Stadler. "Finding Effective Security Strategies through Reinforcement Learning and Self-Play". In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, Nov. 2020.

[Figure: agent-environment loop: the agent takes action a_t; the environment returns state s_{t+1} and reward r_{t+1}.]
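The policy-gradient update above can be illustrated with a minimal REINFORCE-style sketch. The linear-softmax policy, the `env` interface, and the use of the return-to-go G_t as the advantage estimate are simplifying assumptions for illustration, not the authors' exact training setup.

```python
# A minimal REINFORCE-style sketch of grad J = E[grad log pi_theta(a|o) * A(o,a)],
# using the Monte-Carlo return-to-go G_t as the advantage estimate.
# 'env' is a hypothetical simulator with reset() -> obs and step(a) -> (obs, r, done).
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def policy(theta, obs):
    """pi_theta(a | o): softmax over linear scores theta[a] . obs;
    theta is an (n_actions x feature_dim) matrix."""
    scores = np.array([theta[a] @ obs for a in range(len(theta))])
    return softmax(scores)

def reinforce_episode(env, theta, gamma=0.99, lr=0.01):
    """Run one episode, then apply the policy-gradient update."""
    obs, done, traj = env.reset(), False, []
    while not done:
        probs = policy(theta, obs)
        a = np.random.choice(len(theta), p=probs)
        next_obs, r, done = env.step(a)
        traj.append((obs, a, r))
        obs = next_obs
    G = 0.0
    for obs, a, r in reversed(traj):            # discounted return-to-go
        G = r + gamma * G
        probs = policy(theta, obs)
        for b in range(len(theta)):             # grad log pi for a softmax policy:
            grad = ((1.0 if b == a else 0.0) - probs[b]) * obs
            theta[b] += lr * G * grad
    return theta
```

In practice one would add a baseline or use an actor-critic variant to reduce the variance of this estimator; the sketch only shows the core gradient step.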
Learning Capture-the-Flag Strategies: Target Infrastructure

- Topology:
  - 32 Servers, 1 IDS (Snort), 3 Clients
- Services:
  - 1 SNMP, 1 Cassandra, 2 Kafka, 8 HTTP, 1 DNS, 1 SMTP, 2 NTP, 5 IRC, 1 Teamspeak, 1 MongoDB, 1 Samba, 1 RethinkDB, 1 CockroachDB, 2 Postgres, 3 FTP, 15 SSH, 2 FTP
- Vulnerabilities:
  - 2 CVE-2010-0426, 1 CVE-2015-3306, 1 CVE-2015-5602, 1 CVE-2016-10033, 1 CVE-2017-7494, 1 CVE-2014-6271
  - 5 Brute-force vulnerabilities
- Operating Systems:
  - 14 Ubuntu-20, 9 Ubuntu-14, 1 Debian 9.2, 2 Debian Wheezy, 5 Debian Jessie, 1 Kali
- Traffic:
  - FTP, SSH, IRC, SNMP, HTTP, Telnet, Postgres, MongoDB, Samba
  - curl, ping, traceroute, nmap, ...

[Figure: target infrastructure.]
Learning Capture-the-Flag Strategies: System Model 1/3

- A hacker (pentester) has T time-periods to collect flags hidden in the infrastructure.
- The hacker is located at a dedicated starting position N0 and can connect to a gateway that exposes public-facing services in the infrastructure.
- The hacker has a pre-defined set (cardinality ~200) of network/shell commands available.

[Figure: target infrastructure.]
Learning Capture-the-Flag Strategies: System Model 2/3

- By executing commands, the hacker collects information
  - Open ports, failed/successful exploits, vulnerabilities, costs, OS, ...
- Sequences of commands can yield shell access to nodes
- Given shell access, the hacker can search for flags
- Associated with each command is a cost c (execution time) and noise n (IDS alerts).
- The objective is to capture all flags at minimal cost within the fixed time horizon T. What strategy achieves this end?

[Figure: target infrastructure.]
Learning Capture-the-Flag Strategies: System Model 3/3

- Contextual Stochastic CTF with Partial Information
  - Model the infrastructure as a graph G = ⟨N, E⟩
  - There are k flags at nodes C ⊆ N
  - Each node N_i ∈ N has a node state s_i of m attributes
  - Network state: s = {s_A, s_i | i ∈ N} ∈ R^{|N|×m+|N|}
  - The hacker observes o_A ⊂ s
  - Action space: A = {a^A_1, ..., a^A_k} (commands)
  - For every (b, a) ∈ A × S there is a probability w^{A,(x)}_{i,j} of failure and a probability ϕ(det(s_i) · n^{A,(x)}_{i,j}) of detection
  - State transitions s → s' are decided by a discrete dynamical system s' = F(s, a)
  - The exact dynamics (F, c^A, n^A, w^A, det(·), ϕ(·)) are unknown to us!
  - (A data-layout sketch of this graph model follows below.)

[Figure: graphical model. Nodes N_0, ..., N_7 with states S_i ∈ R^m; each directed edge (i, j) is labeled with vectors c^A_{i,j} ∈ R^k_+ (costs), n^A_{i,j} ∈ R^k_+ (noise), and w^A_{i,j} ∈ [0,1]^k (failure probabilities); defender parameters c^D ∈ R^l_+, w^D ∈ [0,1]^l.]
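To fix ideas, here is a minimal sketch of the graph model's data layout; the class and field names are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (assumed structure) of the CTF graph model G = <N, E>:
# each directed edge (i, j) carries cost, noise, and failure-probability
# vectors for the attacker's k commands.
from dataclasses import dataclass
from typing import Dict, List, Tuple
import numpy as np

@dataclass
class Edge:
    cost: np.ndarray      # c^A_{i,j} in R^k_+: execution time per command
    noise: np.ndarray     # n^A_{i,j} in R^k_+: expected IDS alerts per command
    fail: np.ndarray      # w^A_{i,j} in [0,1]^k: failure probability per command

@dataclass
class CTFGraph:
    node_states: Dict[int, np.ndarray]      # S_i in R^m: m attributes per node
    edges: Dict[Tuple[int, int], Edge]      # (i, j) -> edge parameters
    flags: List[int]                        # nodes C (subset of N) holding flags

    def neighbors(self, i: int) -> List[int]:
        """Nodes reachable from node i along a directed edge."""
        return [j for (src, j) in self.edges if src == i]
```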
Learning Capture-the-Flag Strategies

[Figure: learning curves (simulation and emulation) of our proposed method: average episodic regret over 400 iterations for the generated simulation, the test emulation environment, and the train emulation environment, against the lower bound π*.]

[Figure: evaluation infrastructure.]
Learning Capture-the-Flag Strategies

[Figure: episodic rewards, episodic regret, and episodic steps over 400 iterations for configurations 1-3, compared against π*.]

[Figure: the three evaluation configurations on subnets 172.18.4.0/24, 172.18.3.0/24, and 172.18.2.0/24, each with a gateway R1 emitting alerts; components include an application server, an intrusion detection system, an access switch, a flag, and a traffic generator.]
Learning to Detect Network Intrusions: Target Infrastructure

- Topology:
  - 6 Servers, 1 IDS (Snort), 3 Clients
- Services:
  - 3 SSH, 2 HTTP, 1 DNS, 1 Telnet, 1 FTP, 1 MongoDB, 2 SMTP, 1 Tomcat, 1 Teamspeak3, 1 SNMP, 1 IRC, 1 Postgres, 1 NTP
- Vulnerabilities:
  - 1 CVE-2010-0426, 3 Brute-force vulnerabilities
- Operating Systems:
  - 4 Ubuntu-20, 1 Ubuntu-14, 1 Kali
- Traffic:
  - FTP, SSH, IRC, SNMP, HTTP, Telnet, Postgres, MongoDB
  - curl, ping, traceroute, nmap, ...

[Figure: evaluation infrastructure.]
Learning to Detect Network Intrusions: System Model (1/3)

- An admin should manage the infrastructure for T time-periods.
- The admin can monitor the infrastructure to obtain a belief about its state, b_t.
- b_1, ..., b_{T-1} can be assumed to be generated from some unknown distribution ϕ.
- If the admin suspects, based on b_t, that the infrastructure is being intruded upon, he can suspend the suspicious user/traffic.

[Figure: target infrastructure.]
Learning to Detect Network Intrusions: System Model (2/3)

- Suspending traffic from a true intrusion yields a reward r (salary bonus)
- Not suspending traffic of a true intrusion incurs a cost c (the admin is fired)
- Suspending traffic of a false intrusion incurs a cost o (breaking the SLA)
- The objective is to decide an optimal response for suspending network traffic. What strategy achieves this end?

[Figure: target infrastructure.]
Learning to Detect Network Intrusions: System Model (3/3)

- Optimal Stopping Problem
  - Action space A = {STOP, CONTINUE}
  - Belief state space B ⊆ R^{8+10·m}
  - A belief state b ∈ B contains relevant metrics to detect intrusions
    - Alerts from the IDS, entries in /var/log/auth, logged-in users, TCP connections, processes, ...
  - Reward function R:
    r(b_t, STOP, s_t) = 1_{intrusion} · β/t_i
    where β is a positive constant and t_i is the number of nodes compromised by the attacker
  - ⇒ incentive to detect the intrusion early (a sketch of this reward follows below).

[Figure: target infrastructure.]
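A minimal sketch of this stopping reward, with assumed function and parameter names:

```python
# r(b_t, STOP, s_t) = 1_{intrusion} * beta / t_i, where t_i is the number of
# nodes the attacker has compromised when the defender stops.
def stop_reward(intrusion_ongoing: bool, num_compromised: int, beta: float = 10.0) -> float:
    """Reward for taking the STOP action: positive only if an intrusion is
    ongoing, and larger the earlier the intrusion is detected (fewer
    compromised nodes)."""
    if not intrusion_ongoing:
        return 0.0                           # per this formula, a false stop earns nothing
    return beta / max(num_compromised, 1)    # guard against division by zero
```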
Structural Properties of the Optimal Policy

- Assumptions: there is always an intrusion before T; f(b_t) is the probability of an intrusion given b_t; b_t and p are Markov; f(b_t) is non-decreasing in t.
- Claim: the optimal policy is a threshold-based policy
- Necessary condition for optimality (Bellman):

  u_t(b_t) = sup_a [ r_t(b_t, a) + Σ_{b'∈B} p_t(b'|b_t, a) u_{t+1}(b', a) ]   (1)
           = max [ f(b_t) · β/t_i , Σ_{b'∈B} ϕ(b') u_{t+1}(b') ]              (2)

- Thus it is optimal to stop in belief state b_t iff

  f(b_t) · β/t_i ≥ Σ_{b'∈B} ϕ(b') u_{t+1}(b')                                 (3)

- Stopping threshold α_t:

  α_t ≜ (t_i/β) · Σ_{b'∈B} ϕ(b') u_{t+1}(b')                                  (4)

  i.e., it is optimal to stop iff f(b_t) ≥ α_t (a sketch of this threshold rule follows below).
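A minimal sketch of the resulting threshold rule; the function names and the dictionary representation of ϕ are illustrative assumptions:

```python
# Threshold stopping rule derived above: STOP iff f(b_t) >= alpha_t,
# with alpha_t = (t_i / beta) * sum_{b'} phi(b') * u_{t+1}(b').
from typing import Callable, Dict, Hashable

def stopping_threshold(phi: Dict[Hashable, float],
                       u_next: Callable[[Hashable], float],
                       t_i: int, beta: float) -> float:
    """alpha_t: the expected continuation value, scaled by t_i / beta."""
    continuation = sum(phi[b] * u_next(b) for b in phi)  # sum over belief states
    return (t_i / beta) * continuation

def should_stop(f_bt: float, alpha_t: float) -> bool:
    """Threshold policy: STOP iff the intrusion probability f(b_t) >= alpha_t."""
    return f_bt >= alpha_t
```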
Learning to Detect Network Intrusions

[Figure: learning curves (simulation and emulation) of our proposed method: average episodic rewards over 200 iterations for πθ, compared with the baselines Snort-severe, Snort-warning, Snort-critical, and /var/log/auth, against the upper bound π*.]

[Figure: evaluation infrastructure.]
Learning to Detect Network Intrusions

[Figure: fractions of true positives (intrusion detected), false positives (early stopping), and false negatives (missed intrusion) over 200 iterations: the trade-off between detection and false positives.]
Conclusions & Future Work

- Conclusions:
  - We develop a method to find effective strategies for intrusion prevention:
    (1) emulation system; (2) system identification; (3) simulation system; (4) reinforcement learning; and (5) domain randomization and generalization.
  - We show that self-learning can be successfully applied to network infrastructures.
    - Self-play reinforcement learning in a Markov security game
  - Key challenges: stable convergence, sample efficiency, complexity of emulations, large state and action spaces
- Our research plans:
  - Improving the system identification algorithm & generalization
  - Evaluation on real-world infrastructures
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Self-learning systems for cyber security

  • 10. 6/30 Our Work I Use Case: Intrusion Prevention I Our Method: I Emulating computer infrastructures I System identification and model creation I Reinforcement learning and generalization I Results: I Learning to Capture The Flag I Learning to Detect Network Intrusions I Conclusions and Future Work
  • 11. 7/30 Use Case: Intrusion Prevention
    - A Defender owns an infrastructure:
      - Consists of connected components
      - Components run network services
      - The Defender defends the infrastructure by monitoring and patching
    - An Attacker seeks to intrude on the infrastructure:
      - Has a partial view of the infrastructure
      - Wants to compromise specific components
      - Attacks by reconnaissance, exploitation, and pivoting
    [Figure: attacker, clients 1-3, defender, and router R1 in the target infrastructure]
  • 12. 8/30 Our Method for Finding Effective Security Strategies
    [Figure: pipeline linking a real-world infrastructure to an emulation system (selective replication, policy implementation), model creation & system identification, a simulation system (reinforcement learning & generalization, policy evaluation & model estimation), and automation & self-learning systems, connected by a policy mapping π]
  • 21. 9/30 Emulation System
    [Figure: configuration space Σ with configurations σi mapped to emulated infrastructures on subnets 172.18.4.0/24, 172.18.19.0/24, and 172.18.61.0/24]
    - Emulation: a cluster of machines that runs a virtualized infrastructure which replicates important functionality of target systems.
    - The set of virtualized configurations defines a configuration space $\Sigma = \langle A, O, S, U, T, V \rangle$.
    - A specific emulation is based on a configuration $\sigma_i \in \Sigma$.
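To make the configuration-space notation concrete, below is a minimal sketch of how a configuration σi might be represented in code. The slide does not spell out what the tuple components ⟨A, O, S, U, T, V⟩ stand for, so the mapping to the fields below, and all field names, are assumptions rather than the deck's implementation:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class EmulationConfig:
    """One configuration sigma_i in the configuration space Sigma.

    The mapping of <A, O, S, U, T, V> to these fields is an assumption;
    the deck does not define the tuple components.
    """
    subnet: str                             # e.g. "172.18.4.0/24"
    topology: Dict[str, List[str]]          # adjacency: node -> reachable nodes
    operating_systems: Dict[str, str]       # node -> OS image
    services: Dict[str, List[str]]          # node -> exposed services
    users: Dict[str, List[str]]             # node -> user accounts
    traffic: List[str]                      # background traffic generators
    vulnerabilities: Dict[str, List[str]]   # node -> CVE identifiers

# A specific emulation is instantiated from one configuration sigma_i:
sigma_1 = EmulationConfig(
    subnet="172.18.4.0/24",
    topology={"gw": ["n1", "n2"], "n1": ["n2"], "n2": []},
    operating_systems={"n1": "ubuntu:20.04", "n2": "debian:jessie"},
    services={"n1": ["ssh", "http"], "n2": ["ftp", "ssh"]},
    users={"n1": ["admin"], "n2": ["guest"]},
    traffic=["curl", "ping", "nmap"],
    vulnerabilities={"n2": ["CVE-2015-3306"]},
)
```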
  • 23. 10/30 Emulation: Execution Times of Replicated Operations
    [Figure: histograms of action execution times (costs), in seconds, for networks of size |N| = 25, 50, 75, and 100]
    - Fundamental issue: computational methods for policy learning typically require samples on the order of 100k-10M.
    - $\Rightarrow$ Infeasible to optimize in the emulation system
  • 24. 11/30 From Emulation to Simulation: System Identification
    [Figure: emulated network with nodes m1-m7 and per-node feature vectors, mapped to an abstract model and a POMDP model $\langle S, A, P, R, \gamma, O, Z \rangle$ with actions $a_1, a_2, a_3, \ldots$, states $s_1, s_2, s_3, \ldots$, and observations $o_1, o_2, o_3, \ldots$]
    - Abstract model based on domain knowledge: models the set of controls, the objective function, and the features of the emulated network.
      - Defines the static parts of a POMDP model.
    - Dynamics model $(P, Z)$ identified using system identification: an algorithm based on random walks and maximum-likelihood estimation:
      $\hat{M}(b' \mid b, a) \triangleq \frac{n(b, a, b')}{\sum_{j'} n(b, a, j')}$
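The estimator above is simply a normalized transition count. Below is a minimal sketch of how such a maximum-likelihood dynamics model could be computed from (belief, action, next-belief) samples collected by random walks in the emulation; the function and variable names are illustrative, not from the paper:

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

def estimate_dynamics(
    transitions: List[Tuple[Hashable, Hashable, Hashable]],
) -> Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]:
    """Maximum-likelihood estimate M(b' | b, a) = n(b, a, b') / sum_j' n(b, a, j').

    `transitions` is a list of (b, a, b') samples gathered by random
    walks in the emulation system.
    """
    # Count n(b, a, b') for every observed transition.
    counts: Dict[Tuple[Hashable, Hashable], Dict[Hashable, int]] = defaultdict(
        lambda: defaultdict(int)
    )
    for b, a, b_next in transitions:
        counts[(b, a)][b_next] += 1

    # Normalize counts into conditional probabilities.
    model: Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]] = {}
    for (b, a), next_counts in counts.items():
        total = sum(next_counts.values())
        model[(b, a)] = {b_next: n / total for b_next, n in next_counts.items()}
    return model

# Example: three observed transitions from belief b0 under action a0
traces = [("b0", "a0", "b1"), ("b0", "a0", "b1"), ("b0", "a0", "b2")]
M = estimate_dynamics(traces)
assert abs(M[("b0", "a0")]["b1"] - 2 / 3) < 1e-9
```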
  • 27. 12/30 System Identification: Estimated Dynamics Model
    [Figure: estimated emulation dynamics for nodes 172.18.4.2, 172.18.4.3, 172.18.4.10, 172.18.4.21, and 172.18.4.79: per-node metrics (connections, failed logins, accounts, online users, logins, processes) and IDS metrics (alerts, alert priorities, severe alerts, warning alerts)]
  • 28. 13/30 System Identification: Estimated Dynamics Model
    [Figure: estimated distributions $P[\cdot \mid (b_i, a_i)]$ for node 172.18.4.2: new TCP/UDP connections, new failed login attempts, created user accounts, new logged-in users, new login events, and created processes]
  • 29. 14/30 System Identification: Estimated Dynamics Model
    [Figure: estimated IDS dynamics $P[\cdot \mid (b_i, a_i)]$: IDS alerts, severe IDS alerts, warning IDS alerts, and IDS alert priorities]
  • 30. 15/30 Policy Optimization in the Simulation System using Reinforcement Learning
    - Goal: approximate $\pi^* = \arg\max_\pi \mathbb{E}\left[\sum_{t=0}^{T} \gamma^t r_{t+1}\right]$
    - Learning algorithm:
      - Represent $\pi$ by $\pi_\theta$
      - Define the objective $J(\theta) = \mathbb{E}_{o \sim \rho^{\pi_\theta}, a \sim \pi_\theta}[R]$
      - Maximize $J(\theta)$ by stochastic gradient ascent with gradient $\nabla_\theta J(\theta) = \mathbb{E}_{o \sim \rho^{\pi_\theta}, a \sim \pi_\theta}\left[\nabla_\theta \log \pi_\theta(a \mid o) A^{\pi_\theta}(o, a)\right]$
    - Domain-specific challenges:
      - Partial observability
      - Large state space $|S| = (w + 1)^{|N| \cdot m \cdot (m+1)}$
      - Large action space $|A| = |N| \cdot (m + 1)$
      - Non-stationary environment due to the presence of an adversary
      - Generalization
    - Kim Hammar and Rolf Stadler. "Finding Effective Security Strategies through Reinforcement Learning and Self-Play". In: International Conference on Network and Service Management (CNSM 2020). Izmir, Turkey, Nov. 2020.
    [Figure: agent-environment loop with action $a_t$, state $s_{t+1}$, and reward $r_{t+1}$]
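The gradient on the slide is the standard score-function (policy-gradient) estimator. Below is a minimal REINFORCE-style sketch in PyTorch that performs stochastic gradient ascent on J(θ), using the discounted return as a crude stand-in for the advantage $A^{\pi_\theta}(o, a)$; the network architecture, episode interface, and hyperparameters are assumptions, not the deck's implementation:

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """pi_theta(a | o): a categorical policy over discrete actions."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))

def reinforce_update(policy, optimizer, episode, gamma=0.99):
    """One gradient-ascent step on J(theta) from a single episode.

    `episode` is a list of (obs, action, reward) tuples; the discounted
    return G_t is used in place of the advantage A(o, a).
    """
    # Compute discounted returns G_t backwards through the episode.
    returns, g = [], 0.0
    for _, _, r in reversed(episode):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    # Ascent on J(theta) = descent on -sum_t log pi(a_t|o_t) * G_t.
    loss = 0.0
    for (obs, action, _), g_t in zip(episode, returns):
        logp = policy(obs).log_prob(action)
        loss = loss - logp * g_t
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

policy = PolicyNet(obs_dim=4, n_actions=3)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
# A dummy one-step episode just to exercise the update:
obs = torch.randn(4)
action = policy(obs).sample()
reinforce_update(policy, opt, [(obs, action, 1.0)])
```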
  • 34. 16/30 Our Method for Finding Effective Security Strategies
    [Figure: the pipeline from slide 12, shown again as a transition to the results]
  • 35. 17/30 Learning Capture-the-Flag Strategies: Target Infrastructure
    - Topology: 32 servers, 1 IDS (Snort), 3 clients
    - Services: 1 SNMP, 1 Cassandra, 2 Kafka, 8 HTTP, 1 DNS, 1 SMTP, 2 NTP, 5 IRC, 1 Teamspeak, 1 MongoDB, 1 Samba, 1 RethinkDB, 1 CockroachDB, 2 Postgres, 3 FTP, 15 SSH, 2 FTP
    - Vulnerabilities: 2 CVE-2010-0426, 2 CVE-2010-0426, 1 CVE-2015-3306, 1 CVE-2015-5602, 1 CVE-2016-10033, 1 CVE-2017-7494, 1 CVE-2014-6271, and 5 brute-force vulnerabilities
    - Operating systems: 14 Ubuntu-20, 9 Ubuntu-14, 1 Debian 9:2, 2 Debian Wheezy, 5 Debian Jessie, 1 Kali
    - Traffic: FTP, SSH, IRC, SNMP, HTTP, Telnet, Postgres, MongoDB, Samba; curl, ping, traceroute, nmap, ...
    [Figure: target infrastructure]
  • 36. 18/30 Learning Capture-the-Flag Strategies: System Model 1/3
    - A hacker (pentester) has T time periods to collect flags hidden in the infrastructure.
    - The hacker is located at a dedicated starting position $N_0$ and can connect to a gateway that exposes public-facing services in the infrastructure.
    - The hacker has a pre-defined set (cardinality ~200) of network/shell commands available.
    [Figure: target infrastructure]
  • 37. 19/30 Learning Capture-the-Flag Strategies: System Model 2/3
    - By executing commands, the hacker collects information: open ports, failed/successful exploits, vulnerabilities, costs, OS, ...
    - Sequences of commands can yield shell access to nodes.
    - Given shell access, the hacker can search for flags.
    - Associated with each command is a cost c (execution time) and noise n (IDS alerts).
    - The objective is to capture all flags with minimal cost within the fixed time horizon T. What strategy achieves this end?
    [Figure: target infrastructure]
  • 39. 20/30 Learning Capture-the-Flag Strategies: System Model 3/3
    - Contextual stochastic CTF with partial information:
      - Model the infrastructure as a graph $G = \langle N, E \rangle$
      - There are $k$ flags at nodes $C \subseteq N$
      - Node $N_i \in N$ has a node state $s_i$ of $m$ attributes
      - Network state $s = \{s_A, s_i \mid i \in N\} \in \mathbb{R}^{|N| \times m + |N|}$
      - The hacker observes $o_A \subset s$
      - Action space: $A = \{a^A_1, \ldots, a^A_k\}$, where the $a^A_i$ are commands
      - For every state-action pair there is a probability $\vec{w}^{A,(x)}_{i,j}$ of failure and a probability of detection $\varphi(\text{det}(s_i) \cdot n^{A,(x)}_{i,j})$
      - State transitions $s \to s'$ are decided by a discrete dynamical system $s' = F(s, a)$
      - The exact dynamics $(F, c^A, n^A, w^A, \text{det}(\cdot), \varphi(\cdot))$ are unknown to us!
    [Graphical model: nodes $N_0, \ldots, N_7$ with states $S_i \in \mathbb{R}^m$ and edge labels $\vec{c}^A_{i,j} \in \mathbb{R}^k_+$, $\vec{n}^A_{i,j} \in \mathbb{R}^k_+$, $\vec{w}^A_{i,j} \in [0,1]^k$; defender parameters $\vec{c}^D \in \mathbb{R}^l_+$, $\vec{w}^D \in [0,1]^l$; a code sketch of this graph model follows below]
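A minimal sketch of the graph model, to make the notation concrete. The transition function F is left as a stub on purpose: as the slide notes, the exact dynamics are unknown and their effects can only be sampled through the emulation. All names here are illustrative:

```python
import numpy as np

class CTFGraphModel:
    """Infrastructure as a graph G = <N, E> with per-node state vectors.

    Each node i has a state s_i in R^m; each edge (i, j) would carry a
    cost vector c[i,j], a noise vector n[i,j] (expected IDS alerts), and
    a failure probability w[i,j] per action. These are unknown in
    practice and are estimated from emulation traces.
    """
    def __init__(self, n_nodes: int, m_attrs: int, edges):
        self.state = np.zeros((n_nodes, m_attrs))  # s_i for each node
        self.edges = set(edges)                    # E, a subset of N x N

    def step(self, s: np.ndarray, action) -> np.ndarray:
        """Discrete dynamical system s' = F(s, a).

        Placeholder: the true F (and c, n, w, det, phi) is not known;
        in the method, transitions are sampled from the emulation.
        """
        raise NotImplementedError("F is identified from emulation traces")

g = CTFGraphModel(n_nodes=8, m_attrs=3,
                  edges=[(0, 1), (0, 2), (1, 2), (2, 3), (3, 6), (6, 7)])
print(g.state.shape)  # (8, 3): one m-dimensional state vector per node
```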
  • 41. 21/30 Learning Capture-the-Flag Strategies
    [Figure: average episodic regret vs. iteration; learning curves (simulation and emulation) of the proposed method on the generated simulation, test emulation, and train emulation environments, with the lower bound π∗; evaluation infrastructure shown alongside]
  • 42. 22/30 Learning Capture-the-Flag Strategies
    [Figure: episodic rewards, episodic regret, and episodic steps vs. iteration for Configurations 1-3 against π∗; network diagrams of the three evaluation configurations (gateway R1, application servers, IDS, access switch, flags, traffic generators) on subnets 172.18.4.0/24, 172.18.3.0/24, and 172.18.2.0/24]
  • 43. 23/30 Learning to Detect Network Intrusions: Target Infrastructure
    - Topology: 6 servers, 1 IDS (Snort), 3 clients
    - Services: 3 SSH, 2 HTTP, 1 DNS, 1 Telnet, 1 FTP, 1 MongoDB, 2 SMTP, 1 Tomcat, 1 Teamspeak3, 1 SNMP, 1 IRC, 1 Postgres, 1 NTP
    - Vulnerabilities: 1 CVE-2010-0426, 3 brute-force vulnerabilities
    - Operating systems: 4 Ubuntu-20, 1 Ubuntu-14, 1 Kali
    - Traffic: FTP, SSH, IRC, SNMP, HTTP, Telnet, Postgres, MongoDB; curl, ping, traceroute, nmap, ...
    [Figure: evaluation infrastructure]
  • 44. 24/30 Learning to Detect Network Intrusions: System Model (1/3)
    - An admin should manage the infrastructure for T time periods.
    - The admin can monitor the infrastructure to get a belief about its state, $b_t$.
    - $b_1, \ldots, b_{T-1}$ can be assumed to be generated from some unknown distribution $\varphi$.
    - If, based on $b_t$, the admin suspects that the infrastructure is being intruded upon, he can suspend the suspicious user/traffic.
    [Figure: target infrastructure]
  • 46. 25/30 Learning to Detect Network Intrusions: System Model (2/3)
    - Suspending traffic from a true intrusion yields a reward r (salary bonus).
    - Not suspending traffic of a true intrusion incurs a cost c (the admin is fired).
    - Suspending traffic of a false intrusion incurs a cost o (breaking the SLA).
    - The objective is to decide an optimal response for suspending network traffic. What strategy achieves this end?
    [Figure: target infrastructure]
  • 48. 26/30 Learning to Detect Network Intrusions: System Model (3/3)
    - Optimal stopping problem:
      - Action space $A = \{\text{STOP}, \text{CONTINUE}\}$
      - Belief state space $B \subseteq \mathbb{R}^{8 + 10 \cdot m}$
      - A belief state $b \in B$ contains relevant metrics to detect intrusions: alerts from the IDS, entries in /var/log/auth, logged-in users, TCP connections, processes, ...
      - Reward function $R$: $r(b_t, \text{STOP}, s_t) = \mathbb{1}_{\text{intrusion}} \cdot \frac{\beta}{t_i}$, where $\beta$ is a positive constant and $t_i$ is the number of nodes compromised by the attacker
      - $\Rightarrow$ incentive to detect the intrusion early.
    [Figure: target infrastructure]
  • 49. 27/30 Structural Properties of the Optimal Policy
    - Assumptions: there is always an intrusion before T; $f(b_t)$ is the probability of intrusion given $b_t$; $b_t$ and $p$ are Markov; $f(b_t)$ is non-decreasing in $t$.
    - Claim: the optimal policy is a threshold-based policy.
    - Necessary condition for optimality (Bellman):
      $u_t(b_t) = \sup_a \left[ r_t(b_t, a) + \sum_{b' \in B} p_t(b' \mid b_t, a)\, u_{t+1}(b') \right]$
      $= \max\left[ f(b_t) \cdot \frac{\beta}{t_i},\; \sum_{b' \in B} \varphi(b')\, u_{t+1}(b') \right]$
    - Thus it is optimal to stop at belief state $b_t$ iff
      $f(b_t) \cdot \frac{\beta}{t_i} \geq \sum_{b' \in B} \varphi(b')\, u_{t+1}(b')$
    - Stopping threshold $\alpha_t$:
      $\alpha_t \triangleq \frac{t_i}{\beta} \sum_{b' \in B} \varphi(b')\, u_{t+1}(b')$
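Under the slide's assumptions, the thresholds $\alpha_t$ can be computed by backward induction over the horizon. Below is a minimal sketch assuming a finite belief space and known $f$ and $\varphi$ (both of which are estimated in practice); the belief values, parameters, and forced stop at the horizon are illustrative simplifications:

```python
from typing import Callable, Dict, Hashable, List

def stopping_thresholds(
    beliefs: List[Hashable],
    f: Callable[[Hashable], float],   # f(b): probability of intrusion given b
    phi: Dict[Hashable, float],       # phi(b'): belief distribution
    T: int,
    beta: float,
    t_i: float,
) -> List[float]:
    """Backward induction for u_t and the thresholds alpha_t.

    u_t(b) = max( f(b) * beta / t_i,  sum_b' phi(b') * u_{t+1}(b') ) and
    alpha_t = (t_i / beta) * sum_b' phi(b') * u_{t+1}(b'); it is optimal
    to stop at time t iff f(b_t) >= alpha_t. Simplification: forced stop
    at the horizon T.
    """
    u = {b: f(b) * beta / t_i for b in beliefs}  # u_T: must stop at horizon
    alphas = [0.0] * T
    for t in range(T - 1, -1, -1):
        cont = sum(phi[b2] * u[b2] for b2 in beliefs)  # value of continuing
        alphas[t] = (t_i / beta) * cont
        u = {b: max(f(b) * beta / t_i, cont) for b in beliefs}
    return alphas

# Toy example: two belief states; intrusion far more likely under "high"
f_map = {"low": 0.1, "high": 0.8}
alphas = stopping_thresholds(
    beliefs=["low", "high"], f=lambda b: f_map[b],
    phi={"low": 0.7, "high": 0.3}, T=5, beta=10.0, t_i=2.0,
)
policy = lambda b, t: "STOP" if f_map[b] >= alphas[t] else "CONTINUE"
print(policy("high", 0), policy("low", 0))  # STOP CONTINUE
```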
  • 53. 28/30 Learning to Detect Network Intrusions
    [Figure: average episodic rewards vs. iteration; learning curves (simulation and emulation) of the proposed method $\pi_\theta$ against the baselines Snort-severe, Snort-warning, Snort-critical, and /var/log/auth, with the upper bound π∗; evaluation infrastructure shown alongside]
  • 54. 29/30 Learning to Detect Network Intrusions
    [Figure: fractions of true positives (intrusion detected), false positives (early stopping), and false negatives (missed intrusion) vs. iteration: the trade-off between detection and false positives]
  • 55. 30/30 Conclusions & Future Work
    - Conclusions:
      - We develop a method to find effective strategies for intrusion prevention: (1) emulation system; (2) system identification; (3) simulation system; (4) reinforcement learning; and (5) domain randomization and generalization.
      - We show that self-learning can be successfully applied to network infrastructures.
      - Self-play reinforcement learning in a Markov security game.
      - Key challenges: stable convergence, sample efficiency, complexity of emulations, large state and action spaces.
    - Our research plans:
      - Improving the system identification algorithm and generalization
      - Evaluation on real-world infrastructures