2. What is Adversarial Learning?
• What is spam filtering?
• What is intrusion detection?
• What is terrorism detection?
• What do these have to do with
classification in particular?
3. Defining Adversarial Learning
• Adversarial machine learning is a research
field that lies at the intersection of machine
learning and computer security. It aims to
enable the safe adoption of machine learning
techniques in adversarial settings like spam
filtering, malware detection and biometric
recognition.
4. Motivation for the presentation
• Previous work makes an unrealistic assumption: the
attacker has perfect knowledge of the classifier.
• Introduction of the adversarial classifier reverse
engineering (ACRE) learning problem
• Presentation of efficient algorithms for reverse
engineering linear classifiers and the role of
active experimentation in adversarial attacks.
5. Exploring the Problems - I
• As classifiers become more widely deployed,
adversaries are actively modifying their
behavior to avoid detection.
• For example: senders of junk email
6. Exploring the Problems – II
• Dalvi et al. anticipate attacks by computing the
adversary’s optimal strategy, assuming the adversary
has perfect knowledge of the classifier.
• Unfortunately, this is rarely true in practice.
Adversaries must learn about the classifier
using prior knowledge, observation, and
experimentation.
7. The Underlying Solution
• Exploring the role of active experimentation in
adversarial attacks.
• Showing that adversaries can identify high-quality
instances that are not labeled malicious using a
reasonable (polynomial) number of queries.
• Treating the problem as the adversarial classifier
reverse engineering (ACRE) learning problem
8. Introducing the Learning Problems- I
• Active Learning Problem:
Semi-supervised machine learning in which a learning
algorithm is able to interactively query the user to
obtain the desired outputs at new data points.
For example: YOU
• Setup: Given existing knowledge, want to choose where to collect
more data
– Access to cheap unlabeled points
– Make a query to obtain expensive label
– Want to find labels that are “informative”
• Output: Classifier / predictor trained on less labeled data
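To make the query loop concrete, here is a minimal uncertainty-sampling sketch; the `label(x)` oracle, the logistic-regression learner, and all names are illustrative choices, not part of the original formulation.

```python
# Minimal uncertainty-sampling sketch of the active-learning loop above.
# label(x) is a hypothetical, expensive oracle standing in for the user.
from sklearn.linear_model import LogisticRegression

def active_learn(X_pool, label, seed_idx, n_queries=20):
    """Iteratively query the label of the most uncertain pool point.
    X_pool is a NumPy array; seed_idx should cover both classes."""
    labeled = list(seed_idx)
    y = {i: label(X_pool[i]) for i in labeled}    # expensive labels so far
    clf = LogisticRegression()
    for _ in range(n_queries):
        clf.fit(X_pool[labeled], [y[i] for i in labeled])
        probs = clf.predict_proba(X_pool)[:, 1]
        pool = [i for i in range(len(X_pool)) if i not in y]
        if not pool:
            break
        # query the point whose predicted probability is closest to 0.5
        i_star = min(pool, key=lambda i: abs(probs[i] - 0.5))
        y[i_star] = label(X_pool[i_star])
        labeled.append(i_star)
    return clf
```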
9. Introducing the Learning Problems- II
• PAC Model:
Task of successfully learning an unknown target concept:
obtain, with high probability, a hypothesis that is a good
approximation of it.
Algorithm:
• The algorithm is given a sample S = {(x, y)} presumed to be
drawn from some distribution D over instance space X, labeled
by some target function f.
• The algorithm performs optimization over S to produce some
hypothesis h.
• The goal is for h to be close to f over D.
• Failure is allowed with small probability δ (to allow for the
chance that S is not representative).
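As a toy illustration of this setup, the following sketch draws a sample from a uniform D, labels it with a hidden threshold target f, and runs empirical risk minimization over a finite hypothesis class; the threshold class and all constants are invented for the example.

```python
# Toy PAC-style setup: draw S from D, label it by the target f, then pick
# the hypothesis h in a finite class H that minimizes error on S.
import random

def f(x):                         # unknown target concept (threshold 0.37)
    return x >= 0.37

def draw_sample(m):               # S = {(x, y)}, x ~ D = Uniform[0, 1]
    return [(x, f(x)) for x in (random.random() for _ in range(m))]

def erm(S, H):                    # optimization over S
    return min(H, key=lambda h: sum(h(x) != y for x, y in S))

H = [lambda x, t=t / 100: x >= t for t in range(101)]  # thresholds 0.00..1.00
h = erm(draw_sample(200), H)
# With probability >= 1 - delta (over the draw of S), h is close to f on D.
```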
10. Introducing the Learning Problems- III
• ACRE Problem: Task of learning sufficient
information about a classifier to construct
adversarial attacks.
• We discuss the algorithms in the
following slides.
11. Defining the Problem
• Instance space: X = {X1, X2, …, Xn}, where each Xi is a
feature (e.g., a word); instances x ∈ X (e.g., emails)
• Classifier: c(x): X → {+, −}, with c ∈ C, a concept class
(e.g., linear classifiers)
• Adversarial cost function: a(x): X → R, with a ∈ A
(e.g., more legible spam is better)
[Figure: the instance space over features X1 and X2, the
classifier’s + and − regions, and the adversarial cost function
over the same space]
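A plain-Python rendering of these definitions may help; the feature names, weights, and cost below are illustrative, assuming a linear classifier over word-count features.

```python
# Illustrative encoding of the slide's definitions for a spam-like domain.
FEATURES = ["cheap", "viagra", "meeting", "deadline"]       # X1..Xn

def c(x, w=(2.0, 3.0, -1.0, -1.5), b=-0.5):
    """Classifier c(x): X -> {+, -} (here, a linear classifier)."""
    return "+" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "-"

def a(x, base=(1, 1, 0, 0)):
    """Adversarial cost a(x): X -> R: distance from the adversary's
    preferred message `base` (more legible spam is better)."""
    return sum(abs(xi - bi) for xi, bi in zip(x, base))
```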
12. Adversarial Classification Reverse
Engineering (ACRE)
• Task: minimize a(x) subject to c(x) = −, within a factor of k
• Given:
– Full knowledge of a(x)
– One positive and one negative instance, x+ and x−
– A polynomial number of membership queries
[Figure: unlabeled instances (?) in the space over X1 and X2,
together with one known + and one known − instance]
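In code, the ACRE setting amounts to an interface like the following sketch: the classifier is visible only through membership queries, which a wrapper can count against the polynomial budget. All names here are illustrative.

```python
# The ACRE interface: the adversary sees c only through membership queries.
class MembershipOracle:
    def __init__(self, classifier):
        self._classifier = classifier
        self.queries = 0                      # spent against the budget

    def query(self, x):
        self.queries += 1
        return self._classifier(x)            # returns "+" or "-"

def cheapest_negative(candidates, oracle, a):
    """Naive baseline: cheapest candidate instance labeled negative."""
    negatives = [x for x in candidates if oracle.query(x) == "-"]
    return min(negatives, key=a) if negatives else None
```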
14. How is ACRE different?
• The ACRE learning problem differs
significantly from both the probably
approximately correct (PAC) model of
learning and active learning: the goal is
not to learn the entire decision surface, there
is no assumed distribution governing the
instances, and success is measured relative to
a cost model for the adversary.
15. What we assume
The adversary:
• Can issue membership queries to the classifier
for arbitrary instances
• Has access to an adversarial cost function a(x)
that maps instances to non-negative real
numbers.
• Is provided with one positive instance, x+, and
one negative instance, x−.
16. Linear Classifiers with Continuous Features
• ACRE learnable within a factor of (1+ε)
under linear cost functions
Proof sketch:
• Only need to change the feature with the highest
weight-to-cost ratio
• We can efficiently find this feature using line searches
in each dimension, as in the sketch below
[Figure: line searches toward the decision boundary in the
space over X1 and X2, starting from the adversary’s ideal
instance xa]
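Here is a sketch of the line-search primitive behind this proof idea, assuming a `classify` membership oracle that returns "+" or "−"; only the positive search direction is shown.

```python
# Binary line search along feature i: smallest displacement from a
# negative instance that flips the classifier to "+". For a linear
# classifier this displacement is inversely proportional to |w_i|, so
# the highest-weight feature has the smallest crossing distance.
def boundary_distance(classify, x_neg, i, hi=1e6, tol=1e-6):
    assert classify(x_neg) == "-"
    lo = 0.0                      # assumes the + direction flips the label;
    while hi - lo > tol:          # a full version searches both directions
        mid = (lo + hi) / 2
        x = list(x_neg)
        x[i] += mid
        if classify(x) == "+":
            hi = mid
        else:
            lo = mid
    return hi
```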
17. Linear Classifiers with Boolean Features
• Harder problem: can’t do line searches
• ACRE learnable within a factor of 2
if the adversary has unit cost per change
[Figure: the adversary’s instance xa and the negative instance
x−, differing in features with weights wi, wj, wk, wl, wm under
classifier c(x)]
18. Continuous Features – Theorem I
• Let c be a continuous linear classifier with
vector of weights w, such that the magnitude
of the ratio between two non-zero weights is
never less than δ (a lower bound). Given positive
and negative instances x+ and x−, we can find
each weight within a factor of 1+ε using a
polynomial number of queries.
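Reusing the `boundary_distance` sketch from slide 16, the theorem's weight estimates can be illustrated as follows: for a linear classifier w·x + b, the crossing displacement d_i along feature i satisfies |w_i|·d_i = constant, so ratios of displacements estimate ratios of weights. All names here are assumptions of the sketch.

```python
# Two line searches give an estimate of a weight ratio: |w_i| / |w_j| is
# approximately d_j / d_i, where d_k is the boundary-crossing displacement
# along feature k. Tightening tol drives the estimate within (1 + epsilon).
def weight_ratio(classify, x_neg, i, j, tol=1e-9):
    d_i = boundary_distance(classify, x_neg, i, tol=tol)
    d_j = boundary_distance(classify, x_neg, j, tol=tol)
    return d_j / d_i
```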
22. Boolean Features – Theorems
• In a linear classifier with Boolean
features, determining if a sign witness
exists for a given feature is NP-complete.
• Boolean linear classifiers are ACRE 2-
learnable under uniform linear cost
functions.
24. Adaptation of ACRE Algorithm - I
Classifier Configuration:
Two Linear Classifiers:
• A naïve Bayes model
• A maximum entropy (maxent) model.
Adversary Configuration:
• The adversary’s features were English words from
a dictionary, organized into three feature
lists: Dict, Freq, and Rand.
25. Adaptation of ACRE Algorithm - II
Iteratively reduce the cost in two ways:
1. Remove any unnecessary change: O(n)
2. Replace any two changes with one: O(n³)
[Figure: two stages of cost reduction between the adversary’s
instance xa and candidates y and y′, with changed-feature
weights wi, wj, wk, wl, wm and wp under classifier c(x), and x−
shown for reference]
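A Python sketch of these two reduction passes for Boolean features with unit cost per change; `classify` is a membership oracle, `xa` the adversary's ideal instance, and the candidate must stay labeled "−" after every accepted move. All names are illustrative.

```python
# Iterative cost reduction: pass 1 removes an unnecessary change; pass 2
# replaces two changes with one. Each accepted move keeps the label "-".
def reduce_cost(classify, xa, x):
    n = len(x)
    flip = lambda v, i: v[:i] + [1 - v[i]] + v[i + 1:]
    improved = True
    while improved:
        improved = False
        diff = [i for i in range(n) if x[i] != xa[i]]    # current changes
        # Pass 1 -- remove any unnecessary change: O(n) candidates.
        for i in diff:
            y = flip(x, i)                               # undo change i
            if classify(y) == "-":
                x, improved = y, True
                break
        if improved:
            continue
        # Pass 2 -- replace any two changes with one: O(n^3) candidates.
        for ai in range(len(diff)):
            for bi in range(ai + 1, len(diff)):
                for k in (k for k in range(n) if x[k] == xa[k]):
                    y = flip(flip(flip(x, diff[ai]), diff[bi]), k)
                    if classify(y) == "-":
                        x, improved = y, True
                        break
                if improved:
                    break
            if improved:
                break
    return x
```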
27. Conclusion
ACRE Learning:
• Determines whether an adversary can
efficiently learn enough about a classifier to
minimize the cost of defeating it.
• The algorithm performed quite well in spam
filtering, easily beating the worst-case
bounds.
28. Future Work - I
• There is the possibility of adding different types of classifiers,
cost functions, and even learning scenarios, and of understanding
which scenarios are hard.
• Under what conditions is ACRE learning robust to noisy
classifiers?
• What can be learned from passive observation alone, for
domains where issuing any test queries would be
prohibitively expensive?
• If the adversary does not know which features make up the
instance space, when can they be inferred?
29. Future Work - II
• Can a similar framework be applied to relational problems,
e.g. to reverse engineering collective classification?
• Moving beyond classification, under what circumstances
can adversaries reverse engineer regression functions, such
as car insurance rates?
• How do such techniques fare against a changing classifier,
such as a frequently retrained spam filter?
• Will the knowledge to defeat a classifier today be of any
use tomorrow?