Base Rate Fallacy Sira Con 2012 05
1. Patrick Florer
Risk Centric Security, Inc.
www.riskcentricsecurity.com
Authorized reseller of ModelRisk from Vose Software
Risk Centric Security, Inc. Confidential and Proprietary.
Copyright © 2012 Risk Centric Security, Inc. All rights reserved.
Risk Analysis for the 21st Century®
2. Patrick Florer has worked in information technology for
32 years. He has also worked on a parallel track in
medical outcomes research, analysis, and the creation of
evidence-based guidelines for medical treatment. His
roles have included IT operations, programming, and
systems analysis. Since 1986, he has worked as
an independent consultant, helping customers with
strategic development, analytics, risk analysis, and
decision analysis. He is a cofounder of Risk Centric
Security and currently serves as Chief Technology Officer.
3. What is the Base Rate Fallacy?
What are fourfold tables?
How do fourfold tables work?
How can fourfold tables help solve information
security problems?
How can the use of Monte Carlo simulation
improve the use of fourfold tables?
4. A technology under evaluation claims:
95% accuracy in detecting malicious traffic
15% false positive rate
5. What is the probability that a sample
identified as malicious is really malicious?
6. What is the probability that a sample
identified as malicious is really malicious?
Without knowing, or being able to estimate,
the base rate in the sample or population, you
cannot answer the question!
7.
                                                    Exists/True           Does not exist/Not True/False
Identified/Detected as Existing/True/Positive       True Positive (TP)    False Positive (FP)
                                                    (+ +)                 (- +)
Identified/Detected as Not Existing/False/Negative  False Negative (FN)   True Negative (TN)
                                                    (+ -)                 (- -)
In each sign pair, the first sign says whether the “something” exists and the second says whether it was detected.
8. A fourfold table, also called a 2 x 2 table, is a
four-cell table (2 rows x 2 columns) based
upon two sets of dichotomous, or “yes/no”,
facts.
9. The two sets of facts could be:
#1: Something that exists or is true, or doesn’t
exist/isn’t true; and
#2: Something else related to #1 that exists or is
true, or doesn’t exist/isn’t true, including a
“something” that attempts to identify/detect
whether #1 exists/is true or does not
exist/is not true.
10. The “something” that exists or doesn’t exist could be
a disease, a virus or worm, malware, exploit code, or
a malicious packet: you have a disease or you
don’t; a piece of code is malicious or it isn’t; and so on.
The “something else” that “identifies/detects” could
be a medical diagnostic test, anti-virus/anti-malware
software, an IDS/IPS, and so on. The diagnostic test or
software either correctly identifies the disease, virus,
or malware, or it doesn’t.
11. True Positive: “something” does exist/is true and is
correctly identified as existing/true.
False Positive: “something” does not exist/is not true,
but is incorrectly identified as existing/true.
False Negative: “something” does exist/is true, but is
incorrectly identified as not existing/not true.
True Negative: “something” does not exist/is false, and is
correctly identified as not existing/false.
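The four definitions above amount to a lookup on two yes/no facts. A minimal Python sketch of that lookup; the helper name `classify_outcome` and the sample observations are hypothetical, not from the talk:

```python
def classify_outcome(exists: bool, detected: bool) -> str:
    """Return the fourfold-table cell for one observation.

    exists:   whether the "something" (e.g. a malicious packet) is really there
    detected: whether the detector (e.g. an IDS) reports it as present
    """
    if exists and detected:
        return "TP"  # exists, correctly identified as existing
    if not exists and detected:
        return "FP"  # does not exist, incorrectly identified as existing
    if exists and not detected:
        return "FN"  # exists, incorrectly identified as not existing
    return "TN"      # does not exist, correctly identified as not existing


# Tally a few hypothetical (exists, detected) observations into the four cells.
observations = [(True, True), (False, True), (True, False), (False, False), (False, False)]
counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
for exists, detected in observations:
    counts[classify_outcome(exists, detected)] += 1
print(counts)  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 2}
```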
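12. Using the previous example:
95% accuracy in detecting malicious traffic
(95% of malicious packets are flagged: the sensitivity)
15% false positive rate
(15% of non-malicious packets are flagged as malicious)
And assuming that 3% of all traffic is malicious
(the prevalence, or base rate)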
13. Out of 1 million packets:
3% are malicious = 30,000
97% are non-malicious = 970,000
14. Out of 30,000 malicious packets:
95% are correctly identified as malicious =
28,500 (True Positive)
5% are incorrectly identified as harmless =
1,500 (False Negative)
15. Out of 970,000 non-malicious packets:
15% are incorrectly identified as malicious =
145,500 (False Positive)
85% are correctly identified as non-
malicious = 824,500 (True Negative)
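Slides 13 to 15 apply three rates in sequence: the base rate splits the population, then the detection rate and the false positive rate split each part. A minimal Python sketch of that arithmetic; the function name `fourfold_counts` is hypothetical, and counts are rounded to whole packets:

```python
def fourfold_counts(total: int, prevalence: float, sensitivity: float,
                    false_positive_rate: float) -> dict:
    """Split `total` items into the four cells of a 2 x 2 table.

    prevalence:          fraction of items that really are malicious (base rate)
    sensitivity:         fraction of malicious items the detector flags
    false_positive_rate: fraction of non-malicious items the detector flags
    """
    malicious = round(total * prevalence)
    benign = total - malicious
    tp = round(malicious * sensitivity)        # malicious and flagged
    fn = malicious - tp                        # malicious but missed
    fp = round(benign * false_positive_rate)   # benign but flagged
    tn = benign - fp                           # benign and passed
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


cells = fourfold_counts(1_000_000, prevalence=0.03, sensitivity=0.95,
                        false_positive_rate=0.15)
print(cells)  # {'TP': 28500, 'FN': 1500, 'FP': 145500, 'TN': 824500}
assert sum(cells.values()) == 1_000_000  # the four cells always cover the whole population
```

Computing FN and TN by subtraction, rather than multiplying by 5% and 85%, guarantees the four cells add up to the total even when rounding is involved.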
16.
                                      Malicious (True)      Not malicious (False)
Detected as malicious (Positive)      True Positive (TP)    False Positive (FP)
                                      28,500                145,500
Detected as non-malicious (Negative)  False Negative (FN)   True Negative (TN)
                                      1,500                 824,500
17. What is the probability that a packet identified
as malicious is really malicious?
P(malicious | identified as malicious) = TP / (TP + FP)
= 28,500 / (28,500 + 145,500)
= 28,500 / 174,000
≈ 16.4%
What happened to the 95% accuracy rate?
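The same figure follows from Bayes’ theorem without building the table. Writing p for the prevalence (0.03), s for the detection rate or sensitivity (0.95), and f for the false positive rate (0.15), this is a worked restatement of the slide’s arithmetic, not an additional result:

```latex
P(\text{mal} \mid +) = \frac{s\,p}{s\,p + f\,(1-p)}
= \frac{0.95 \times 0.03}{0.95 \times 0.03 + 0.15 \times 0.97}
= \frac{0.0285}{0.0285 + 0.1455}
= \frac{0.0285}{0.174} \approx 0.1638
```

Multiplying numerator and denominator by the 1,000,000 packets recovers TP / (TP + FP). The 95% figure enters only through s; the denominator is dominated by f(1 - p), the false positives drawn from the much larger pool of non-malicious traffic.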
18. What is the probability that a packet identified
as non-malicious is really non-malicious?
P(non-malicious | identified as non-malicious) = TN / (TN + FN)
= 824,500 / (824,500 + 1,500)
= 824,500 / 826,000
≈ 99.8%
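Slides 17 and 18 compute the positive and negative predictive values (PPV and NPV). A short Python sketch of both ratios from the cell counts in the table; the function name `predictive_values` is hypothetical:

```python
def predictive_values(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from the four cells of a 2 x 2 table.

    PPV = P(really malicious | identified as malicious)         = TP / (TP + FP)
    NPV = P(really non-malicious | identified as non-malicious) = TN / (TN + FN)
    """
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


ppv, npv = predictive_values(tp=28_500, fp=145_500, fn=1_500, tn=824_500)
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")  # PPV = 16.4%, NPV = 99.8%
```

The two values answer different questions: with a 3% base rate, most packets this detector flags are not malicious, while a packet it passes is almost always clean.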
19. Some things to remember:
The numbers in the four cells must add up
to 100% of the total number being analyzed
(1M in this example)
As the base rate approaches 100%, the base
rate fallacy ceases to apply: nearly every
positive result is then a true positive
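To see the effect of the base rate directly, hold the detector fixed (95% detection, 15% false positives) and vary only the prevalence. A minimal Python sketch; the helper `ppv_from_rates` is hypothetical and the base rates in the loop are illustrative, not data from the talk:

```python
def ppv_from_rates(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(malicious | flagged) via Bayes' theorem."""
    flagged_malicious = sensitivity * prevalence
    flagged_benign = false_positive_rate * (1.0 - prevalence)
    return flagged_malicious / (flagged_malicious + flagged_benign)


# Same detector as the example; only the base rate changes.
for prevalence in (0.001, 0.03, 0.10, 0.50, 0.90, 0.99):
    print(f"base rate {prevalence:>6.1%}  ->  PPV {ppv_from_rates(prevalence, 0.95, 0.15):.1%}")
```

The PPV rises monotonically with the base rate and approaches 100% as the base rate does, while at low base rates the same detector produces mostly false alarms.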
20. Prevalence vs. incidence: what’s the difference?
Prevalence: cross-sectional, how much is
out there right now?
Incidence: longitudinal, a proportion of
new cases found during a time period
Both prevalence and incidence can be
expressed as rates.
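The two measures use different numerators and denominators. A minimal Python sketch with hypothetical host counts (not data from the talk); incidence is computed here as a cumulative incidence proportion, which is one common way of expressing it:

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Cross-sectional: share of the population affected at one point in time."""
    return existing_cases / population


def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """Longitudinal: share of the at-risk population that became a new case during the period."""
    return new_cases / population_at_risk


# Hypothetical fleet: 50 of 2,000 hosts are infected today. Over the next month,
# 39 of the 1,950 currently clean hosts become newly infected.
print(f"prevalence today: {prevalence(50, 2_000):.1%}")             # 2.5%
print(f"monthly incidence: {incidence_proportion(39, 1_950):.1%}")  # 2.0%
```

Prevalence is what a fourfold table’s base rate needs for a snapshot of traffic; incidence describes how quickly new cases appear over time.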
21. How can the use of Monte Carlo simulation
improve the use of fourfold tables?
Examples in Excel
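The Excel examples themselves are not reproduced here. As a rough stand-in, the Python sketch below replaces the single-point inputs (3% base rate, 95% detection, 15% false positives) with ranges and reports a distribution of PPV instead of one number. The triangular distributions and their bounds are assumptions chosen for illustration, and the function names are hypothetical; the talk’s own examples were built in Excel, not with this code.

```python
import random
import statistics


def ppv(prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(malicious | flagged) via Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = false_positive_rate * (1.0 - prevalence)
    return tp / (tp + fp)


def simulate_ppv(trials: int = 10_000, seed: int = 2012) -> list[float]:
    """Monte Carlo: draw uncertain inputs, compute PPV for each draw."""
    rng = random.Random(seed)  # seeded for reproducible runs
    results = []
    for _ in range(trials):
        # Assumed, illustrative uncertainty ranges: triangular(low, high, mode).
        prevalence = rng.triangular(0.01, 0.05, 0.03)
        sensitivity = rng.triangular(0.90, 0.99, 0.95)
        false_positive_rate = rng.triangular(0.10, 0.20, 0.15)
        results.append(ppv(prevalence, sensitivity, false_positive_rate))
    return results


samples = simulate_ppv()
cuts = statistics.quantiles(samples, n=20)  # 19 cut points at 5%, 10%, ..., 95%
p5, p50, p95 = cuts[0], cuts[9], cuts[18]
print(f"mean PPV {statistics.mean(samples):.1%}; "
      f"5th/50th/95th percentiles {p5:.1%} / {p50:.1%} / {p95:.1%}")
```

Reporting percentiles rather than a single PPV makes visible how much the answer depends on an uncertain base rate, which is the quantity most often guessed.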
22. Thank you !
Patrick Florer
CTO and Co-founder
Risk Centric Security, Inc
patrick@riskcentricsecurity.com
214.828.1172