SlideShare une entreprise Scribd logo
1  sur  8
Télécharger pour lire hors ligne
arXiv:1309.4275v2[cs.CR]18Sep2013
Detecting encrypted code
J¨arpe, E. and Wecksten, M.
September 19, 2013
Abstract
When a computer hard drive is confiscated in the conduct of a
police investigation the information contained is commonly encrypted
at such a level that brute force thechniques have to be used. This is too
time consuming if the hard drive is large, thus demanding methods to
make a smart distinction between likely encrypted and non-encrypted
code in order to use the cracking resources efficiently. The methods are
stopping rules according to the cutting edge of on-line change-point
detection. The performance of the methods is evaluated in terms of
conditional expected delay and predictive value having calibrated the
methods so that they all have the same expected time till false alarm.
1 Introduction
In modern IT forensics there is a challenge to crack an encrypted hard drive
that has been confiscated in the conduct of a police investigation. The
strength of the contemporary tools for such protection (e.g. True crypt) is at
such a level that password guessing (so-called brute force techniques) have to
be used. This may not be a problem if there is only one or a few blocks of
code are cracked within reasonable time.
However, the problem of cracking the encryption of computer code may
be further complicated if the contents of the hard drive have been deleted by
removing the pointers of the file allocation tables (so-called quick deletion).
Then the knowledge of what the formats of the contents are is not present.
The hard drive contents is merely a (possibly immense) sequence of blocks of
code that have to be attempted for cracking. Then it may well be completely
impossible to apply brute force technique on each block simply because there
are so many blocks and each password guessing procedure takes too much
time.
1
To this end a sieve discriminating between the blocks of most likely not
encrypted computer code and the blocks of suspiciously encrypted (and there-
fore more likely harmful) computer code is desired. After having picked out
the encrypted blocks one may proceed with brute force cracking in an effi-
cient manner according to some standard. A property of a crypto which is
commonly regarded as an indicator of quality is how evenly all characters
of the encryption alphabet is used. Thus, a measure for telling encryption
likelihood apart, is to quantify how uniformly the encryption characters are
distributed; in encrypted code the characters are more likely to be evenly
distributed than in text files, images, programs etc. One way of measuring
how evenly the characters are distributed is by considering the character fre-
quencies and expressing them by using the Pearson chi square statistic (see
e.g. Cox and Hinkley [1]). But of course one may also consider other aspects
of even distribution of the characters, such that they should not occur more
clustered or regularly than what would have been the case in a completely
random pattern.
This paper is all about development of such a procedure for distinguishing
between the code blocks which are less likely and those which are more likely
encrypted.
Previous work...
In Section 2...
2 Model
Observations of blocks of code c1, c2, c3, . . ., are made consecutively at time-
points t = 1, 2, 3, . . .. Some of these clusters may be subject to encryption.
Depending on the degree of fragmentation of the hard drive they may be
more likely to occur consecutively or not. In order to avoid tedious brute
force cracking all of the clusters, a method for separating the clusters that
are more likely encrypted from the clusters which are less likely encrypted is
desired.
Under the assumption that the characters constituting encrypted code
are more evenly distributed than the characters constituting clear text code,
the Barkman crypto indicator Ut (which is nothing but the well-known Pear-
son chi-squared statistic, see e.g. Cox and Hinkley [1], in the special case
with uniform distribution under the alternative hypothesis), may be used
to distinguish code which is more likely encrypted. Simply calculating this
statistic for all the clusters of the hard drive, and then brute force cracking
those with an alarming score of the Barkman crypto indicator is a possibility
if the hard drive is not too large.
2
If the hard drive is very large (it may of course not be only one physical
unit but rather a composition of many units and thus, possibly, very large) it
can be a practically impossible task to evaluate the Barkman crypto indicator
for all clusters. However, if the contents is likely to be little fragmented,
and accordingly clusters of the same files are more likely to be occurring
consecutively, then a monitoring algorithm, applied to the on-line calculation
of Barkman crypto indicator, can facilitate the search for a larger sequence
of consecutive clusters that are more likely encrypted. Then the situation is
more formally described as follows. Cluster observations, C1, C2, C3, . . ., are
to be made on-line, i.e. one by one. Now, assuming that the code up to some
random time-point t = τ − 1 is not encrypted and that the code from time
t = τ and on wards is, the distribution of the characters constituting C1, C2,
. . ., Cτ−1 are distributed in some other way than the characters of Cτ , Cτ+1,
. . . The problem is then to determine that the change from not encrypted
to encrypted code has occurred, as quickly after it has really happened and
as accurately (i.e. with as little delay) as possible.
3 Methods
For each block ct, where t = 1, 2, 3, . . ., the number Nt of characters and the
number Kt of distinct character kinds occurring in that block is counted. For
1 2 1 1 1 0 1 0 3 3
0 3 1 1 3 2 0 2 0 3
0 2 2 0 1 0 0 0 3 0
1 0 3 0 1 2 2 2 1 1
2 1 3 3 1 0 2 1 3 3
3 0 1 1 1 0 1 3 3 2
Figure 1: Code consisting of 60 characters and 4 character kinds (i.e. 0, 1,
2 and 3). Thus if this is cluster t, then Nt = 60 and Kt = 4 in this case.
all characters, count the observed freqencies oi,1 (i.e. the number of occur-
rences of character 1), ot,2 (i.e. the number of occurrences of character 2), . . .,
ot,Kt (i.e. the number of occurrences of character Kt). From these observed
frequencies the value of the Barkman crypto indicator (i.e. the Pearson chi-
square statistic in the special case when the distribution under the alternative
3
hypothesis is uniform)
Ut =
Kt
i=1
(KtOt,i − Nt)2
KtNt
(1)
According to the Central Limit theorem, Ut is approximately χ2
dis-
tributed when Kt and Nt are large. Thus, if we consider the problem of
monitoring the degree of fit to a uniform distribution (indicating how suspi-
ciously the code is to be encrypted) based on an on-line sequence of observa-
tions of the Barkman crypto indicator U1, U2, U3, . . . would approximatively
satisfy
Ut ∈
ψ2
(Kt − 1) om t < θ
χ2
(Kt − 1) om t ≥ θ
where χ2
(Kt − 1) is the chi square distribution with Kt − 1 och ψ2
is the
distribution of cX where X ∈ χ2
for some c > 1.
At each time t the Barkman crypto indicator, Ut, is calculated and fed to
a stopping rule which is an algorithm which can be defined as
T = min{t ≥ 1 : a(U1, U2, . . . , Ut) > C}.
Let us denote a(U1, U2, . . . , Ut) by at for short. Then the stopping rule T
may be described more extensively as an algorithm by the following.
Algorithm 1 For a certain threshold, C ∈ R, and alarm function, a (with
a convenient number of arguments), the stopping rule is:
1. Start: a1 = U1.
2. If a1 > C,
then we stop immediately: Goto 10.
else goto 3.
3. At time t ≥ 2: make a new observation Ut.
4. If at > C,
then we stop: Goto 10.
else increase t by 1 and goto 3.
5. Stop and return the stopping time: T = t.
Choosing the alarm function a differently renders different kinds of stopping
rules. Changing the threshold C will give the stopping rule different prop-
erties. High values of C will make the stopping rule more conservative and
less likely to stop, thus giving few false alarm, but also long expected delay
of motivated alarm, while lower values gives a more sensitive stopping rule
with shorter expected delay, but more likely to give false alarm.
4
4 Sufficient statistic
5 The change-point problem
6 Results
6.1 Stopping rules
In this paper we consider the Shewhart, CUSUM, Shiryaev and Roberts
methods. The Shewhart method (Shewhart [9]) is the stopping rule
TW = min{t ≥ 1 : Ut > C}.
The CUSUM method (Page [7]) is
TC = min{t ≥ 1 : at > C}
where
at =
0 if t = 0
max(0, at−1) + n ln c − c−1
2c
Ut if t = 1, 2, 3, . . .
The Shiryaev method (Shiryaev [10]) is
TY = min{t ≥ 1 : at > C}
where
at =
0 if t = 0
(1 + at−1) exp (n ln c − c−1
2c
Ut) if t = 1, 2, 3, . . .
6.2 Evaluation
In order to evaluate the stopping rules theoretically the threshold is chosen
so that the ARL0
= E(T | T < θ) = 100 (i.e. the expected time until there
is a false alarm is 100). Then aspects of motivated alarm are calculated and
compared. All calculations here are based on simulations. The samplesize
in the simulations of the ARL0
is 100 000 observations for each method and
size of shift c = 1.1, c = 1.2 and c = 1.3.
The first aspect of motivated alarm to be examined is the conditional
expected delay, CED(t) = E(T − θ | T ≥ θ = t). This was simulated with
a samplesize of 1 000 000 observations for each point in the plot of Figure 3.
This performance measure indicates how quickly a method signals alarm after
the change has happened.
5
E(T −θ | T ≥θ=t)
c = 1.1 c = 1.2 c = 1.3
Cusum
Shiryaev
Cusum
Shiryaev
Cusum
Shiryaev
5 10 15 20 25
8
9
10
11
Time
5 10 15 20 25
3.0
3.2
3.4
3.6
3.8
Time
5 10 15 20 25
1.50
1.55
1.60
1.65
1.70
1.75
1.80
Time
Figure 2: Conditional expected delay for the Cusum (red lines and circles)
and Shiryaev (blue lines and squares) methods for shift size c = 1.1 (left
plot), c = 1.2 (middle plot) and c = 1.3 (right plot).
P(θ≤t | T =t)
c = 1.1 c = 1.2 c = 1.3
Cusum
Cusum
0 5 10 15 20 25
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Time
0 5 10 15 20 25
0.4
0.5
0.6
0.7
0.8
0.9
1.0
Time
0 5 10 15 20 25
0.5
0.6
0.7
0.8
0.9
1.0
Time
Figure 3: Predictive value for the Cusum (red, orange and brown lines and
circles) and Shiryaev (blue, light blue and turqoise lines and squares) methods
for shift size c = 1.1 (left plot), c = 1.2 (middle plot) and c = 1.3 (right plot).
6
The second aspect of motivated alarm is predictive value, PV (t, ν) =
P(θ ≤ t | T = t) where the change-point θ is assumed to be geometrically
distrubuted with parameter ν with the interpretation µ = P(θ = t | θ > t−1).
In each of the plots the sample size was 10 000 000 observations for each
method. The predictive value shows how much credibility one should assert
to a method in the case when the change-point occurs at time t.
6.3 Examples
7 Discussion
References
[1] D. Cox and D.V. Hinkley (1974) Theoretical Statistics, Chapman
& Hall.
[2] M. Fris´en (2003) Statistical surveillance. Optimality and methods. Int.
Stat. Rev., 71 (2), 403–434.
[3] M. Fris´en and P. Wessman (1999) Evaluations of likelihood ratio
methods for surveillance: differences and robustness. Comm. Statist.
Simulation Comput., 28 (3), 597–622.
[4] E. J¨arpe (1999) Surveillance of the interaction parameter in the Ising
model. Comm. Statist. Theory Methods, 28 (12), 3009–3027.
[5] E. J¨arpe (2002) Surveillance, environmental. In: Encyclopedia of Envi-
ronmetrics (eds. A.H. El-Shaarawi and W.W. Piegorsch), Wiley. 2150–
2153.
[6] E. J¨arpe and P. Wessman (2000) Some power aspects of methods
for detecting different shifts in the mean. Comm. Statist. Simulation
Comput., 29 (2), 633–646.
[7] E.S. Page (1954) Continuous inspection schemes. Biometrika, 41, 100–
115.
[8] S.W. Roberts (1959) Control chart tests based on geometric moving
averages. Technometrics, 42 (1), 239–250.
[9] W.A. Shewhart (1931), Economic Control of Quality Control, Rein-
hold Company, Princeton N.J.
7
[10] A.N. Shiryaev (1963) On Optimum Methods in Quickest Detection
Problems. Theory Probab. Appl., 8 (1), 22–46.
[11] M.S. Srivastava and Y. Wu (1993), Comparison of some EWMA,
CUSUM and Shiryaev-Roberts procedures for detecting a shift in the
mean, Ann. Statist., 21 (2), 645–670.
8

Contenu connexe

Tendances

Presentation on Cryptography_Based on IEEE_Paper
Presentation on Cryptography_Based on IEEE_PaperPresentation on Cryptography_Based on IEEE_Paper
Presentation on Cryptography_Based on IEEE_Paper
Nithin Cv
 
Twenty years of attacks on the rsa cryptosystem
Twenty years of attacks on the rsa cryptosystemTwenty years of attacks on the rsa cryptosystem
Twenty years of attacks on the rsa cryptosystem
linzi320
 
Rfid presentation in internet
Rfid presentation in internetRfid presentation in internet
Rfid presentation in internet
Ali Azarnia
 
pptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacespptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspaces
butest
 

Tendances (19)

Unit 3
Unit 3Unit 3
Unit 3
 
Unit 2
Unit  2Unit  2
Unit 2
 
BLIND SIGNATURE SCHEME BASED ON CHEBYSHEV POLYNOMIALS
BLIND SIGNATURE SCHEME BASED ON CHEBYSHEV POLYNOMIALSBLIND SIGNATURE SCHEME BASED ON CHEBYSHEV POLYNOMIALS
BLIND SIGNATURE SCHEME BASED ON CHEBYSHEV POLYNOMIALS
 
Fn3410321036
Fn3410321036Fn3410321036
Fn3410321036
 
Presentation on Cryptography_Based on IEEE_Paper
Presentation on Cryptography_Based on IEEE_PaperPresentation on Cryptography_Based on IEEE_Paper
Presentation on Cryptography_Based on IEEE_Paper
 
Ch09
Ch09Ch09
Ch09
 
Sunanda cryptography ppt
Sunanda cryptography pptSunanda cryptography ppt
Sunanda cryptography ppt
 
Mathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final DraftMathematics Research Paper - Mathematics of Computer Networking - Final Draft
Mathematics Research Paper - Mathematics of Computer Networking - Final Draft
 
Post quantum cryptography - thesis
Post quantum cryptography - thesisPost quantum cryptography - thesis
Post quantum cryptography - thesis
 
Dsouza
DsouzaDsouza
Dsouza
 
F010243136
F010243136F010243136
F010243136
 
Classical encryption techniques
Classical encryption techniquesClassical encryption techniques
Classical encryption techniques
 
THE KEY EXCHANGE CRYPTOSYSTEM USED WITH HIGHER ORDER DIOPHANTINE EQUATIONS
THE KEY EXCHANGE CRYPTOSYSTEM USED WITH HIGHER ORDER DIOPHANTINE EQUATIONSTHE KEY EXCHANGE CRYPTOSYSTEM USED WITH HIGHER ORDER DIOPHANTINE EQUATIONS
THE KEY EXCHANGE CRYPTOSYSTEM USED WITH HIGHER ORDER DIOPHANTINE EQUATIONS
 
Mincut_Placement_Final_Report
Mincut_Placement_Final_ReportMincut_Placement_Final_Report
Mincut_Placement_Final_Report
 
Twenty years of attacks on the rsa cryptosystem
Twenty years of attacks on the rsa cryptosystemTwenty years of attacks on the rsa cryptosystem
Twenty years of attacks on the rsa cryptosystem
 
Rfid presentation in internet
Rfid presentation in internetRfid presentation in internet
Rfid presentation in internet
 
Current limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsCurrent limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov models
 
pptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspacespptx - Psuedo Random Generator for Halfspaces
pptx - Psuedo Random Generator for Halfspaces
 
6.hash mac
6.hash mac6.hash mac
6.hash mac
 

En vedette

Implementation and implications of a stealth hard drive backdoor
Implementation and implications of a stealth hard drive backdoorImplementation and implications of a stealth hard drive backdoor
Implementation and implications of a stealth hard drive backdoor
Gaetano Zappulla
 
Zen and the art of collecting and analyzing malware
Zen and the art of collecting and analyzing malwareZen and the art of collecting and analyzing malware
Zen and the art of collecting and analyzing malware
Gaetano Zappulla
 
Investigation into the processing of personal data for the whatsapp application
Investigation into the processing of personal data for the whatsapp applicationInvestigation into the processing of personal data for the whatsapp application
Investigation into the processing of personal data for the whatsapp application
Gaetano Zappulla
 
IOR Annual Report 2012
IOR Annual Report 2012IOR Annual Report 2012
IOR Annual Report 2012
Gaetano Zappulla
 
JPMorgan: The Euro area adjustment: about halfway there
JPMorgan: The Euro area adjustment: about halfway there JPMorgan: The Euro area adjustment: about halfway there
JPMorgan: The Euro area adjustment: about halfway there
Gaetano Zappulla
 

En vedette (8)

Lo IOR pubblica il suo Rapporto Annuale del 2012
Lo IOR pubblica il suo Rapporto Annuale del 2012Lo IOR pubblica il suo Rapporto Annuale del 2012
Lo IOR pubblica il suo Rapporto Annuale del 2012
 
Implementation and implications of a stealth hard drive backdoor
Implementation and implications of a stealth hard drive backdoorImplementation and implications of a stealth hard drive backdoor
Implementation and implications of a stealth hard drive backdoor
 
Apologo sull’onestà nel paese dei corrotti
Apologo sull’onestà nel paese dei corrottiApologo sull’onestà nel paese dei corrotti
Apologo sull’onestà nel paese dei corrotti
 
Zen and the art of collecting and analyzing malware
Zen and the art of collecting and analyzing malwareZen and the art of collecting and analyzing malware
Zen and the art of collecting and analyzing malware
 
RELAZIONE SULLA POLITICA DELL’INFORMAZIONE PER LA SICUREZZA 2012
RELAZIONE SULLA POLITICA DELL’INFORMAZIONE PER LA SICUREZZA 2012RELAZIONE SULLA POLITICA DELL’INFORMAZIONE PER LA SICUREZZA 2012
RELAZIONE SULLA POLITICA DELL’INFORMAZIONE PER LA SICUREZZA 2012
 
Investigation into the processing of personal data for the whatsapp application
Investigation into the processing of personal data for the whatsapp applicationInvestigation into the processing of personal data for the whatsapp application
Investigation into the processing of personal data for the whatsapp application
 
IOR Annual Report 2012
IOR Annual Report 2012IOR Annual Report 2012
IOR Annual Report 2012
 
JPMorgan: The Euro area adjustment: about halfway there
JPMorgan: The Euro area adjustment: about halfway there JPMorgan: The Euro area adjustment: about halfway there
JPMorgan: The Euro area adjustment: about halfway there
 

Similaire à Detecting crypto

Accelerating Dynamic Time Warping Subsequence Search with GPU
Accelerating Dynamic Time Warping Subsequence Search with GPUAccelerating Dynamic Time Warping Subsequence Search with GPU
Accelerating Dynamic Time Warping Subsequence Search with GPU
Davide Nardone
 

Similaire à Detecting crypto (20)

Cdma
CdmaCdma
Cdma
 
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream CiphersMultiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
 
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream CiphersMultiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
Multiple Dimensional Fault Tolerant Schemes for Crypto Stream Ciphers
 
ANALYSIS OF ELEMENTARY CELLULAR AUTOMATA CHAOTIC RULES BEHAVIOR
ANALYSIS OF ELEMENTARY CELLULAR AUTOMATA CHAOTIC RULES BEHAVIORANALYSIS OF ELEMENTARY CELLULAR AUTOMATA CHAOTIC RULES BEHAVIOR
ANALYSIS OF ELEMENTARY CELLULAR AUTOMATA CHAOTIC RULES BEHAVIOR
 
Modification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorialModification of some solution techniques of combinatorial
Modification of some solution techniques of combinatorial
 
Introduction to Algorithms
Introduction to AlgorithmsIntroduction to Algorithms
Introduction to Algorithms
 
NCM RB PAPER
NCM RB PAPERNCM RB PAPER
NCM RB PAPER
 
Data Communications- Unit-4.pptx
Data Communications- Unit-4.pptxData Communications- Unit-4.pptx
Data Communications- Unit-4.pptx
 
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
 
Quantum cryptography
Quantum cryptographyQuantum cryptography
Quantum cryptography
 
Accelerating Dynamic Time Warping Subsequence Search with GPU
Accelerating Dynamic Time Warping Subsequence Search with GPUAccelerating Dynamic Time Warping Subsequence Search with GPU
Accelerating Dynamic Time Warping Subsequence Search with GPU
 
IRJET- Enhanced Security using DNA Cryptography
IRJET- Enhanced Security using DNA CryptographyIRJET- Enhanced Security using DNA Cryptography
IRJET- Enhanced Security using DNA Cryptography
 
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUEEFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
EFFICIENT DIGITAL ENCRYPTION ALGORITHM BASED ON MATRIX SCRAMBLING TECHNIQUE
 
V01 i010401
V01 i010401V01 i010401
V01 i010401
 
N byte error detecting and correcting code using reedsolomon
N byte error detecting and correcting code using reedsolomonN byte error detecting and correcting code using reedsolomon
N byte error detecting and correcting code using reedsolomon
 
N byte error detecting and correcting code using reed-solomon and cellular au...
N byte error detecting and correcting code using reed-solomon and cellular au...N byte error detecting and correcting code using reed-solomon and cellular au...
N byte error detecting and correcting code using reed-solomon and cellular au...
 
Improving The Performance of Viterbi Decoder using Window System
Improving The Performance of Viterbi Decoder using Window System Improving The Performance of Viterbi Decoder using Window System
Improving The Performance of Viterbi Decoder using Window System
 
Module 1.pptx
Module 1.pptxModule 1.pptx
Module 1.pptx
 
A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...A novel technique for speech encryption based on k-means clustering and quant...
A novel technique for speech encryption based on k-means clustering and quant...
 
How to time logic
How to time logicHow to time logic
How to time logic
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 

Dernier (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 

Detecting crypto

  • 1. arXiv:1309.4275v2[cs.CR]18Sep2013 Detecting encrypted code J¨arpe, E. and Wecksten, M. September 19, 2013 Abstract When a computer hard drive is confiscated in the conduct of a police investigation the information contained is commonly encrypted at such a level that brute force thechniques have to be used. This is too time consuming if the hard drive is large, thus demanding methods to make a smart distinction between likely encrypted and non-encrypted code in order to use the cracking resources efficiently. The methods are stopping rules according to the cutting edge of on-line change-point detection. The performance of the methods is evaluated in terms of conditional expected delay and predictive value having calibrated the methods so that they all have the same expected time till false alarm. 1 Introduction In modern IT forensics there is a challenge to crack an encrypted hard drive that has been confiscated in the conduct of a police investigation. The strength of the contemporary tools for such protection (e.g. True crypt) is at such a level that password guessing (so-called brute force techniques) have to be used. This may not be a problem if there is only one or a few blocks of code are cracked within reasonable time. However, the problem of cracking the encryption of computer code may be further complicated if the contents of the hard drive have been deleted by removing the pointers of the file allocation tables (so-called quick deletion). Then the knowledge of what the formats of the contents are is not present. The hard drive contents is merely a (possibly immense) sequence of blocks of code that have to be attempted for cracking. Then it may well be completely impossible to apply brute force technique on each block simply because there are so many blocks and each password guessing procedure takes too much time. 1
  • 2. To this end a sieve discriminating between the blocks of most likely not encrypted computer code and the blocks of suspiciously encrypted (and there- fore more likely harmful) computer code is desired. After having picked out the encrypted blocks one may proceed with brute force cracking in an effi- cient manner according to some standard. A property of a crypto which is commonly regarded as an indicator of quality is how evenly all characters of the encryption alphabet is used. Thus, a measure for telling encryption likelihood apart, is to quantify how uniformly the encryption characters are distributed; in encrypted code the characters are more likely to be evenly distributed than in text files, images, programs etc. One way of measuring how evenly the characters are distributed is by considering the character fre- quencies and expressing them by using the Pearson chi square statistic (see e.g. Cox and Hinkley [1]). But of course one may also consider other aspects of even distribution of the characters, such that they should not occur more clustered or regularly than what would have been the case in a completely random pattern. This paper is all about development of such a procedure for distinguishing between the code blocks which are less likely and those which are more likely encrypted. Previous work... In Section 2... 2 Model Observations of blocks of code c1, c2, c3, . . ., are made consecutively at time- points t = 1, 2, 3, . . .. Some of these clusters may be subject to encryption. Depending on the degree of fragmentation of the hard drive they may be more likely to occur consecutively or not. In order to avoid tedious brute force cracking all of the clusters, a method for separating the clusters that are more likely encrypted from the clusters which are less likely encrypted is desired. Under the assumption that the characters constituting encrypted code are more evenly distributed than the characters constituting clear text code, the Barkman crypto indicator Ut (which is nothing but the well-known Pear- son chi-squared statistic, see e.g. Cox and Hinkley [1], in the special case with uniform distribution under the alternative hypothesis), may be used to distinguish code which is more likely encrypted. Simply calculating this statistic for all the clusters of the hard drive, and then brute force cracking those with an alarming score of the Barkman crypto indicator is a possibility if the hard drive is not too large. 2
  • 3. If the hard drive is very large (it may of course not be only one physical unit but rather a composition of many units and thus, possibly, very large) it can be a practically impossible task to evaluate the Barkman crypto indicator for all clusters. However, if the contents is likely to be little fragmented, and accordingly clusters of the same files are more likely to be occurring consecutively, then a monitoring algorithm, applied to the on-line calculation of Barkman crypto indicator, can facilitate the search for a larger sequence of consecutive clusters that are more likely encrypted. Then the situation is more formally described as follows. Cluster observations, C1, C2, C3, . . ., are to be made on-line, i.e. one by one. Now, assuming that the code up to some random time-point t = τ − 1 is not encrypted and that the code from time t = τ and on wards is, the distribution of the characters constituting C1, C2, . . ., Cτ−1 are distributed in some other way than the characters of Cτ , Cτ+1, . . . The problem is then to determine that the change from not encrypted to encrypted code has occurred, as quickly after it has really happened and as accurately (i.e. with as little delay) as possible. 3 Methods For each block ct, where t = 1, 2, 3, . . ., the number Nt of characters and the number Kt of distinct character kinds occurring in that block is counted. For 1 2 1 1 1 0 1 0 3 3 0 3 1 1 3 2 0 2 0 3 0 2 2 0 1 0 0 0 3 0 1 0 3 0 1 2 2 2 1 1 2 1 3 3 1 0 2 1 3 3 3 0 1 1 1 0 1 3 3 2 Figure 1: Code consisting of 60 characters and 4 character kinds (i.e. 0, 1, 2 and 3). Thus if this is cluster t, then Nt = 60 and Kt = 4 in this case. all characters, count the observed freqencies oi,1 (i.e. the number of occur- rences of character 1), ot,2 (i.e. the number of occurrences of character 2), . . ., ot,Kt (i.e. the number of occurrences of character Kt). From these observed frequencies the value of the Barkman crypto indicator (i.e. the Pearson chi- square statistic in the special case when the distribution under the alternative 3
  • 4. hypothesis is uniform) Ut = Kt i=1 (KtOt,i − Nt)2 KtNt (1) According to the Central Limit theorem, Ut is approximately χ2 dis- tributed when Kt and Nt are large. Thus, if we consider the problem of monitoring the degree of fit to a uniform distribution (indicating how suspi- ciously the code is to be encrypted) based on an on-line sequence of observa- tions of the Barkman crypto indicator U1, U2, U3, . . . would approximatively satisfy Ut ∈ ψ2 (Kt − 1) om t < θ χ2 (Kt − 1) om t ≥ θ where χ2 (Kt − 1) is the chi square distribution with Kt − 1 och ψ2 is the distribution of cX where X ∈ χ2 for some c > 1. At each time t the Barkman crypto indicator, Ut, is calculated and fed to a stopping rule which is an algorithm which can be defined as T = min{t ≥ 1 : a(U1, U2, . . . , Ut) > C}. Let us denote a(U1, U2, . . . , Ut) by at for short. Then the stopping rule T may be described more extensively as an algorithm by the following. Algorithm 1 For a certain threshold, C ∈ R, and alarm function, a (with a convenient number of arguments), the stopping rule is: 1. Start: a1 = U1. 2. If a1 > C, then we stop immediately: Goto 10. else goto 3. 3. At time t ≥ 2: make a new observation Ut. 4. If at > C, then we stop: Goto 10. else increase t by 1 and goto 3. 5. Stop and return the stopping time: T = t. Choosing the alarm function a differently renders different kinds of stopping rules. Changing the threshold C will give the stopping rule different prop- erties. High values of C will make the stopping rule more conservative and less likely to stop, thus giving few false alarm, but also long expected delay of motivated alarm, while lower values gives a more sensitive stopping rule with shorter expected delay, but more likely to give false alarm. 4
  • 5. 4 Sufficient statistic 5 The change-point problem 6 Results 6.1 Stopping rules In this paper we consider the Shewhart, CUSUM, Shiryaev and Roberts methods. The Shewhart method (Shewhart [9]) is the stopping rule TW = min{t ≥ 1 : Ut > C}. The CUSUM method (Page [7]) is TC = min{t ≥ 1 : at > C} where at = 0 if t = 0 max(0, at−1) + n ln c − c−1 2c Ut if t = 1, 2, 3, . . . The Shiryaev method (Shiryaev [10]) is TY = min{t ≥ 1 : at > C} where at = 0 if t = 0 (1 + at−1) exp (n ln c − c−1 2c Ut) if t = 1, 2, 3, . . . 6.2 Evaluation In order to evaluate the stopping rules theoretically the threshold is chosen so that the ARL0 = E(T | T < θ) = 100 (i.e. the expected time until there is a false alarm is 100). Then aspects of motivated alarm are calculated and compared. All calculations here are based on simulations. The samplesize in the simulations of the ARL0 is 100 000 observations for each method and size of shift c = 1.1, c = 1.2 and c = 1.3. The first aspect of motivated alarm to be examined is the conditional expected delay, CED(t) = E(T − θ | T ≥ θ = t). This was simulated with a samplesize of 1 000 000 observations for each point in the plot of Figure 3. This performance measure indicates how quickly a method signals alarm after the change has happened. 5
  • 6. E(T −θ | T ≥θ=t) c = 1.1 c = 1.2 c = 1.3 Cusum Shiryaev Cusum Shiryaev Cusum Shiryaev 5 10 15 20 25 8 9 10 11 Time 5 10 15 20 25 3.0 3.2 3.4 3.6 3.8 Time 5 10 15 20 25 1.50 1.55 1.60 1.65 1.70 1.75 1.80 Time Figure 2: Conditional expected delay for the Cusum (red lines and circles) and Shiryaev (blue lines and squares) methods for shift size c = 1.1 (left plot), c = 1.2 (middle plot) and c = 1.3 (right plot). P(θ≤t | T =t) c = 1.1 c = 1.2 c = 1.3 Cusum Cusum 0 5 10 15 20 25 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Time 0 5 10 15 20 25 0.4 0.5 0.6 0.7 0.8 0.9 1.0 Time 0 5 10 15 20 25 0.5 0.6 0.7 0.8 0.9 1.0 Time Figure 3: Predictive value for the Cusum (red, orange and brown lines and circles) and Shiryaev (blue, light blue and turqoise lines and squares) methods for shift size c = 1.1 (left plot), c = 1.2 (middle plot) and c = 1.3 (right plot). 6
  • 7. The second aspect of motivated alarm is predictive value, PV (t, ν) = P(θ ≤ t | T = t) where the change-point θ is assumed to be geometrically distrubuted with parameter ν with the interpretation µ = P(θ = t | θ > t−1). In each of the plots the sample size was 10 000 000 observations for each method. The predictive value shows how much credibility one should assert to a method in the case when the change-point occurs at time t. 6.3 Examples 7 Discussion References [1] D. Cox and D.V. Hinkley (1974) Theoretical Statistics, Chapman & Hall. [2] M. Fris´en (2003) Statistical surveillance. Optimality and methods. Int. Stat. Rev., 71 (2), 403–434. [3] M. Fris´en and P. Wessman (1999) Evaluations of likelihood ratio methods for surveillance: differences and robustness. Comm. Statist. Simulation Comput., 28 (3), 597–622. [4] E. J¨arpe (1999) Surveillance of the interaction parameter in the Ising model. Comm. Statist. Theory Methods, 28 (12), 3009–3027. [5] E. J¨arpe (2002) Surveillance, environmental. In: Encyclopedia of Envi- ronmetrics (eds. A.H. El-Shaarawi and W.W. Piegorsch), Wiley. 2150– 2153. [6] E. J¨arpe and P. Wessman (2000) Some power aspects of methods for detecting different shifts in the mean. Comm. Statist. Simulation Comput., 29 (2), 633–646. [7] E.S. Page (1954) Continuous inspection schemes. Biometrika, 41, 100– 115. [8] S.W. Roberts (1959) Control chart tests based on geometric moving averages. Technometrics, 42 (1), 239–250. [9] W.A. Shewhart (1931), Economic Control of Quality Control, Rein- hold Company, Princeton N.J. 7
  • 8. [10] A.N. Shiryaev (1963) On Optimum Methods in Quickest Detection Problems. Theory Probab. Appl., 8 (1), 22–46. [11] M.S. Srivastava and Y. Wu (1993), Comparison of some EWMA, CUSUM and Shiryaev-Roberts procedures for detecting a shift in the mean, Ann. Statist., 21 (2), 645–670. 8