This document describes a case study where an organization was experiencing W-2 fraud during tax season. To detect the fraudulent activity, the author consolidated log data from multiple sources into Splunk. They then calculated a risk score for each W-2 transaction based on factors like source IP country, IP usage uniqueness, and day of the week. This risk scoring approach identified suspicious activity without needing a specific fraud signature and helped resolve the organization's W-2 fraud issues.
2. Introduction
! Rob Perdue, VP Professional Services at 8020 Labs
– Cybersecurity professional for 12 years
– Specializes in Security Operations and DFIR in the financial sector
– Previously held positions at IBM, ADP, Viacom and ThreatGRID
– Splunking since 2008
3. Agenda
! What I hope you will learn
! Why am I talking about fraud?
! Case Study: W-2 fraud
! Fraud Detection Framework (FDF)
! Creating Baselines
! Risk Scoring
! Cyber use cases for FDF
! Key takeaways
! Q & A
4. What I Hope You Will Learn
! New and exciting ways to mine your data
! The power of the eval command to score risk
! The usefulness of lookup tables for baselining
– inputlookup
– outputlookup
! Different ways to detect suspicious activities
5. Why Am I Talking About Fraud?
! Contacted to assist in an IR investigation
! Turned out not to be a typical IR engagement
! Ever hear of W-2 fraud? I hadn’t.
– Steal a W-2 and file taxes before the real person does
6. Case Study: W-2 Fraud
! Tasked with finding unauthorized access to W-2s
– During tax season
! Huge amount of data
– Millions of rows of logs
! Relevant logs spread across several database tables and files
! Not really sure what W-2 fraud looked like
7. Case Study: W-2 Fraud
! How the data was distributed:
– Summary Tables
– Main DB
– Stand-alone Splunk
– Several CSV Files
8. Case Study Con’t
! An idea… consolidate data into a single Splunk instance
! No signature for fraud, no problem
! Score a risk value for each W-2 transaction
– Country of origin
– Uniqueness of Source IP
– Day of Week
– History of IP
! All of that resulted in one ugly search…
9. Case Study Con’t
! One ugly search…

index=w2 source="summarytable.csv" webpage="*administrator*"
| eval daymonth=date_month+date_mday
| eval full_user=username+"@"+group
| eval full_user=lower(full_user)
| iplocation src
| stats values(Country) AS Country values(Region) AS State values(City) AS City
    values(date_wday) AS Day dc(daymonth) AS Unique_Days count AS user_ip_count
    by src, full_user
| join full_user
    [search index=w2 source="summarytableall.csv" webpage="*administrator*"
    | eval full_user=username+"@"+group
    | eval full_user=lower(full_user)
    | stats count AS total_W2_events by full_user]
| eval traffic_per_IP=round((user_ip_count/total_W2_events)*100)
| join full_user src
    [search index=w2_history
    | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count
        by src, full_user
    | fields src, full_user, days_seen, hist_total_count]
| eval Risk_Score=0
| eval Risk_Score=if(traffic_per_IP<100 AND days_seen<14, Risk_Score+3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_IP==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Day=="saturday" OR Day=="sunday", Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_IP<100 AND days_seen>13, Risk_Score+1, Risk_Score+0)
| fields full_user, src, Country, State, City, Risk_Score
| sort -Risk_Score
10. Let’s Break it Down

index=w2 source="summarytable.csv" webpage="*administrator*"
| eval daymonth=date_month+date_mday
| eval full_user=username+"@"+group
| eval full_user=lower(full_user)
| iplocation src
| stats values(Country) AS Country values(Region) AS State values(City) AS City
    values(date_wday) AS Day dc(daymonth) AS Unique_Days count AS user_ip_count
    by src, full_user
11. Let’s Keep Breaking it Down

| join full_user
    [search index=w2 source="summarytableall.csv" webpage="*administrator*"
    | eval full_user=username+"@"+group
    | eval full_user=lower(full_user)
    | stats count AS total_W2_events by full_user]
| eval traffic_per_IP=round((user_ip_count/total_W2_events)*100)

Should have used the eventstats function… more on that later.
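The join subsearch above exists only to attach each user’s total event count to every row; that is exactly what a single-pass aggregation (eventstats in SPL) gives you without rescanning the data. A rough Python illustration of the idea, with hypothetical (user, source IP) event pairs:

```python
from collections import Counter

# Hypothetical (user, src_ip) pairs, one per W-2 page hit.
events = [
    ("alice@hr", "1.2.3.4"),
    ("alice@hr", "1.2.3.4"),
    ("alice@hr", "5.6.7.8"),
    ("bob@hr", "9.9.9.9"),
]

# One pass builds both the per-(user, ip) counts and the per-user
# totals -- the totals eventstats would attach alongside each row.
user_ip_count = Counter(events)
total_events = Counter(user for user, _ in events)

# traffic_per_IP: what share of a user's W-2 traffic came from this IP?
traffic_per_ip = {
    (user, ip): round(count / total_events[user] * 100)
    for (user, ip), count in user_ip_count.items()
}
print(traffic_per_ip[("alice@hr", "1.2.3.4")])  # 2 of 3 events -> 67
```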
12. …and Down

| join full_user src
    [search index=w2_history
    | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count
        by src, full_user
    | fields src, full_user, days_seen, hist_total_count]
13. …and Down

| eval Risk_Score=0
| eval Risk_Score=if(traffic_per_IP<100 AND days_seen<14, Risk_Score+3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_IP==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Day=="saturday" OR Day=="sunday", Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_IP<100 AND days_seen>13, Risk_Score+1, Risk_Score+0)

And finally…

| fields full_user, src, Country, State, City, Risk_Score
| sort -Risk_Score
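The chained evals are just an additive scorecard: each condition nudges the score up or down. A minimal Python restatement of the same rules (field values are hypothetical):

```python
def risk_score(traffic_per_ip, days_seen, day, unique_days,
               total_w2_events, country):
    """Additive risk score mirroring the chained eval statements."""
    score = 0
    if traffic_per_ip < 100 and days_seen < 14:
        score += 3   # newish IP carrying only part of the user's traffic
    if traffic_per_ip == 100 and days_seen < 14:
        score += 1   # new IP, but it is the user's only IP
    if day in ("saturday", "sunday"):
        score += 1   # weekend access
    if unique_days == 1:
        score += 2   # IP/user pair active on a single day
    if total_w2_events == 1:
        score += 2   # one-off W-2 access
    if country != "United States":
        score += 2   # foreign source
    if days_seen > 60:
        score -= 3   # long-established IP lowers risk
    if traffic_per_ip < 100 and days_seen > 13:
        score += 1
    return score

# A weekend, single-day, foreign, one-off access scores high:
print(risk_score(100, 1, "sunday", 1, 1, "Romania"))  # 1+1+2+2+2 = 8
```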
14. Where’s the Magic?
! Creation of a composite event
– join
– stats
! Use of eval to score the event
– | eval Risk_Score=if(traffic_per_IP==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
! Know the data
– What did the URL for W-2 access look like?
– What could I extract from the logs to build a profile?
15. Closing the Case Study
! It worked, but…
! Reactive in nature
! Not terribly efficient
! Risk scoring could be better
! Spawned the Fraud Detection Framework (FDF)
16. Fraud Detection Framework
! Utilize everything you can from a single log event
– Timestamp
– Time of Day
– User Agent String
– URL
– IP Info
– User Name
! Enrich the log
– Eventtypes
– GeoIP
– IP History
– User History
– Watch lists
– Tags
! Continuous Baselining
! Risk Scoring
17. What’s in a Log?

2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET /images/picture.jpg robper 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)

Fields of interest: Day of Week, Time of Day, Source IP, Server IP, Method, URI Stem, User Name, User Agent
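Every field the FDF needs can be pulled out of that one record. A sketch that splits the sample line on whitespace, assuming the fixed W3C-style field order annotated above, and derives day of week and hour of day for baselining:

```python
from datetime import datetime

line = ("2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET "
        "/images/picture.jpg robper 200 "
        "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)")

# Field order assumed from the annotated example above.
(date, time, src_ip, _, server_ip, port, method,
 uri_stem, user, status, user_agent) = line.split()

# Derived fields used for baselining: day of week and hour of day.
stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
print(stamp.strftime("%A"), stamp.hour, src_ip, user, method, uri_stem)
# Thursday 17 172.22.255.255 robper GET /images/picture.jpg
```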
18. Enriching Your Logs
! EventTypes/Tags
– What kind of transaction was this?
! GeoIP (iplocation)
– Where is this IP coming from?
! IP History
– Have I ever seen this IP before?
! User History
– When’s the last time I’ve seen this ID before?
– Is this an inactive account?
! User Agent String
– Is this UAS unusual?
– Have I seen it before from this user?
– Is there a non-English language preference?
! Watch lists
– Is this IP on any threat or fraud watchlists?
19. Building Event Types
! No need to score a GET request to a jpg file
! Fully understand the application you are scoring
– App Dev guys are our friends
– Don’t assume you know what a particular URL is, or isn’t, for
! Build eventtypes for transactions of interest
– W-2 reports
– Payroll Execution
– Beneficiary Change
– Direct Deposit Change
– Successful Logons
20. Baselining
! What does this usually look like?
! Enables risk scoring
! Relies heavily on lookup tables
! Lesser-known lookup commands
– inputlookup
– outputlookup
21. FDF: Baselines
! GeoIP
– Where does this client usually log in from?
! User Profiles
– User Agent String
– IP Info
– User Logon History
22. FDF: GeoIP
! Determine primary location of client
! Feeds into Haversine formula
– https://apps.splunk.com/app/936/
! Scheduled search
! Utilizes inputlookup and outputlookup
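The Haversine app linked above computes great-circle distance between two lat/lon points; a login far from a client’s baseline location is worth scoring. A standalone sketch of the formula itself:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + \
        cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # mean Earth radius ~6371 km

# Sanity check: a quarter of the equator is about 10,007 km.
print(round(haversine_km(0, 0, 0, 90)))
```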
23. FDF: GeoIP
! GeoIP Baseline Search:

index=hrapp
| iplocation allfields=true src
| eval clientlat=lat
| eval clientlon=lon
| stats min(_time) AS firstTime max(_time) AS lastTime count
    by client,Region,Timezone,clientlat,clientlon
| eventstats sum(count) AS client_total by client
| inputlookup append=T client_geoProfiles.csv
| eventstats sum(client_total) AS client_total
    by client,Region,Timezone,clientlat,clientlon
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime sum(count) AS count
    by client_total, client,Region,Timezone,clientlat,clientlon
| eval percent=round((count/client_total)*100)
| outputlookup client_geoProfiles.csv
| where percent>75
| outputlookup client_geoBase.csv
24. Let’s Break it Down

index=hrapp
| iplocation allfields=true src
| eval clientlat=lat
| eval clientlon=lon
| stats min(_time) AS firstTime max(_time) AS lastTime count
    by client,Region,Timezone,clientlat,clientlon
| eventstats sum(count) AS client_total by client
| inputlookup append=T client_geoProfiles.csv
| eventstats sum(client_total) AS client_total
    by client,Region,Timezone,clientlat,clientlon
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime sum(count) AS count
    by client_total, client,Region,Timezone,clientlat,clientlon
| eval percent=round((count/client_total)*100)
| outputlookup client_geoProfiles.csv
| where percent>75
| outputlookup client_geoBase.csv

How this data is used is shown on slide 32
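The inputlookup append=T / outputlookup pair implements a rolling merge: today’s counts are appended to the stored profile, re-aggregated, and written back, and rows carrying more than 75% of a client’s traffic become its baseline. The same merge in Python, with a dict standing in for the CSV lookup and hypothetical data:

```python
def merge_baseline(stored, fresh):
    """Merge new per-location counts into a stored geo profile.

    Both arguments map (client, region) -> count, standing in for
    rows of a lookup file like client_geoProfiles.csv.
    """
    merged = dict(stored)
    for key, count in fresh.items():
        merged[key] = merged.get(key, 0) + count
    # Recompute each row's share of its client's total traffic,
    # like the percent=round((count/client_total)*100) step.
    totals = {}
    for (client, _), count in merged.items():
        totals[client] = totals.get(client, 0) + count
    percent = {key: round(count / totals[key[0]] * 100)
               for key, count in merged.items()}
    # Rows above 75% become the client's baseline location(s).
    baseline = {key for key, p in percent.items() if p > 75}
    return merged, baseline

stored = {("acme", "Ohio"): 80, ("acme", "Texas"): 5}
fresh = {("acme", "Ohio"): 20}
merged, baseline = merge_baseline(stored, fresh)
print(merged[("acme", "Ohio")], baseline)  # 100 {('acme', 'Ohio')}
```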
26. FDF: User Baseline
! Create profiles for each user
– First/Last Time
– User Agent String
– IP Address
! Scheduled search
! Utilizes inputlookup and outputlookup
27. FDF: User Baseline
! User baseline search:

index=hrapp
| fillnull value=unknown tag::src
| stats min(_time) AS firstTime max(_time) AS lastTime first(date_wday) AS weekday
    by user,client,src,user_agent,tag::src,tag
| inputlookup append=T user_Profiles.csv
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime values(weekday) AS weekday
    by user,client,src,user_agent,tag::src,tag
| outputlookup user_Profiles.csv
28. Breaking it Down

index=hrapp
| fillnull value=unknown tag::src
| stats min(_time) AS firstTime max(_time) AS lastTime first(date_wday) AS weekday
    by user,client,src,user_agent,tag::src,tag
| inputlookup append=T user_Profiles.csv
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime values(weekday) AS weekday
    by user,client,src,user_agent,tag::src,tag
| outputlookup user_Profiles.csv

How this data is used is shown on slide 32
35. FDF: Scoring Review
! In its current state:
– Essentially scores the risk of the session
– Can focus score on particular event types (e.g., direct deposit, payroll)
– Does not score behavior while in the app
– Good job of detecting compromised creds
! Can easily be modified to…
– Detect transaction anomalies (e.g., wire transfers, payroll fraud)
– Incorporate Benford’s law
– http://apps.splunk.com/app/355/
– Score other risks
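Benford’s law, referenced above (the linked Splunk app automates it), says that in many natural datasets the leading digit d appears with probability log10(1 + 1/d), so transaction amounts whose first-digit distribution deviates sharply from it deserve a closer look. A quick sketch with made-up amounts:

```python
from math import log10

# Expected leading-digit frequencies under Benford's law.
benford = {d: log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(amount):
    """First significant digit of a positive amount."""
    return int(f"{amount:.6e}"[0])

def benford_deviation(amounts):
    """Sum of absolute gaps between observed and expected frequencies."""
    counts = {d: 0 for d in range(1, 10)}
    for a in amounts:
        counts[leading_digit(a)] += 1
    n = len(amounts)
    return sum(abs(counts[d] / n - benford[d]) for d in range(1, 10))

# Amounts clustered just under a 10k limit (classic structuring)
# deviate far more than a Benford-like spread would.
print(round(benford[1], 3))                               # 0.301
print(benford_deviation([9100, 9800, 9500, 9900]) > 0.5)  # True
```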
36. FDF: Other Cyber Use Cases
! Compromised creds
– FTP
– OWA
– VPN
– Custom apps
! User profiles
– Proxy logs
– Logon times
! Risk scoring
– IPS Alert + AV Hit + Failed Logon + ?
37. FDF: Side Story
! One compromised FTP account reported
– The client wanted to know how many other accounts were used for unauthorized access
– ~600 active FTP accounts
! Fortunately the client had a year’s worth of FTP logs in Splunk
! Utilized the FDF framework to detect 14 additional accounts
38. Key Takeaways
! Baseline your data
! inputlookup and outputlookup are very powerful baselining tools
! Chaining eval statements is an effective way of scoring risk
! Use every bit of information found in an individual log
! Enrich what you can
40.
Security office hours: 11:00 AM – 2:00 PM @Room 103, every day
– Geek out, share ideas with Enterprise Security developers
Red Team / Blue Team – Challenge your skills and learn new tricks
– Mon–Wed: 3:00 PM – 6:00 PM @Splunk Community Lounge
– Thurs: 11:00 AM – 2:00 PM
– Learn, share and hack
Birds of a feather – Collaborate and brainstorm with security ninjas
– Thurs: 12:00 PM – 1:00 PM @Meal Room