Machine Learning: The Future of Fraud Fighting

THE FUTURE OF FRAUD FIGHTING
an intro to machine learning

table of contents
00 introduction
01 what is machine learning?
02 why do we want machine learning?
03 what does machine learning do?
04 where does machine learning work?
05 how accurate is machine learning?
06 how does machine learning apply to fraud?
07 does it have any weaknesses?
08 fraud fighting in the future
09 references and further reading

introduction
Machine learning: a buzzword, to be sure. In this digital
age, do you really know what machine learning is and how
it can be used? Chances are good that machine learning is
already integral to your day-to-day life. Crazy, right?

The goal of this ebook is to help you understand machine
learning and its application in reducing fraud on your
website. With just a few keystrokes and the right tools,
you can stop fraudsters in their tracks.

Fraud is a $200 billion, truly global industry. Cyber
criminals steal from more than one million people... every
day! If you’re an individual hit by fraud, hopefully your
bank or insurance company can help you out. However,
businesses, especially small business owners, find that
every dollar lost compounds to a staggering degree.

At Sift Science, we think of ourselves as Fraud Fighters,
wielding our trusty sword of machine learning against the
crooks that want to rob your site. Follow us as we journey
through the intricacies of the interwebs to not only grasp
the what of machine learning but also how to put it to work
for you fast.

01 what is machine learning?
So what exactly is machine learning?
Machine Learning = Statistics + Data + Software.

Machine learning (ML) falls under the umbrella of computer
science. At its core, machine learning refers to the
practice of training computers via software to recognize
patterns and make predictions, emulating a human-like
ability to learn from “experience.” In ML, the “experience”
that machines get comes from humans not only inputting data
but also indicating which data is useful. But why doesn’t
data alone make for machine learning?

Let’s think about data itself. Chances are, your computer
has spreadsheets full of numbers and names that
encompass your past sales and communications.
However, you cannot predict your site’s future unless you
take your data to the next level. Machine learning is that
next step.

With ML, computers, via specially created algorithms and
mathematical formulas, can learn from historical data and
suggest likely future scenarios. The magic of machine
learning is that after the initial set-up, your machine
learning system needs no further explicit programming
outside of occasional training and course-correcting.

02 why do we want machine learning?
What’s so great about machine learning?
If I told you to imagine a tiger, your brain could conjure
up an image, right? What are some of the key elements that
connote tiger in your brain? Big cat, stripes, tail: all of
these are relevant descriptors.

But how does your brain distinguish between a tiger and a
lion (they are both big cats), or between a tiger and a
zebra (stripes for the win!)?

Your ability to not only learn patterns in data but
distinguish between and weigh the importance of each
pattern is essential to your intelligence. This ability is
what engineers strive to emulate when building
machine learning systems.

[Illustrations: ZEBRA, TIGER]

Machine learning allows for a reduction in human effort.
The time it takes humans to read, synthesize, categorize,
and evaluate data is significant, and that’s all before any
action takes place. ML teaches machines to identify and
gauge the importance of patterns in place of humans.
Particularly for use cases where data must be analyzed
and acted upon in a short amount of time, having the
support of machines allows humans to be more efficient
and act with more confidence.

03 what does machine learning do?
Machine learning offers automated,
accurate predictions.

Depending on the purpose of a specific machine learning
system or algorithm (called a model), ML can turn dense and
confusing information into a narrative that suggests the
likelihood of a future action. By continually adding data
and experience, a user further trains the ML system, making
its predictions more accurate and more relevant to that
specific use case. At its core, machine learning is a
3-part cycle.

Train
The first step of any ML system is to train the model. Here, the algorithm receives its
basic purpose: to extract patterns from data. Subsequent training can come from the user
and doesn’t require further technical guidance.
Predict
Once the algorithm has sufficient data, it can make predictions. These predictions
essentially answer the question, “What matches the data pattern?”
Act
ML is only as accurate as its data. In order to grow even more accurate, the model needs
feedback, which is the Act phase. In this step, the user indicates whether the prediction is
correct or incorrect, which in turn trains the model, restarting the cycle.
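
To make the cycle concrete, here is a minimal Python sketch of Train, Predict, and Act. It is a toy under stated assumptions, not how any production system works: the `TinyModel` class, the signal names, and the smoothed averaging are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class TinyModel:
    """Toy model (hypothetical): tracks how often each signal showed up in bad examples."""

    def __init__(self):
        self.bad = defaultdict(int)    # signal -> times seen in bad examples
        self.total = defaultdict(int)  # signal -> times seen overall

    def train(self, signals, is_bad):
        """Train: fold one labeled example into the counts."""
        for s in signals:
            self.total[s] += 1
            if is_bad:
                self.bad[s] += 1

    def predict(self, signals):
        """Predict: average smoothed 'bad' rate of the signals, between 0 and 1."""
        rates = [(self.bad[s] + 1) / (self.total[s] + 2) for s in signals]
        return sum(rates) / len(rates) if rates else 0.5

    def act(self, signals, was_bad):
        """Act: user feedback is simply more training data."""
        self.train(signals, was_bad)


model = TinyModel()
model.train(["free_email", "rush_shipping"], is_bad=True)
model.train(["repeat_customer"], is_bad=False)

before = model.predict(["rush_shipping"])
model.act(["rush_shipping"], was_bad=False)  # the user says the prediction was wrong
after = model.predict(["rush_shipping"])     # the score drops after the feedback
```

The point of the sketch is the loop itself: feedback in the Act step goes through the same path as the original training, so every correction restarts the cycle.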

04 where does machine learning work?
How does machine learning affect my
day-to-day life?
Excellent question! Do you use Gmail? How about
Netflix? Guess what: both of those sites use machine
learning to optimize your experience.

Netflix is a video-streaming site that offers viewers
access to TV shows, documentaries, movies, etc. in
exchange for a subscription fee. Netflix constantly asks
users to review the shows and movies that they’ve
already watched.

The purpose of those reviews (via a star-rating system) is
to train the Netflix recommendation model, based on what
you choose to watch and how you react to those choices.
Netflix wants to understand your taste and preferences.
Netflix’s Suggested for You is machine learning,
automatically gathering data to tailor the experience to you.

Why does Netflix want to create such a
you-specific experience?
Netflix “estimates that 75% of viewer activity is driven by
recommendations.” Those recommendations take into
account not just the viewer’s historical taste but also the
viewer’s ongoing viewing trends, changing as viewer
selections change. Sometimes, we can confuse the
algorithms, like when a friend picks the movie for movie
night. A different user may account for some unexpected
future recommendations, but that’s how you know that the
ML system is working.

[Figure: engineers train a “tastes” model; Netflix makes
movie recommendations; viewers select and rate movies.]

Retrain your Netflix model by selecting the Not
Interested option for suggestions that are out of line with
your tastes. There you go: you’re using machine
learning!

05 how accurate is machine learning?
Is machine learning actually accurate?
Yes, very! For many machine learning models, the data is
always incoming and users are constantly acting on the
predictions. Take for example your email. In your email
inbox, how often do you currently see spam messages?
Maybe once every few weeks? And how rarely do good
messages get stuck in your spam folder?

On the whole, your email account is pretty accurate in
weeding out spam or fake emails, and you haven’t done a
thing to teach it wrong from right. That’s because it’s
predicting spam with machine learning.

If you didn’t have machine learning working to keep your
inbox clear, you’d see lots more Viagra offers, desperate
notes from deposed princes, and “You’ve won!” emails
daily. The 2013 Symantec Intelligence Report estimated
that spam made up 68% of all email traffic in 2012. That the
ML algorithms can catch the vast majority of these spam
messages is pretty impressive!

How do these algorithms work?
Your email ML system is always looking for patterns in
data. Certain keywords appear in spam messages more
frequently than in good messages. Terms like “Viagra” or
“belly fat loss,” indicators such as the message length or
email domain, and data from the sender’s email address
itself may suggest spam.

Every time you click “Report Spam,” you’re acting to train
your email’s ML system to recognize the
ever-changing signs of junk mail.

When you find a good message stuck in your Spam folder
and mark it as “Not Spam,” you’re doing the same. So even
when it’s right out of the box, machine learning can be
exceptionally accurate. Likely, you receive all of the emails
that you want and don’t often see those spam messages.
That’s machine learning at work!
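
As a rough illustration of how keyword patterns can turn into a spam score, here is a small Python sketch of a naive Bayes word model. This is an assumption chosen for illustration, not a description of what any real email provider runs. The sample messages, the function names, and the add-one smoothing are all hypothetical, and the sketch assumes at least one spam and one good message have been labeled.

```python
import math
from collections import Counter


def train(messages):
    """Count word occurrences per label from (text, is_spam) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        docs[is_spam] += 1
    return counts, docs


def spam_probability(text, counts, docs):
    """Naive Bayes estimate of P(spam | words), with add-one smoothing."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_total = sum(counts[True].values())
    good_total = sum(counts[False].values())
    log_odds = math.log(docs[True] / docs[False])  # prior odds of spam
    for word in text.lower().split():
        p_spam = (counts[True][word] + 1) / (spam_total + vocab_size)
        p_good = (counts[False][word] + 1) / (good_total + vocab_size)
        log_odds += math.log(p_spam / p_good)
    return 1 / (1 + math.exp(-log_odds))


# Hypothetical labeled mail: True means spam, False means a good message.
labeled = [
    ("cheap viagra offer", True),
    ("you have won a prize", True),
    ("lunch meeting tomorrow", False),
    ("project notes attached", False),
]
counts, docs = train(labeled)
before = spam_probability("meeting", counts, docs)

# Clicking "Report Spam" adds one more labeled example, then the model retrains.
labeled.append(("meeting", True))
counts, docs = train(labeled)
after = spam_probability("meeting", counts, docs)  # higher than before
```

Each “Report Spam” or “Not Spam” click plays the role of the `labeled.append(...)` line: one more example, and the word statistics shift accordingly.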

06 how does machine learning apply to fraud?
What can machine learning do against fraud?
As evidenced by the email spam example, bad users
leave behind signs. These calling cards of fraudulent
activity can be used to train a machine learning model to
spot credit card fraud, fake accounts, referral fraud, and
other types of illicit activity that plague websites. Having a
human review every new order or newly created account,
known as manual review, is too time-consuming as
traffic and transactional volume increase.

The solution? Put a system in place that identifies the most
glaring examples of fraud, while flagging those that require
further, closer examination.

Although creating a simple set of business rules
based on common fraudulent characteristics may help
you at first, such a solution isn’t scalable.

Why, you ask? Well, as you learn and identify fraud
signals, you have to manually create an If X then Y
system. Adding a new rule for every IP address or email
domain where fraud has occurred is unscalable and
unreasonable.

Imagine that your online store is hit by a fraudster from
Venezuela. If you were using an If X then Y, rules-based
system, you might set up a business rule to deny any and
all orders coming from Venezuela.

But what if only 15% of all Venezuelan orders are
fraudulent? You’d be missing out on the profits of that
other 85% of orders, and creating a negative
customer experience for those good shoppers.
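
The cost of that blanket rule is easy to work out. The short Python sketch below uses the 15% fraud rate from the example. The order count and average order value are made-up numbers for illustration only, and the calculation ignores chargeback fees and other real-world costs.

```python
orders = 1000            # hypothetical number of orders from the region
fraud_rate = 0.15        # from the example above
avg_order_value = 50.0   # hypothetical average order value, in dollars

fraud_orders = round(orders * fraud_rate)  # 150 fraudulent orders
good_orders = orders - fraud_orders        # 850 legitimate orders

# A "deny everything from this country" rule blocks both groups.
blocked_fraud_value = fraud_orders * avg_order_value  # 7500.0 in fraud stopped
lost_good_revenue = good_orders * avg_order_value     # 42500.0 in good sales lost
```

Under these assumed numbers, the rule turns away several dollars of legitimate sales for every dollar of fraud it stops.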

Additionally, what if those many indicators (shipping
address, IP address, social network data, shopper credit
card issuer) carry different weights? Let’s return to our
animal example to better visualize the impact of static If X
then Y systems on detecting online fraud.

If I tell you, “Animal X is approaching; it has stripes,” do
you automatically assume that Animal X is a tiger and
therefore terrifying? Of course not! It could be any number
of striped animals: a zebra or a clownfish or a tapir.

There are countless other and more important data points
that would help you decide whether you should run away
or stick around to finally meet a tapir. Even if a non-ML
system had data points that included, “Animal X has
stripes, is orange and black, and can be found in trees,”
this mystery animal could still be a monarch butterfly or a
tree frog. But with sufficiently weighted data (Animal X
has a long tail, a loud roar, is a big cat), your brain would
be able to deduce that, “Hey, I think that’s a tiger. Time to
get outta here!”

[Illustrations: ZEBRA, CLOWNFISH, TAPIR]

The ability to take in data and understand the relative
values of each data point in relation to the whole is where
intellect comes into play. That is what machine learning
can do for fraud detection. With a trained ML system,
analysts can accurately assess which orders are tigers
and which are tapirs.

Machine learning is a voice of reason in fraud detection.
When combined with human intuition, statistical insights
can bring about better decisions and long-term financial
savings.

Rather than canceling orders from good customers
simply because they may share an insignificant attribute
with a fraudulent or charged-back transaction, machine
learning can weigh the many other indicators that
contribute to a customer’s profile.
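
One common way to weigh many indicators at once is a logistic score, sketched below in Python. Everything here is an assumption for illustration: the signal names, the weights, and the bias are hand-picked and hypothetical, whereas a trained system would learn its weights from labeled orders.

```python
import math

# Hypothetical weights: positive values push toward fraud, negative values away.
WEIGHTS = {
    "ip_country_mismatch": 1.2,
    "shipping_billing_mismatch": 0.8,
    "disposable_email_domain": 1.5,
    "repeat_customer": -2.0,
}
BIAS = -2.0  # hypothetical baseline: most orders are legitimate


def fraud_probability(signals):
    """Combine the signals present on an order into a fraud score between 0 and 1."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) for name in signals)
    return 1 / (1 + math.exp(-z))


one_flag = fraud_probability({"ip_country_mismatch"})
many_flags = fraud_probability(
    {"ip_country_mismatch", "shipping_billing_mismatch", "disposable_email_domain"}
)
trusted = fraud_probability({"ip_country_mismatch", "repeat_customer"})
```

With these made-up weights, a single suspicious attribute stays well below even odds, several together push the score high, and a strong good signal such as `repeat_customer` can outweigh one bad one. That is the tiger-versus-tapir judgment expressed as arithmetic.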

Fraud is rarely cut and dried. ML gives users probabilities,
not black-and-white judgements. Probabilities allow you to
understand the nuances in your data, enabling you to
make better decisions and teach the ML model to make
even more accurate predictions in the future.
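
A probability only becomes a decision once you choose where to draw the lines. The Python sketch below shows one possible way to do that. The `triage` function and both thresholds are hypothetical, and a real business would tune them to its own tolerance for chargebacks and false declines.

```python
def triage(fraud_probability, block_at=0.9, review_at=0.5):
    """Map a fraud probability to an action using two hypothetical thresholds."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= block_at:
        return "block"    # glaring fraud: stop it automatically
    if fraud_probability >= review_at:
        return "review"   # uncertain: send to a human reviewer
    return "accept"       # low risk: let the order through


decisions = [triage(p) for p in (0.95, 0.6, 0.1)]
```

Only the middle band reaches a human, which is how probabilities reduce manual review without removing human judgement from the hard cases.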
“You need human judgement to know if the person
you’re calling is a fraudster.”

[Figure labels: describes, replicates, can tell what’s
said, explains, creates, can tell what’s meant]

07 does it have any weaknesses?
Machine learning is only as strong as its data. If an
online business owner is only predicting based on her
own, limited pool of shoppers, she must experience each
unique fraud attempt before being able to predict future
attacks. This method is exceptionally unsustainable.

With machine learning operating on a global pool of data,
however, a shop owner in Portugal can benefit from the
fraud experience of a shop owner in Portland or Odessa
or Kyoto.

The vastness of data from a wider net of experience
allows predictive behavior without actual losses. That way,
shop owners that have yet to experience fraud from a
specific IP address or email address can benefit from the
knowledge of those who have.

Perhaps you’ve heard of the network effect? In order for
machine learning to effectively and accurately detect fraud
before it happens, a critical mass of data and users must
be achieved. Unless you’re an Amazon.com, operating
with millions of worldwide data points, using machine
learning without any input from outside of your business is
likely ineffectual.

Machine learning needs a deep set of data in order to
be truly magical.

Once connected to a vast network of order and
transaction history, ML for fraud is unstoppable.

Certain fraud-fighting tools can even train fraud detection
systems in real time. What does this mean? Say a shop
owner in Shanghai sees a chargeback on an order placed
by Freddie Fraudster in Hong Kong. Once that
Shanghainese shop owner marks Freddie as “bad” in the
global ML fraud detection system, Freddie’s details are
immediately and automatically updated for every
other shop owner on the network worldwide. So when
Freddie tries to buy something from a Londoner’s website
2 minutes later, the Londoner will see that Freddie is a bad
user and can cancel Freddie’s order before she too
suffers a chargeback.
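
The Freddie scenario can be sketched in a few lines of Python. This is a deliberately simplified assumption: a real network would feed the label into a shared model rather than a plain lookup table. The `SharedFraudNetwork` class, the merchant names, and the email address are hypothetical.

```python
class SharedFraudNetwork:
    """Toy shared label store (hypothetical): one merchant's report is visible to all."""

    def __init__(self):
        self._bad_users = {}  # user id -> merchant that first reported the user

    def report_bad(self, user_id, reported_by):
        """Record a 'bad' label; the first reporter is kept."""
        self._bad_users.setdefault(user_id, reported_by)

    def is_known_bad(self, user_id):
        """Any merchant on the network can check a user before accepting an order."""
        return user_id in self._bad_users


network = SharedFraudNetwork()

# The Shanghai shop sees a chargeback and marks Freddie as bad.
network.report_bad("freddie@example.com", reported_by="shanghai-shop")

# Two minutes later, the London shop checks the same network.
london_sees_freddie = network.is_known_bad("freddie@example.com")
london_sees_stranger = network.is_known_bad("alice@example.com")
```

The London shop never experienced Freddie’s fraud itself, yet it can act on the label the moment it is shared.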

08 fraud fighting in the future
Criminals are constantly cooking up new ways to rob
hard-working folks. Whether it’s diluting good content with
spam messages, stealing credit card information, or
scamming a referral or coupon system, fraud is an
ever-evolving, ever-growing industry.

Thankfully, machine learning enables the good guys to
stay one step ahead of the fraudsters. When properly
leveraged, ML brings together sets of information so vast
and so deep that they would be otherwise
incomprehensible. Allowing manual reviewers to focus on
only flagged data points saves time and money. By
combining the power of pattern recognition with human
intuition, machine learning is driving the evolution of how
we process data.

Machine learning is a necessary tool in the arsenal of
today’s savvy business leaders. Whether used to stop
criminals before they hit your bottom line, project market
trends, or flag spam emails, ML is how we optimize our
lives.

Interested in learning more about the power of machine
learning? Have a fraud problem that needs an accurate
and innovative solution?

We’re here to answer any questions. Just drop us a
line at scientists@siftscience.com or visit our
website at https://siftscience.com.

09 references and further reading
Medium offers an excellent overview and exercises in
introductory machine learning.
Symantec’s regular intelligence reports offer
invaluable insight into the impact of fraud.
Wired Magazine’s article on the Netflix suggestion
algorithms is an interesting read.
Sift Science offers a "Machine Learning Demystified"
overview on YouTube.
Business Insider’s machine learning overview with
embedded Nova video is helpful.
