1) The document describes a study where researchers developed a chatbot named Aubrey to autonomously chat with online criminals involved in e-commerce fraud.
2) Aubrey was able to successfully chat with 470 criminals and gather intelligence on their activities, roles, and relationships. It discovered many fraudulent operations, accounts, and sites that were hidden in private conversations.
3) The intelligence collected provided a better understanding of the criminal ecosystem and relationships between upstream resource sellers and downstream fraud operators, helping to improve fraud detection.
Extended summary of "Into the Deep Web: Understanding E-commerce Fraud from Autonomous Chat with Cybercriminals"
UNIVERSITÀ DEGLI STUDI DI TRIESTE
Department of Engineering and Architecture
Electronic and Computer Engineering
Academic year 2019/2020
Extended Summary of “Into the Deep Web: Understanding E-commerce Fraud from
Autonomous Chat with Cybercriminals”
Student
Federica Azzalini
Supervisor
Professor Alberto Bartoli
This article was written by Peng Wang, Xiaojing Liao, Yue Qin and XiaoFeng Wang
from Indiana University Bloomington and presented at the Network and Distributed
System Security (NDSS) Symposium 2020, which took place from 23 to 26 February
2020 in San Diego, CA, USA. It focuses on understanding e-commerce fraud and
analyzes the industrialized market around it.
E-commerce fraud consists of several illicit practices that seek profit by exploiting
opportunities hidden in online retailers’ policies or by manipulating the market
around them. The authors found that e-commerce miscreants fall into two types of
roles: upstream and downstream. Upstream actors are the resource sellers, who sell
the attack assets that illicit businesses need. They are divided into account
merchants, who sell fraud accounts, and SIM farmers, who sell fake SIM cards.
Downstream actors are the fraud order operators. They work in the downstream
market that supplies affiliate networks and platforms, and they hire small-time
workers to complete tasks on their behalf.
Upstream and downstream actors work on the surface web as well as in the deep web
and communicate through instant messaging (IM) platforms (e.g. Tencent QQ, Telegram).
The challenge the authors faced is that most valuable threat intelligence is shared
only in one-on-one conversations with cybercriminals and is therefore very hard to
obtain. To address this, the team developed Aubrey, the first chatbot able to
autonomously drive a human-robot conversation with miscreants: rather than
passively receiving information by infiltrating IM groups, it actively participates
by asking questions to collect relevant data.
Aubrey was built specifically to gather information about the three most profitable
e-commerce fraud activities: account trading, rackets such as order scalping or
bonus hunting, and SIM farming. In account trading, account merchants sell fake
accounts. Their market is massive, and they work with both upstream and downstream
actors.
In order scalping, merchants hire small-time workers to keep purchasing large
volumes of their products (without actually paying for them) to inflate their sales.
In bonus hunting, bonus hunters hire workers to buy products that carry bonuses or
discounts in quantities that exceed the per-person limit set by the company’s
policy, and then resell them at their original prices.
The third activity is SIM farming. Every account requires SMS verification, so every
fake account also needs a phone number that can actually receive the SMS. SIM
farmers profit from fake accounts by providing SMS verification codes through SIM
gateways.
All three parties are driven by the money-making opportunities in this criminal
market. Through Aubrey, the team estimated that a SIM farmer’s revenue is around
$8,900 per month (the prices used to calculate this number are based on miscreants’
reports and are therefore not fully reliable), account merchants earn approximately
$48,000 per month, and bonus hunters earn around $16,700 per month.
The first two amounts were calculated from the average prices of resources and their
estimated sales volumes, while a bonus hunter’s earnings are the value of the bonus
multiplied by the number of orders they get.
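The two revenue formulas above can be written out as a short sketch. The function
names and all input values are hypothetical, chosen only to illustrate the
arithmetic. They are not figures from the study.

```python
def resource_seller_revenue(avg_price: float, monthly_volume: float) -> float:
    """Monthly revenue of a resource seller: average unit price times units sold."""
    return avg_price * monthly_volume


def bonus_hunter_revenue(bonus_value: float, monthly_orders: int) -> float:
    """Monthly revenue of a bonus hunter: bonus per order times number of orders."""
    return bonus_value * monthly_orders


# Hypothetical inputs (not taken from the paper).
seller = resource_seller_revenue(avg_price=0.5, monthly_volume=10_000)
hunter = bonus_hunter_revenue(bonus_value=2.0, monthly_orders=3_000)
print(seller, hunter)
```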
To explain how Aubrey operates, the authors describe its architecture as three
major building blocks: Target Finder, Strategy Generator and Dialog Manager, as
illustrated in the following figure:
The Target Finder classifies criminals in the IM groups by role so that Aubrey can
start the right conversation for each role. It uses two binary classifiers: one
distinguishes upstream from downstream actors, the other divides upstream actors
into SIM farmers and account merchants. This is possible because messages from
different roles carry role-specific keywords. To find these keywords, the authors
calculate the log odds ratio, which measures the correlation between the occurrence
of a word and a role by comparing the frequency of the word in that role’s messages
with its frequency in all other roles’ messages. They then compute the variance and
z-score to quantify how distinctive the word is for that role, so that the
classifier can use it, along with other more specific features, to determine the
actor’s role. The second classifier looks for indicators in the form of
verb-and-noun phrases that distinguish selling from purchasing behavior, since
account merchants buy SIM cards from SIM farmers.
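The keyword scoring described above can be sketched as follows. This is a minimal
illustration, not the authors' implementation. It assumes pre-tokenized messages,
additive smoothing with a hypothetical constant `alpha`, and the common
approximation of the log odds ratio's variance as the sum of the reciprocal
smoothed counts. The function name and sample messages are invented for
illustration.

```python
import math
from collections import Counter


def keyword_z_scores(role_msgs, other_msgs, alpha=0.5):
    """Score each word by the z-score of its smoothed log odds ratio.

    role_msgs / other_msgs: lists of tokenized messages (lists of words).
    alpha: additive smoothing constant (an assumption of this sketch).
    Requires at least two distinct words across both corpora.
    """
    role = Counter(w for msg in role_msgs for w in msg)
    other = Counter(w for msg in other_msgs for w in msg)
    vocab = set(role) | set(other)
    n_role, n_other = sum(role.values()), sum(other.values())
    a0 = alpha * len(vocab)  # total smoothing mass

    scores = {}
    for w in vocab:
        yr = role[w] + alpha
        yo = other[w] + alpha
        # Log odds of the word inside each corpus, then their difference.
        log_odds_role = math.log(yr / (n_role + a0 - yr))
        log_odds_other = math.log(yo / (n_other + a0 - yo))
        delta = log_odds_role - log_odds_other
        variance = 1.0 / yr + 1.0 / yo  # approximate variance of delta
        scores[w] = delta / math.sqrt(variance)
    return scores


# Invented sample messages for two upstream roles.
sim_farmer_msgs = [["sell", "sim", "card"], ["sim", "gateway", "sms"]]
account_merchant_msgs = [["sell", "account"], ["account", "bulk", "cheap"]]
scores = keyword_z_scores(sim_farmer_msgs, account_merchant_msgs)
most_distinctive = max(scores, key=scores.get)
print(most_distinctive)
```

Words with a large positive z-score are distinctive for the first role, and words
with a large negative z-score are distinctive for the other roles. A word such as
"sell", which both roles use, scores close to zero.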
The Strategy Generator semi-automatically generates a Finite State Machine (FSM)
for each role. To construct it, the authors used seeds: 20 conversation samples
between analysts at Company A (an anonymized Chinese e-commerce retailer) and
criminals, with an average of 40 messages each. They segmented them into 800 traces
(sequences of messages) and then broke them into 200 dialog blocks (single
consecutive messages). Each block is labeled with a keyword based on its topic, and
blocks are clustered by topic. The topic, the question from the block and other
domain-specific questions collected from other sources (150 IM groups and two
forums), called the extended set, form an FSM state. The extended set is generated
with the same procedure, except that each question is paired with the subsequent
message in the same block, which answers it, to form question/answer pairs called
dialog pairs. IM groups (mostly QQ) generated 50,000 dialog pairs from 1M traces,
while underground forums generated 700,000 dialog pairs from 135,000 traces.
Connecting the dialog pairs to the states through keywords greatly enriches the
questions Aubrey can ask. It also covers the case in which Aubrey itself is asked a
question: it can look for similar questions and find a relevant answer.
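The construction and lookup of dialog pairs can be sketched as below. This is a
simplified illustration under stated assumptions, not the authors' method: a
message counts as a question only if it ends with a question mark, and similarity
between questions is plain word-overlap (Jaccard), which stands in for whatever
similarity measure the authors actually use. All names and sample messages are
hypothetical.

```python
def build_dialog_pairs(block):
    """Pair each question in a block with the message that follows it.

    block: list of message strings in chronological order.
    """
    pairs = []
    for msg, nxt in zip(block, block[1:]):
        if msg.strip().endswith("?"):  # simplifying assumption
            pairs.append((msg, nxt))
    return pairs


def jaccard(a, b):
    """Word-overlap similarity between two strings, in [0, 1]."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def find_answer(question, pairs):
    """Return the stored answer whose question is most similar to `question`."""
    if not pairs:
        return None
    _, best_answer = max(pairs, key=lambda p: jaccard(question, p[0]))
    return best_answer


# Invented dialog block.
block = [
    "how much for one account?",
    "2 yuan each",
    "do you sell in bulk?",
    "yes, minimum 100",
]
pairs = build_dialog_pairs(block)
answer = find_answer("how much for an account?", pairs)
print(answer)
```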
The Dialog Manager receives the target and its role from the Target Finder and
executes the FSM, guiding the transitions between states by analyzing the
criminal’s responses. Using Natural Language Processing techniques, the Dialog
Manager determines whether a response is negative, interrogative, or carries target
intelligence, with a precision of 98.6% for negations and 97.8% for interrogations.
Finally, human analysts check the gathered intelligence and look for missed
information.
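A toy version of this FSM-driven loop is sketched below. It is only an
illustration. The response classifier here is a keyword lookup standing in for the
NLP techniques the authors use, and the cue words, state names, questions and
replies are all hypothetical.

```python
from dataclasses import dataclass, field

NEGATION_CUES = {"no", "not", "never", "none"}          # hypothetical cues
QUESTION_CUES = {"what", "why", "how", "who", "which"}  # hypothetical cues


def classify_response(text, intel_keywords):
    """Label a reply as 'interrogative', 'negative', 'intel' or 'other'."""
    words = set(text.lower().replace("?", " ?").split())
    if "?" in words or words & QUESTION_CUES:
        return "interrogative"
    if words & NEGATION_CUES:
        return "negative"
    if words & intel_keywords:
        return "intel"
    return "other"


@dataclass
class State:
    question: str
    intel_keywords: set
    transitions: dict = field(default_factory=dict)  # label -> next state name


def run_fsm(states, start, replies):
    """Walk the FSM, consuming one scripted reply per visited state."""
    current, collected = start, []
    for reply in replies:
        state = states[current]
        label = classify_response(reply, state.intel_keywords)
        if label == "intel":
            collected.append((current, reply))
        current = state.transitions.get(label, current)  # stay if no edge
        if current == "end":
            break
    return current, collected


states = {
    "ask_price": State("How much per account?", {"yuan", "price"},
                       {"intel": "ask_source", "negative": "end"}),
    "ask_source": State("Where do the accounts come from?", {"site", "software"},
                        {"intel": "end", "negative": "end"}),
}
final_state, intel = run_fsm(states, "ask_price",
                             ["2 yuan each", "from my own site"])
print(final_state, intel)
```

In this sketch an interrogative reply leaves the state unchanged. A fuller version
would answer the question, for example from stored dialog pairs, before asking its
own question again.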
Aubrey made 545 chat attempts and chatted for an average of four minutes per target
before completing the task. It successfully chatted with 470 e-commerce miscreants:
315 upstream resource sellers (185 selling SIM cards and 130 selling fake accounts)
and 155 downstream fraud order operators.
These results gave a better understanding of the criminal use of one-on-one
conversations and of the related hidden criminal infrastructure and attack assets.
The table below shows the three categories of miscreants and the corresponding
threat intelligence collected:
The obtained and extended intelligence proved very effective in enriching the
authors’ knowledge of the fraud ecosystem, starting with the discovery that 93% of
the links collected return no search results on the surface web, indicating that
the fraud supply chain mostly operates in the Deep Web. They also found that most
of the time cybercriminals only disclose their assets in one-on-one interactions.
These private conversations revealed many more SIM farms and fraud account sites
than the group chat logs did, allowing the authors to discover exploitation of
Company A’s security and previously unknown software. Furthermore, the intelligence
collected brought to light the complicity among the three criminal roles, as shown
in the following graph. The nodes of the graph are the criminal roles, a directed
edge indicates that a criminal refers to another role in the chat, and the weight
of the edge is the number of times that happens.
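The way such a weighted, directed role graph is built can be sketched as follows.
This is a minimal illustration that assumes a reference to another role can be
spotted by simple cue phrases. The cue lists, role labels and chat messages are
hypothetical, and the authors' actual detection method is not described here.

```python
from collections import Counter

# Hypothetical cue phrases that signal a reference to each role.
ROLE_CUES = {
    "sim_farmer": ("sim card", "sms code"),
    "account_merchant": ("account",),
    "fraud_order_operator": ("order task", "scalping"),
}


def build_role_graph(chats):
    """Count references between roles.

    chats: iterable of (speaker_role, message) tuples.
    Returns a Counter mapping (source_role, referenced_role) to edge weight.
    """
    edges = Counter()
    for speaker, message in chats:
        text = message.lower()
        for role, cues in ROLE_CUES.items():
            # A message adds at most one to each (speaker, other role) edge.
            if role != speaker and any(cue in text for cue in cues):
                edges[(speaker, role)] += 1
    return edges


# Invented chat messages.
chats = [
    ("account_merchant", "I buy SIM cards from a farm"),
    ("account_merchant", "SMS code comes from my supplier"),
    ("fraud_order_operator", "I get accounts from a seller"),
]
graph = build_role_graph(chats)
print(dict(graph))
```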
This study focused on Chinese retail companies, but the authors found that Chinese
e-commerce fraudsters are also involved with non-Chinese companies such as Amazon
or eBay.
One strength of Aubrey is that it can be re-trained to work on non-Chinese data or
even on other criminal domains, but it also has limitations. One is that, although
it is an autonomous tool, it needs human assistance to guarantee the quality of the
data. Another is that its understanding of information relies on keywords, so
miscreants could rather easily evade it by switching to new jargon once the chatbot
is detected.
In conclusion, the authors, mindful of the legal and ethical boundaries involved in
talking and trading with criminals, designed and implemented the first chatbot to
actively collect data on the e-commerce fraud ecosystem from one-on-one
conversations with miscreants. These conversations proved far more valuable than
IM group chats: 318 useful messages per thousand versus 2 per thousand. The
intelligence gathered had a great impact and helped Company A develop a system to
detect dishonest accounts and orders. It helped expose a large number of previously
unknown fraudulent artifacts and improved the understanding of the relationship
between upstream and downstream actors and of how they work in the Deep Web.