This document summarizes research analyzing Twitter accounts infected with the Blackhole Exploit Kit (BEK) malware. The researchers collected over 165 TB of Twitter data from 2012 to analyze indicators of BEK-infected accounts. They identified accounts exhibiting automated, low-entropy messages containing BEK URL variations. Graph analysis revealed dense clusters of connected infected accounts spreading the malware. The researchers proposed metrics like entropy and correlation to identify compromised accounts, finding around 0.275% of accounts showed signs of BEK infection.
1. It's you on photo?
Automatic Detection of Twitter
Accounts Infected With the
Blackhole Exploit Kit
Josh White and Jeanna Matthews
Clarkson University
MALWARE 2013, October 22-24, 2013
2. Objective
This work identifies some indicators of possible BEK
infectious messages on Twitter.
These indicators are used in the production of a filter
which can be applied to our collection system to
identify user accounts on Twitter which have reached
specific thresholds and can be considered
compromised or purposefully infectious.
4. Blackhole Exploit Kit (BEK)
Web-based application that manages the
installation and C2 of malware.
Utilizes a compromised server for malware
and web-page hosting.
Links luring victims to a compromised
server are distributed mainly through spam,
spear-phishing, and links in social network
posts.
5. Infection
The exploit server hosts an innocuous-looking
web page
The page hosts a tool for scanning the visiting system
Once a vulnerability is identified, it loads the
necessary exploit tools and compromises the visiting
system
A wide variety of malware may be loaded at this
point depending on the exact mission of the attacker.
6. Other BEK features
Contains modular
capabilities for new exploits
to be added rapidly and in
many languages.
Employs typical
countermeasures:
Packing, binary
obfuscation and antivirus
avoidance.
7. Proliferation of BEK
In 2012, the BEK creators released version 2.0;
since then it has become one of the most well-known
and commonly deployed exploit kits
BEK enabled the majority of malware infections
in 2012
One study found that BEK accounted for 29%
of all malicious URLs in a dataset of 77,000
URLs marked harmful by the Google Safe
Browsing API
8. Data Collection Overview
Over the course of 2012 we collected 165 TB of
Twitter Data (Uncompressed)
175 Days Collected, 147 Full Days
Estimated 45 Billion Tweets
Recently released estimates place total Twitter traffic
at 175 million tweets per day in 2012
Our daily collection rates varied between 50% and
80% of total Twitter traffic.
We captured complete tweet data in JSON format
using Twitter's REST API.
9. Key Examples of Attributes in
JSON format
profile link color/background color/title/default
image/image url (http and https)/text color/default
description, background image url (http and https), in
reply to screen name/status id and str/user id, follow
request sent, friends count, screen name, show all inline
media, utc offset, url, created at, favorite, retweet count,
favorites count, is translator, truncated, contributors
enabled, contributors, time zone, verified, coordinates,
geo, text, entities, id, id str, following, application,
retweeted, place, sidebar border color/fill color, followers
count, geo enabled, listed count, notifications, name, lang,
location, protected, statuses count
10. Data Collection System
Distributed Data Collection Infrastructure
Geographically dissimilar IPs to simulate multiple
systems
Registered Application with Non-authenticated API
access (1 billion+ / week)
11. Data Storage
Collection in Streaming Gzip Python Dictionary Format (10:1
Average Compression Ratio); Storing 1.5 TB a Week
Converted to JSON on the fly when needed
Initially Stored in HDFS (Had Issues Scaling); Now Use DDFS
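The storage format described on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes one Python dictionary literal per line inside a gzip stream, and the helper name `iter_tweets_as_json` and the sample record are hypothetical, not the authors' actual code or data.

```python
import ast
import gzip
import io
import json


def iter_tweets_as_json(stream):
    """Yield one JSON string per Python-dict line in a gzip stream.

    Assumption: each line holds the repr() of one tweet dictionary.
    """
    with gzip.open(stream, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue
            # literal_eval parses Python literals without executing code.
            record = ast.literal_eval(line)
            yield json.dumps(record, ensure_ascii=False)


# Build a tiny in-memory archive to stand in for a collected file.
sample = [{"id": 1, "text": "It's you on photo?", "retweeted": False}]
buffer = io.BytesIO()
with gzip.open(buffer, mode="wt", encoding="utf-8") as out:
    for tweet in sample:
        out.write(repr(tweet) + "\n")
buffer.seek(0)

converted = list(iter_tweets_as_json(buffer))
```

Converting lazily, one record at a time, keeps memory use flat regardless of archive size, which matters at the 1.5 TB per week scale mentioned above.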
12. High Level Patterns
• Basic observable patterns
– Twitter has a lot of outages
– Posting rates follow predictable patterns
13. Analysis Framework
Filter Analysis (on live stream)
Experimental Analysis (after the fact)
15. Initial Identification
Searched for two well-publicized strings seen
in the wild: “It's you on Photo?” and “It's about you?”
16. Other message types found
Others found using REGEX, but not mentioned in
any articles or blogs at the time:
“You were nude at party) cool photo)”
“Wow! Your photo is cool.”
“At party you was drunken) cool photo)”
“Your photo is amazing”
“It's photo of you?”
“It's all about you”
“It's about you?”
“Wow! You look good)”
Because BEK allows message customization, the
permutations are virtually limitless
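A regular-expression search over the lure strings listed above might look like the sketch below. The pattern is illustrative only: it is assembled from the strings on this slide, it is not the authors' actual expression, and requiring a trailing URL is an assumption. The example.com URLs in the usage note are made up.

```python
import re

# Illustrative pattern built from the lure strings on the slide.
# NOT the authors' regex; requiring a URL after the lure is an assumption.
LURE_PATTERN = re.compile(
    r"""
    (?:
        it['’]?s\s+(?:you\s+on\s+photo|photo\s+of\s+you|(?:all\s+)?about\s+you)
      | your\s+photo\s+is\s+(?:cool|amazing)
      | you\s+look\s+good
      | cool\s+photo
    )
    .*?https?://\S+
    """,
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def looks_like_lure(text):
    """Return True if the text contains a known lure phrase followed by a URL."""
    return LURE_PATTERN.search(text) is not None
```

For example, `looks_like_lure("It's you on photo? http://example.com/x")` returns `True`, while the same phrase without a link returns `False`. Because BEK messages are customizable, a fixed pattern like this can only catch variants that have already been seen.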
17. Results: Entropy
Normal tweets: entropy of 4.6-7.5
197,237 manually verified messages from 100 sample accounts
Infectious tweets: entropy of 4.3 and lower
[Figure: entropy plots, panels “Normal” and “Infectious”]
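The entropy measure can be sketched with Shannon's formula, H = -Σ p(x) log2 p(x). The slides do not say which unit the entropy is computed over (characters, words, or bytes; a single tweet or all of an account's tweets), so this sketch assumes the characters of a single string, and its values may not line up exactly with the ranges quoted above. The function names are hypothetical.

```python
import math
from collections import Counter

INFECTIOUS_MAX_ENTROPY = 4.3  # threshold quoted on the slide


def shannon_entropy(text):
    """Character-level Shannon entropy of `text`, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def below_infectious_threshold(text):
    """True if the text's entropy is at or below the slide's 4.3 cut-off."""
    return shannon_entropy(text) <= INFECTIOUS_MAX_ENTROPY
```

Short, templated messages reuse a small set of symbols, which is why automated lures tend to score lower than free-form human text.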
18. Results: Pearson’s Correlation
Coefficient
Two Infectious Accounts
Compared, they have an average PCC value of 0.927581013955
Positive value near 1
Indicates strong correlation (“similarity”) between accounts
One Infectious Account and One Non-Infectious
Account
Compared, they have an average PCC value of -0.0847935420003
Negative value near 0
Indicates little to no correlation (“difference”) between
accounts
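Pearson's correlation coefficient between two accounts can be sketched as below. The slides do not state which features were correlated, so comparing character-frequency vectors of the two accounts' text is purely an assumption made for illustration, and both function names are hypothetical.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def char_frequency_vectors(text_a, text_b):
    """Assumed feature: per-character counts over the shared alphabet."""
    alphabet = sorted(set(text_a) | set(text_b))
    count_a, count_b = Counter(text_a), Counter(text_b)
    return [count_a[c] for c in alphabet], [count_b[c] for c in alphabet]
```

A value near 1 means the two vectors rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means no linear relationship.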
19. Results: Use of API/Application
Applications must be
registered for specific Twitter
API function usage
There are hundreds of
registered applications,
e.g.: “Iphone”, “Android”,
“Official Twitter Client”
“Mobile Web” is legacy
Requires no registration of
the application using it
Requires no OAuth
Less CPU/memory utilization
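Tallying tweets by posting application could be sketched as follows. This assumes the application is exposed as an HTML-formatted `source` string, as in Twitter's v1.1 tweet JSON; the attribute list in this deck calls it "application", so the field name may differ in the collected data. The sample records and their URLs are made up.

```python
import re
from collections import Counter

# Extracts the visible label from an HTML anchor such as
# '<a href="..." rel="nofollow">Mobile Web</a>'.
_ANCHOR_TEXT = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def application_name(source_field):
    """Return the application label, or the raw value if it is not an anchor."""
    match = _ANCHOR_TEXT.search(source_field)
    return (match.group(1) if match else source_field).strip()


# Hypothetical records; only the assumed `source` field is shown.
tweets = [
    {"source": '<a href="http://example.com/mobile" rel="nofollow">Mobile Web</a>'},
    {"source": '<a href="http://example.com/iphone" rel="nofollow">Twitter for iPhone</a>'},
    {"source": "web"},
    {"source": '<a href="http://example.com/mobile" rel="nofollow">Mobile Web</a>'},
]
usage = Counter(application_name(t["source"]) for t in tweets)
```

Here `usage["Mobile Web"]` is 2, so a filter can single out tweets posted through the legacy interface.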
20. Results: Graph Clustering
Visualized 729,609 suspicious
accounts
Utilized Gephi and the OpenOrd clustering
algorithm
Shows obvious clusters
Based on: a transmitting (Tx) infectious account directs a message
to a victim; the victim is infectious once it starts
transmitting infectious messages
Non-connected accounts are assumed to
have clicked on an infectious message
without it being directed at them. We
cannot currently trace which messages they
clicked on
Dense areas are the most successfully
spread infection chains.
The cores are considered Infection Hubs
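The edge rule described on this slide can be sketched in plain Python. This is not the Gephi/OpenOrd workflow the authors used: it substitutes a simple out-degree ranking as a stand-in for spotting hubs, and the function names and sample accounts are hypothetical.

```python
from collections import defaultdict


def build_infection_graph(events, infectious_accounts):
    """Keep an edge sender -> recipient only when both ends are infectious.

    `events` is an iterable of (sender, recipient) pairs for directed messages.
    """
    graph = defaultdict(set)
    for sender, recipient in events:
        if sender in infectious_accounts and recipient in infectious_accounts:
            graph[sender].add(recipient)
    return graph


def infection_hubs(graph, top_n=3):
    """Rank senders by number of distinct victims (out-degree)."""
    ranked = sorted(graph.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(account, len(victims)) for account, victims in ranked[:top_n]]


def unconnected_accounts(graph, infectious_accounts):
    """Infectious accounts that appear on no edge at all."""
    connected = set(graph)
    for victims in graph.values():
        connected |= victims
    return set(infectious_accounts) - connected


# Hypothetical data: "x" never became infectious, "e" has no traced source.
infectious = {"a", "b", "c", "d", "e"}
events = [("a", "b"), ("a", "c"), ("b", "d"), ("a", "x")]
graph = build_infection_graph(events, infectious)
hubs = infection_hubs(graph, top_n=1)
isolated = unconnected_accounts(graph, infectious)
```

In this toy data, `a` is the top hub with two victims and `e` is the only unconnected account, matching the slide's notion of accounts whose infection source cannot be traced.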
21. Results: Some Summary Stats
Total Number of Tweets Processed: 6,531,319,202
Total Number of Unique Accounts Processed: 265,163,290
Total Number of Suspicious Accounts Found: 729,609
Total Number of Suspicious Tweets Found: 8,286,480
Calculated Percentage of BEK Infectious Accounts: 0.275%
Calculated Percentage of BEK Infectious Tweets: 12.7%
22. Related Work
A lot of research has been done into social network analysis using sites such as
Twitter. [21,22,24]
Including research that uses social network trends to track real world contagion
spread [25]
A few studies exist that examine BEK's malware dropping capability [2]
The URL identification method they used “w.php?f=(.*?)&e=(.*?)” does not
pick up all of the URL patterns that we witnessed
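The limitation of a single fixed URL pattern can be illustrated with a short sketch. The pattern is the one quoted above with `.` and `?` escaped so they match literally, which is an assumption about its intended meaning. Both URLs are made up for illustration; the second is not an observed BEK URL.

```python
import re

# Quoted pattern with '.' and '?' escaped (our assumption about the intent).
ESCAPED_PATTERN = re.compile(r"w\.php\?f=(.*?)&e=(.*?)")

# The same text used verbatim as a Python regex: 'p?' then means "optional p".
VERBATIM_PATTERN = re.compile(r"w.php?f=(.*?)&e=(.*?)")

# Hypothetical URLs, for illustration only.
matching_url = "http://example.com/w.php?f=abc12&e=4"
variant_url = "http://example.com/main.php?page=0f1e2d"


def is_flagged(url, pattern=ESCAPED_PATTERN):
    """True if the URL contains the w.php?f=...&e=... query shape."""
    return pattern.search(url) is not None
```

`is_flagged(matching_url)` is `True` and `is_flagged(variant_url)` is `False`, so any URL scheme that departs from the one template slips through. Note also that the verbatim form does not match `matching_url` at all in Python, because the unescaped `?` is treated as a quantifier rather than a literal question mark.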
23. Related Work (continued)
Various works on determining if a message is automated or not exist, most
notably “Human, Bot or Cyborg” [26]
Unfortunately, they relied heavily on the Google Safe Browsing API, which is only
updated after someone has verified the link is dangerous. [27]
One work showed that up to 16% of all Twitter accounts show signs of
automation. [28]
However, they point out that only a small number of tweets use the Mobile
Web API
24. Future Work (with edits from
Malware 2013 audience!)
Analysis of use of specific strings over time
Studying spread of ideas in Twitter in addition to spread of malware
Case study of top infectors
Carrier vs virology model of spread
Compare to all vs just benign
Testing Twitter (measure how well they do in disabling infected
accounts/help them get better); work with Twitter to integrate
Include follower to following ratio on infected accounts
More like antispam than antimalware
Geographic analysis of the infected accounts
25. Conclusion
We completed a large-scale analysis of the characteristics of BEK infectious
Twitter Accounts
Some accounts showed signs of being solely for malware distribution
We found substantial variation in infectious message structure
We identified a large set of message types not previously published
We identified the characteristics most strongly associated with BEK infectious
messages
Tweets using the Mobile API, with a text entropy lower than 4.3, and
showing a strong PCC with known infectious messages, and those that
additionally have URLs embedded in them
We presented our measurement techniques and how they
integrate into our larger platform
Without manual investigation of all messages that we flagged as infectious, we
cannot be certain of our results
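The characteristics listed in this conclusion could be combined into a single check along the following lines. This is a sketch under stated assumptions: requiring all four conditions at once is our reading, the 0.9 correlation cut-off is hypothetical (the slides only say "strong"), and the class and field names are invented for illustration. Entropy and PCC are taken as precomputed inputs.

```python
from dataclasses import dataclass

ENTROPY_THRESHOLD = 4.3          # from the slides
PCC_THRESHOLD = 0.9              # hypothetical cut-off for a "strong" PCC
MOBILE_APPLICATION = "Mobile Web"


@dataclass
class TweetFeatures:
    application: str                  # posting application label
    entropy: float                    # text entropy of the tweet
    pcc_with_known_infectious: float  # correlation with known infectious text
    has_url: bool                     # whether a URL is embedded


def is_suspicious(features):
    """Flag a tweet only if every listed characteristic is present (assumption)."""
    return (
        features.application == MOBILE_APPLICATION
        and features.entropy < ENTROPY_THRESHOLD
        and features.pcc_with_known_infectious >= PCC_THRESHOLD
        and features.has_url
    )
```

For instance, `is_suspicious(TweetFeatures("Mobile Web", 3.9, 0.93, True))` returns `True`, while the same tweet without a URL returns `False`. As the last bullet above notes, a flag from such a filter is an indicator, not a confirmed infection.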
26. Citations
1. J. Oliver, S. Cheng, L. Manly, J. Zhu, R. Paz, S. Sioting, J. Leopando. “Blackhole Exploit Kit: A Spam Campaign, Not a Series
of Individual Spam Runs, An In-Depth Analysis,” Trend Micro Incorporated Research Paper, 2012
2. Chris Grier, Lucas Ballard, Juan Caballero, et al. 2012. Manufacturing compromise: the emergence of exploit-as-a-service. In
Proceedings of the 2012 ACM conference on Computer and communications security (CCS ’12). ACM, New York, NY, USA, 821-832.
3. Gabor Szappanos. “Inside The Blackhole,” SophosLabs, 2012
4. Jason Jones. “The State of Web Exploit Kits,” HP DVLabs, 2012
5. Howard, Fraser. 2013. Technical paper: Journey inside the Blackhole exploit kit. Naked Security from Sophos. November 30
2012
6. Chris Grier, Lucas Ballard, Juan Caballero, et al. 2012. Manufacturing compromise: the emergence of exploit-as-a-service. In
Proceedings of the 2012 ACM conference on Computer and communications security (CCS ’12). ACM, New York, NY, USA, 821-832.
7. Fraser Howard. “Exploring the Blackhole Exploit Kit,” Sophos Technical Paper, March 2012
8. Ziv Mador. “Exploiting Kits: The Underground’s Weapon of Choice,” Infosecurity Europe 2012, SpiderLabs at Trustwave, 2012
9. Zhou Li, Kehuan Zhang, Yinglian Xie, Fang Yu, and XiaoFeng Wang. 2012. Knowing your enemy: understanding and detecting
malicious web advertising. In Proceedings of the 2012 ACM conference on Computer and communications security (CCS ’12).
ACM, New York, NY, USA, 674-686.
10. Shea Bennett. “Just How Big Is twitter In 2012 [INFOGRAPHIC],” All Twitter - The Unofficial Twitter Resource, February 2013
27. Citations
11. Mike Melanson, Twitter Kills the API Whitelist: What it Means for Developers and Innovation, February 11 2011, URL
=http://www.readwriteweb.com/archives/
12. Joab Jackson, Twitter Now Using Oauth authentication for Third Party Apps, Computer World UK, September 1, 2010, URL=
http://www.computerworlduk.com/news/security/3237659/twitter-now-using-oauth-authentication-for-third-party-apps/
13. Arne Roomann-Kurrik, Announcing gzip Compression for Streaming API’s, Twitter Developers Feed, Jan 20, 2012, URL
=https://dev.twitter.com/blog/announcing-gzip-compression-streaming-apis
14. Prashanth Mundkur, Ville Tuulos, and Jared Flatow. 2011. Disco: a computing platform for large-scale data analytics. In
Proceedings of the 10th ACM SIGPLAN workshop on Erlang (Erlang 11). ACM, New York, NY, USA, 84-89.
15. C. E. Shannon. A Mathematical Theory of Communication, Reprinted with corrections from The Bell System Technical
Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948.
16. Graham Cluley. “Outbreak: Blackhole malware attack spreading on Twitter using ‘It’s you on photo?’ disguise,” Sophos Naked
Security Blog, July 27, 2012
17. Rob Waugh. “It’s you! Blackhole virus spreading rapidly via Twitter fools users with fake photo link,” MailOnline Science and
Tech News, July 2012
18. Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks.
International AAAI Conference on Weblogs and Social Media.
19. S. Martin, W. M. Brown, R. Klavans, and K. Boyack (to appear, 2011), OpenOrd: An Open-Source Toolbox for Large Graph
Layout, SPIE Conference on Visualization and Data Analysis (VDA).
20. Aditya Mogadala and Vasudeva Varma. 2012. Twitter user behavior understanding with mood transition prediction. In
Proceedings of the 2012 workshop on Data-driven user behavioral modelling and mining from social media (DUBMMSM ’12).
ACM, New York, NY, USA, 31-34.
28. Citations
21. Johan Bollen and Huina Mao. 2011. Twitter Mood as a Stock Market Predictor. Computer 44, 10 (October 2011), 91-94.
DOI=10.1109/MC.2011.323 http://dx.doi.org/10.1109/MC.2011.323
22. Johan Bollen, Bruno Gonçalves, Guangchen Ruan, and Huina Mao. 2011. Happiness is assortative in online social networks. Artif. Life 17, 3
(August 2011), 237-251.
23. Manuel Cebrian. 2012. Using friends as sensors to detect planetary-scale contagious outbreaks. In Proceedings of the 1st international
workshop on Multimodal crowd sensing (CrowdSens ’12). ACM, New York, NY, USA, 15-16.
24. Z. Chu, S. Gianvecchio, H. Wang, and S. Jajodia. 2010. Who is tweeting on Twitter: human, bot, or cyborg?. In Proceedings of the 26th
Annual Computer Security Applications Conference (ACSAC ’10). ACM, New York, NY, USA, 21-30.
25. Google. Google safe browsing API. http://code.google.com/apis/safebrowsing/, Accessed: Feb 5, 2010
26. Chao Michael Zhang and Vern Paxson. 2011. Detecting and analyzing automated activity on twitter. In Proceedings of the 12th international
conference on Passive and active measurement (PAM’11), Neil Spring and George F. Riley (Eds.). Springer-Verlag, Berlin, Heidelberg, 102-111.