Application of Data Mining in
Security: Trends and Research
Directions
Ja’far Alqatawna
University of Jordan
J.Alqatawna@ju.edu.jo
Presentation at University of Granada
CITIC-UGR
About me
Ja’far Alqatawna
– Education:
• PhD in E-Business Security, SHU, UK.
• MSc. in Information & Communication Systems Security, The Royal Institute of
Technology (KTH), Sweden.
• BEng. in Computer Engineering, Mu’tah, Jordan.
– Work experience
• Associate Professor at KASIT and head of BIT department at University of Jordan.
• Program coordinator: MSc. in Web Intelligence.
• Worked as Assistant Technical Director, Computer Center, University of Jordan (UJ).
• Worked for the Swedish Institute of Computer Science at the Security Policy and Trust
Lab, Sweden.
• Co-Founder of Jordan Information Security & Digital Forensics Research Group.
• Member of IEEE.
– Contact: J.Alqatawna@ju.edu.jo
Teaching Experience
• BSc. Level:
– e-Business, e-Business Security, Web
Programming.
• MSc.
– Info Security, Secure Software Development (MSc.
IS Security and Digital Criminology).
– Web Security (MSc. Web Intelligence).
Agenda
• Security observations.
• Security statistics.
• Insecurity: contributing factors.
• Why the interest in Data Mining.
• Application of Data Mining in Security.
• Ongoing Research Projects
Security: What can be observed over the last five
decades?
• DES & 3DES encryption (1974-1997).
• MD5 hashing (1991-1996).
• Very advanced encryption algorithms and
protocols (AES, RSA, SSL, …).
• More and more perimeter defenses (firewalls, antivirus,
authentication, access controls, …).
However, security incidents are increasing
significantly!
Security: What do statistics really tell us?
[Chart: vulnerability statistics for Microsoft applications. Source: http://www.cvedetails.com/]
What about software developers?
Insecurity: Contributing factors
New technological innovations
– Web 2.0
– IoT
– Mobile App.
– Cloud
• Connectivity
• Extensibility
• Complexity
• Instant user-generated
content/applications
• Security as an afterthought
The Golden rule:
A 100% secure system does not exist!
Why the interest in Data Mining
• Security is pervasive and perimeters are dissolving:
– Cloud
– Mobile/BYOD
– OSN
– E-Business
• Data Mining is powerful.
– Classification
– Clustering
– Prediction
– Contextual intelligence
– Big Data analytics
– Long-term correlation
Application of Data Mining in
Security
• A huge amount of data is produced in cyberspace.
• Remarkable increase in the rate of various types of
cyber-attacks.
• DM can contribute to several security areas such as:
1. Behavioral Biometrics & Continuous Authentication.
2. Malicious Spam detection.
3. Cybercrimes and Botnet detection.
4. Insider misuse detection
5. Sybil attacks
6. Adaptive security
Behavioral Biometrics &
Continuous Authentication
• Identification
• Verification
• Authentication
• Authorization
Methods of Authentication:
Something you Know.
Something you have.
Where you are.
Something you are.
Something you do.
Area #1: A Biometric Framework for Intrusion
Detection over Social Networks
Published work:
Alqatawna, J.: An adaptive multimodal biometric framework for intrusion detection
in online social networks. IJCSNS International Journal of Computer Science and
Network Security 15(4), 19–25 (2015)
• OSN platforms:
– Profile based service
– Extremely interactive and generate a substantial
amount of information.
– Subject to several security and privacy threats.
Characteristics of the proposed
framework
• Defense-in-depth:
1. A typical static authentication function at the
login stage.
2. A set of continuous authentication functions
during the user's active session:
I. Keystroke dynamics
II. Mouse Dynamics
III. Touch Screen Dynamics
3. Profile-based Anomaly Detection.
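As an illustration of how a keystroke-dynamics function (item 2.I) might run during a session, here is a minimal sketch. It is not the published framework's implementation. The event format, the function names, the z-score test, and the threshold of 3.0 are all assumptions chosen for the example. It builds a per-user profile from key hold (dwell) times and flags a window of keystrokes whose average dwell time deviates strongly from that profile.

```python
from statistics import mean, stdev


def dwell_times(events):
    """Return key hold durations (release - press) in milliseconds.

    `events` is a list of (key, press_ms, release_ms) tuples; this event
    format is an assumption made for the sketch.
    """
    return [release - press for _key, press, release in events]


def build_profile(enrollment_sessions):
    """Summarize typing rhythm as the mean and standard deviation of dwell time."""
    samples = [d for session in enrollment_sessions for d in dwell_times(session)]
    return {"mean": mean(samples), "std": stdev(samples)}


def is_anomalous(profile, window, threshold=3.0):
    """Flag a keystroke window whose mean dwell time is far from the profile.

    The z-score of the window mean uses the standard error std / sqrt(n).
    The threshold is a hypothetical tuning parameter.
    """
    observed = dwell_times(window)
    standard_error = profile["std"] / len(observed) ** 0.5
    z = abs(mean(observed) - profile["mean"]) / standard_error
    return z > threshold


# Hypothetical enrollment data captured while the legitimate user types.
enrollment = [
    [("a", 0, 95), ("b", 200, 310), ("c", 400, 500)],
    [("d", 0, 105), ("e", 150, 240), ("f", 300, 400)],
]
profile = build_profile(enrollment)

# Two hypothetical in-session windows: one close to the profile, one not.
owner_window = [("x", 0, 98), ("y", 120, 222), ("z", 300, 401)]
intruder_window = [("x", 0, 160), ("y", 200, 370), ("z", 400, 550)]

owner_flagged = is_anomalous(profile, owner_window)
intruder_flagged = is_anomalous(profile, intruder_window)
```

A real deployment would likely combine several timing features (for example, flight time between keys) and feed them to a trained classifier rather than a single threshold.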
Continuous Authentication
[Diagram: a user session timeline from Login to Logout. At login, a static authentication function checks something the user knows (password, PIN code, or secret question). During the user session, a set of continuous authentication functions monitors user activities over the OSN (Analyze, Detect).]
Continuous Authentication & Anomaly Detection
[Diagram: a user session timeline from Login to Logout. At login, a static authentication function checks something the user knows (password, PIN code, or secret question). During the user session, user activities over the OSN are monitored by a Profile-Based Anomaly Detector, a Device Detector, and Keystroke Dynamics, Mouse Dynamics, and Touch Dynamics functions, which lead to a Response.]
The Way Forward
• Prototype/implementation of the framework
components.
• Open Source OSN platform to apply these
components.
• Ground-truth Dataset.
• Effective data extraction and classification
techniques.
Area #2: Malicious Spam detection
Published work:
Alqatawna, J., Faris, H., Jaradat, K., Al-Zewairi, M. and Adwan, O. (2015) Improving
Knowledge Based Spam Detection Methods: The Effect of Malicious Related Features in
Imbalance Data Distribution. International Journal of Communications, Network and System
Sciences, 8, 118-129. doi: 10.4236/ijcns.2015.85014.
Ongoing projects:
Project 1: Malicious Spam Detection in Email Systems of Educational Institutes.
Project 2: Spammers Detection over Online Social Networks Based on Public Attributes: The
case of Twitter.
Project 1: Malicious Spam Detection in Email Systems of
Educational Institutes.
• 10,000 spam emails have been collected from
University of Jordan and are being analyzed
based on the following methodology:
– Social engineering techniques employed by
attackers (topics, impersonation,
obfuscation, etc.)
– Attack vectors: links, doc, exe, pdf, embedded
code.
– Malware families: adware, bot, ransomware,
rootkit, etc.
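The attack-vector dimension above could be extracted from raw messages along the following lines. This is a sketch only, using Python's standard `email` package. The function name, the URL pattern, the extension set, and the sample message are assumptions made for illustration, not the project's actual tooling. It collects links from text parts, lists attachments with the extensions named above, and notes whether a text part contains an embedded `<script>` tag.

```python
import os
import re
from email import message_from_string
from email.message import EmailMessage

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+")
SCRIPT_PATTERN = re.compile(r"<script\b", re.IGNORECASE)
RISKY_EXTENSIONS = {".doc", ".exe", ".pdf"}  # extensions named on the slide


def attack_vectors(raw_email):
    """Return the links, risky attachments, and script flag found in one message."""
    message = message_from_string(raw_email)
    links, attachments, has_script = [], [], False
    for part in message.walk():
        filename = part.get_filename()
        if filename:
            # Attachment: record it if its extension is on the watch list.
            extension = os.path.splitext(filename)[1].lower()
            if extension in RISKY_EXTENSIONS:
                attachments.append(filename)
            continue
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload:
                charset = part.get_content_charset() or "utf-8"
                text = payload.decode(charset, errors="replace")
                links.extend(URL_PATTERN.findall(text))
                has_script = has_script or bool(SCRIPT_PATTERN.search(text))
    return {"links": links, "attachments": attachments, "has_script": has_script}


# Hypothetical sample message built in memory for the example.
sample = EmailMessage()
sample["Subject"] = "Invoice"
sample.set_content("Please open http://example.test/login now")
sample.add_attachment(
    b"MZ", maintype="application", subtype="octet-stream", filename="invoice.exe"
)
result = attack_vectors(sample.as_string())
```

Identifying the malware family of an attachment would need a separate analysis step (for example, a sandbox or signature lookup), which this sketch does not attempt.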
Project 1: Malicious Spam Detection in Email Systems of
Educational Institutes: Next Steps
• Constructing a complete dataset (spam and
ham) from an educational context.
• Investigating malicious spam features related
to the educational context.
• Building an effective classification method.
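When judging whether a classification method is effective on a spam/ham dataset, class imbalance matters: a classifier can score high accuracy while missing every spam message. The sketch below illustrates this with hand-computed metrics. The label counts and the "always ham" baseline are hypothetical and are not results from the project.

```python
def confusion_counts(y_true, y_pred, positive="spam"):
    """Count true/false positives and negatives for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical imbalanced test set: 95 ham messages and 5 spam messages.
labels = ["ham"] * 95 + ["spam"] * 5

# A trivial baseline that labels everything as ham.
always_ham = ["ham"] * 100
baseline = evaluate(labels, always_ham)
```

Here the baseline reaches 95% accuracy with zero recall on spam, which is why precision, recall, and F1 on the minority class are more informative than accuracy alone.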
Project 2: Spammers Detection over Online Social Networks
Based on Public Attributes: The case of Twitter.
• In OSNs, phishing attacks are four times more
effective than blind attempts [1].
• Primary attack vector: spam messages with
malicious links.
• Many of the profile attributes are public and
can be extracted using TwitteR.
• An MSc student is working on feature extraction.
[1] Gao, H., Hu, J., Huang, T., Wang, J., & Chen, Y. (2011). Security issues in online social networks.
Internet Computing, IEEE, 15(4), 56-6
Feature extraction…
1. Suspicious words: terms such as "Diet", "Click here", "Health", "Make Money", "Give Me", "Vote", "Free", etc.
2. Default image: the default profile image has not been changed for a while.
3. % links in tweets: a high percentage of links (URLs) per tweet.
4. Following-to-followers ratio: the account follows more users than follow it.
5. Repeated words: a high percentage of duplicate words per tweet.
6. Tweet-to-response ratio: the account tweets more than it responds to users' comments.
7. Time between tweets: tweets are posted at regular time intervals.
8. Description – tweets inconsistency: the profile description differs from the tweet topics.
9. Diverse interests: following, or showing interest in, many different types of people.
10. Number of tweets per day: how many tweets the account posts per day.
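Several of the features above can be computed directly from public profile fields and tweet text. The following is a minimal sketch covering features 1, 2, 3, 4, 5, and 7. The input layout (a `profile` dict and a list of `(timestamp_seconds, text)` tuples), the field names, and the sample account are assumptions made for the example and do not reflect the project's actual data format.

```python
import re
from statistics import pstdev

URL_PATTERN = re.compile(r"https?://\S+")
SUSPICIOUS_WORDS = ("diet", "click here", "health", "make money", "give me", "vote", "free")


def extract_features(profile, tweets):
    """Compute a few spammer-detection features for one account.

    `profile` needs the keys 'followers', 'following', and 'default_image';
    `tweets` is a non-empty list of (timestamp_seconds, text) tuples.
    """
    texts = [text for _timestamp, text in tweets]
    words = [w for text in texts for w in re.findall(r"[a-z']+", text.lower())]
    timestamps = sorted(timestamp for timestamp, _text in tweets)
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        # Feature 1: occurrences of suspicious words or phrases.
        "suspicious_word_hits": sum(
            text.lower().count(word) for text in texts for word in SUSPICIOUS_WORDS
        ),
        # Feature 2: whether the default profile image is still in use.
        "default_image": profile["default_image"],
        # Feature 3: average number of links per tweet.
        "links_per_tweet": sum(len(URL_PATTERN.findall(t)) for t in texts) / len(texts),
        # Feature 4: following-to-followers ratio (guarding against zero followers).
        "following_to_followers": profile["following"] / max(profile["followers"], 1),
        # Feature 5: share of words that repeat an earlier word.
        "repeated_word_ratio": 1 - len(set(words)) / len(words) if words else 0.0,
        # Feature 7: spread of gaps between tweets; near zero means regular posting.
        "gap_stdev_seconds": pstdev(gaps) if gaps else 0.0,
    }


# Hypothetical account used only to exercise the function.
sample_profile = {"followers": 10, "following": 2000, "default_image": True}
sample_tweets = [
    (0, "Free diet pills click here http://spam.test/a"),
    (3600, "Free diet pills click here http://spam.test/b"),
    (7200, "Make money free http://spam.test/c"),
]
features = extract_features(sample_profile, sample_tweets)
```

Features 6, 8, and 9 need reply data, topic modeling, and the follow graph respectively, so they are left out of this sketch.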
Other Areas
• Botnet detection.
• Intrusion detection.
• Insider attacks and misuse detection.
• Sybil detection.
• Adaptive Security.