System calls provide an interface to the services made available by an operating system. As a result, any functionality provided by a software application eventually reduces to a set of fixed system calls. Since system calls have been used in literature, to analyze program behavior we made an assumption that analyzing the patterns in calls made by a mobile application would provide us insight into its behavior. In this paper, we present our preliminary study conducted with 534 mobile applications and the system calls made by them. Due to a rising trend of mobile applications providing multiple functionalities, our study concluded, mapping system calls to functional behavior of a mobile application was not straightforward. We use Weka tool and manually annotated application behavior classes and system call features in our experiments to show that using such features achieves mediocre F1-measure at best, for app behavior classification. Thus leading to the conclusion that system calls were not sufficient features for app behavior classification.
2. Problem statement
We present Heimdall, a framework
for studying system calls made by
mobile apps in order to determine an
app’s behavior class and match such
behavior to their stated purpose.
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 2
3. UMBC Ebiquity Research Group | MobiSec 2017
Image courtesy: Android App Market
Motivation: App issues
Tuesday, May 2, 2017 3
5. Do you read privacy policies and do you
understand them?
Source: lifehacker
Motivation: Software Limitation
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 5
6. Related Work
• System calls used for software analysis Kosoresow’97
• Three areas of research for mobile app analysis
• Malware classification Zhou’12
• NLP techniques Pandtia’13, Gorla’14
• Taint tracking Enck’10
• Google PHA taxonomy Google Android Security’16
• PrivacyGrade: Grading The Privacy Of Smartphone Apps
• Expectation and purpose: understanding users’ mental models
of mobile app privacy through crowdsourcing. Lin’12
• Modeling users’ mobile app privacy preferences, Lin’14
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 6
8. Dataset distribution
10 annotated categories, 20 Google Play categories – 75% tool and
productivity
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 8
9. Experimental setup
• 1560 apps
• 534 successfully executed
• strace can only be used on emulator
• UI/Application exerciser tool used
• Android 6.0.1 December 2015 build used
• SVM, MLP, J48 and NB used
• Best F1-score from MLP at 0.44
• 1-hot vectors – Call present or absent
• TF-IDF weight vectors – Uniqueness and
significance of system calls for app
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 9
10. Similar TF-IDF scores - Scientific calculator vs To-do list
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 10
11. Results: Google categories
• Google’s categories are
developer provided
• Could be misleading
• System calls could classify
behavior better
• Additional features
required
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 11
12. Results: Annotated classes
• System calls not totally
useless
• 1-hot vs TF-IDF
Classifier TF-IDF 1-hot
MLP 0.44 0.26
SVM-Poly 0.31 0.23
SVM-RBF 0.32 0.21
J48 0.27 0.31
NB 0.27 0.27
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 12
13. Challenges faced
• Annotating app behavior class
• Emulator instabilities
• App limitations/bugs
• User credentials required
• Multi-behavior apps
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 13
14. Conclusions
Can system calls be used to
distinguish between how an app
“behaves” and it’s perceived/stated
purpose?
• System calls – insufficient as
features
• Emulator – better ones required
• Coarser behavior classes
required
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 14
15. Future work
• System call bi-grams, tri-grams – for capturing call sequences
• Better emulator – Genymotion
• Malware classification – less scope (state-of-the-art is 96.7%)
• Generating contextual policies from behavior estimation
• Features planned for future:
• App descriptions
• App ratings
• PHA app analysis – require samples for Mobile Unwanted Software
UMBC Ebiquity Research Group | MobiSec 2017Tuesday, May 2, 2017 15
RQ1: Can system calls be used to distinguish between how an app “behaves” and it’s perceived/stated purpose?
Why should we care?
Apps collect user data
Emails, Messages, Documents, Sensor data – Highly Personal Data
Can’t App permissions handle privacy and security of data?
App permissions – “Take it or leave it”
Is user okay with sharing location in public place not private place, no way to control that
Use Privacy and Security module to implement context-dependent Rules
We had 61 scientific calculators and 102 to-do list apps in 534 that’s a combined percentage of 21%
Very similar system calls but different app categories
fstatat - get file status relative to a directory file descriptor
ftruncate - truncate a file to a specified length
pwrite - read from or write to a file descriptor at a given offset
sigreturn - return from signal handler and cleanup stack frame
sendmsg - send a message on a socket
sigaction - examine and change a signal action
System calls aren’t useless because the random choice for a 10 classes gives us a probability of 0.1 and we have 0.44 at best and 0.27 at worst
Classifiers perform better with TF-IDF features; understandable because just like document classification presence of a system call is significant while presence of a unique system call is a better feature than presence of a particular system call or a group of system calls