2. • Monitor and track user behavior on smartphones using various on-device sensors
• Convert sensory traces and other context information into Personal Behavior Features
• Build Risk Analysis Trees from these features and use them to calculate Certainty Scores
• Trigger various authentication schemes when certain applications are launched
5. • “The 329 organizations polled had collectively lost more than 86,000 devices … with average cost of lost data at $49,246 per device, worth $2.1 billion or $6.4 million per organization.”
  “The Billion Dollar Lost-Laptop Study,” conducted by Intel Corporation and the Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.
[Chart: Mobile Device Loss or Theft (y-axis 0% to 60%)]
Strategy One Survey conducted among a U.S. sample of 3017 adults aged 18 years and older, September 21-28, 2010, with an oversample in the top 20 cities (based on population).
6. • Application: different applications may have different sensitivities
• Password: a major source of security vulnerabilities
  • Easy to guess, reused, forgotten, shared
• Usability: authentication happens too often, or is sometimes too loose
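The slides leave open how per-application sensitivity and the certainty score combine to pick an authentication scheme. A minimal Python sketch, assuming a certainty score in [0, 1] and a hypothetical table `SENSITIVITY` of per-app thresholds (the app names, threshold values, the 0.2 margin, and `choose_auth` are all illustrative, not part of SenSec):

```python
from enum import Enum


class Auth(Enum):
    NONE = "none"          # open the app without a challenge
    PIN = "pin"            # lightweight challenge
    PASSWORD = "password"  # full challenge


# Hypothetical per-application sensitivity: the minimum certainty score
# at which the app may open without any challenge.
SENSITIVITY = {
    "weather": 0.0,
    "email": 0.6,
    "banking": 0.9,
}


def choose_auth(app: str, certainty: float, default_threshold: float = 0.5) -> Auth:
    """Pick an authentication scheme for `app` given the current certainty score."""
    threshold = SENSITIVITY.get(app, default_threshold)
    if certainty >= threshold:
        return Auth.NONE
    # Falling short by a small margin earns a light challenge; otherwise a password.
    if threshold - certainty <= 0.2:
        return Auth.PIN
    return Auth.PASSWORD
```

Because each app sets its own bar, the same score can open a weather app silently while forcing a password for banking, which targets both usability complaints above: prompting too often and being too loose.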
9. • The MobiSens app collects sensor data
  • Motion sensors
  • GPS and WiFi scanning
  • In-use applications and their traffic patterns
• The SenSec module builds user behavior models
  • Unsupervised activity segmentation, with the resulting sequence modeled by a language model
  • A Risk Analysis Tree (decision tree) to detect anomalies
  • Combine the above to estimate risk online as a certainty score
• SenSec broadcasts the certainty score to other applications
  • The Application Access Control module receives it through a broadcast receiver
10. • A feature vector calculated over a step window represents the behavior state within that time window
  • Surrounding environment: GPS location, WiFi signal
  • Activity: motions, applications in use
  • Communication: network traffic
• A decision tree is used to detect anomalies in behavior
  • Each node represents a feature dimension
  • Each leaf is one of the following:
    • Owner detection: owner ∈ {0, 1}, where 0 = anomaly, 1 = normal
    • User identification: user id ∈ {0, 1, …, N}, the user's identity, i.e. IMEI
• Multiple trees can be built on subsets of the feature space and combined by
  • Weighted average
  • Voting
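To make the tree structure and the two combination rules concrete, here is a small Python sketch with hand-built owner-detection trees (leaf label 1 = normal, 0 = anomaly). The `Node`/`Leaf` types, the example weights, and combining 0/1 leaf outputs by a weighted mean are assumptions for illustration; the deck trains its trees with Weka rather than building them by hand.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    label: int          # owner detection: 1 = normal, 0 = anomaly


@dataclass
class Node:
    feature: int        # index of the feature dimension tested at this node
    threshold: float
    left: "Tree"        # followed when x[feature] <= threshold
    right: "Tree"       # followed otherwise


Tree = Union[Leaf, Node]


def predict(tree: Tree, x: Sequence[float]) -> int:
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, Node):
        tree = tree.left if x[tree.feature] <= tree.threshold else tree.right
    return tree.label


def weighted_average(trees: Sequence[Tree], weights: Sequence[float],
                     x: Sequence[float]) -> float:
    """Score in [0, 1]: weighted mean of the trees' 0/1 outputs."""
    total = sum(weights)
    return sum(w * predict(t, x) for t, w in zip(trees, weights)) / total


def vote(trees: Sequence[Tree], x: Sequence[float]) -> int:
    """Majority label across trees; ties go to the label predicted first."""
    return Counter(predict(t, x) for t in trees).most_common(1)[0][0]
```

With three trees that each test a different feature index and weights such as [2, 1, 1], `weighted_average` yields a graded score, while `vote` yields a hard owner/anomaly decision.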
11. • Convert the feature vector series into label streams (dimension reduction)
• Use an n-gram model for the label stream of each sensory dimension, capturing both the current state and its transitions
• Step window with an assigned length
A1 A2 A1 A4
G2 G5 G2 G2
W2 W1 W2
P1 P3 P6 P1
A2 G2G5 W1 P1P3 A1A4 G2 W1W2 P1
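One way to picture the conversion and n-gram steps: label each feature vector by its nearest cluster centroid, then cut the per-dimension label stream into n-grams. Nearest-centroid labeling and the helpers `to_labels` and `ngrams` are assumptions for illustration (the deck later mentions offline K-means clustering, but these slides only say the series is converted to label streams).

```python
import math
from typing import List, Sequence, Tuple


def to_labels(vectors: Sequence[Sequence[float]],
              centroids: Sequence[Sequence[float]],
              prefix: str) -> List[str]:
    """Dimension reduction: replace each feature vector with the label of its
    nearest centroid, e.g. 'A1', 'A2', ... for prefix 'A'."""
    labels = []
    for v in vectors:
        best = min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))
        labels.append(f"{prefix}{best + 1}")
    return labels


def ngrams(labels: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a label stream (state plus transitions)."""
    return [tuple(labels[i:i + n]) for i in range(len(labels) - n + 1)]
```

With one-dimensional centroids 0, 1, 2, 3 and readings 0.1, 0.9, 0.2, 3.2, `to_labels(..., "A")` gives the stream A1 A2 A1 A4 from the example above, and its bigrams are (A1, A2), (A2, A1), (A1, A4).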
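12. • User behavior at time t depends only on the last n-1 behaviors
• A sequence of behaviors can therefore be predicted from the n-grams (n consecutive behaviors) observed in the past
• Maximum likelihood estimation from training data by counting, where C(·) is the number of occurrences in the training label stream:
  $P(b_t \mid b_{t-n+1}, \ldots, b_{t-1}) = \frac{C(b_{t-n+1}, \ldots, b_{t})}{C(b_{t-n+1}, \ldots, b_{t-1})}$
• MLE assigns zero probability to unseen n-grams
  • Incorporate a smoothing function (Katz)
    • Discount probability for observed n-grams
    • Reserve probability for unseen n-grams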
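Katz smoothing proper uses Good-Turing discounts. The Python sketch below swaps in a simpler relative, absolute discounting with back-off to unigrams, which shows the same two moves the slide lists: observed bigrams give up probability mass, and that mass is reserved for unseen bigrams. The class name, the fixed discount of 0.5, and the bigram order are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence, Set


class BackoffBigram:
    """Bigram label model with absolute discounting and back-off to unigrams
    (a simplified relative of Katz smoothing, not Katz itself)."""

    def __init__(self, labels: Sequence[str], discount: float = 0.5):
        assert 0.0 < discount <= 1.0, "discount must not exceed a count of 1"
        self.d = discount
        self.uni = Counter(labels)                    # C(w)
        self.total = len(labels)
        self.bi = Counter(zip(labels, labels[1:]))    # C(h, w)
        self.hist = Counter(labels[:-1])              # C(h) as a bigram history
        self.succ: Dict[str, Set[str]] = {}           # successors seen after h
        for h, w in self.bi:
            self.succ.setdefault(h, set()).add(w)

    def p_uni(self, w: str) -> float:
        """MLE unigram probability; 0 for labels never seen in training."""
        return self.uni[w] / self.total

    def prob(self, h: str, w: str) -> float:
        """Smoothed P(w | h)."""
        c_h = self.hist[h]
        if c_h == 0:
            return self.p_uni(w)                      # unseen history: back off fully
        c_hw = self.bi[(h, w)]
        if c_hw > 0:
            return (c_hw - self.d) / c_h              # discount observed bigrams
        seen = self.succ[h]
        reserved = self.d * len(seen) / c_h           # mass freed by the discounts
        unseen_mass = 1.0 - sum(self.p_uni(v) for v in seen)
        if self.uni[w] == 0 or unseen_mass <= 0.0:
            return 0.0                                # nothing to share, or unknown label
        # Share the reserved mass among unseen successors by unigram weight.
        return reserved * self.p_uni(w) / unseen_mass
```

For the stream A1 A2 A1 A4 A1 A2, P(A2 | A1) = (2 - 0.5)/3 = 0.5, and the unseen bigram (A1, A1) still receives 1/3 rather than zero, so the probabilities after A1 still sum to 1.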
13. • Feed the sequence of past behaviors in a stepping window of size N to the n-gram model for testing
• For a test sequence of behavior labels
  • Estimate the average log probability that this sequence was generated by the n-gram model
  • If this likelihood drops below a threshold, raise an anomaly alert
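The detection loop can be sketched in Python as follows, for the bigram case. The `prob` argument stands for any smoothed n-gram model; the helper names, the window of 3 labels, and the threshold used in the example below are hypothetical values chosen for illustration.

```python
import math
from typing import Callable, List, Sequence


def avg_log_prob(window: Sequence[str], prob: Callable[[str, str], float]) -> float:
    """Average log P(b_t | b_{t-1}) over the transitions inside one window."""
    pairs = list(zip(window, window[1:]))
    return sum(math.log(prob(h, w)) for h, w in pairs) / len(pairs)


def anomaly_flags(labels: Sequence[str], prob: Callable[[str, str], float],
                  window_size: int, threshold: float) -> List[bool]:
    """Slide a window of `window_size` (>= 2) labels one step at a time and
    flag every window whose average log probability falls below `threshold`."""
    flags = []
    for start in range(len(labels) - window_size + 1):
        window = labels[start:start + window_size]
        flags.append(avg_log_prob(window, prob) < threshold)
    return flags
```

With P(A2 | A1) = P(A1 | A2) = 0.8, every other pair at a floor of 1e-3, a window of 3, and threshold log(0.1), the stream A1 A2 A1 A2 A4 A4 A4 flags only the last three windows, which are the ones containing A4.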
17. Dataset
  Number of users                              50
  Device                                       Android phones
  Location                                     Bay Area
  Average period                               30 days
  Number of data types                         7
  Finest sampling interval (motion sensors)    200 ms
• Total data set size: 4 GB
• Remove 2 heavy users
• Remove users with very limited data duration
• Remove users that don't have application and traffic data due to an older MobiSens version
• 25 users with comparable dataset sizes
• Data duration: 4 hours to 2.5 days
18. • Motion sensors (100)
  • Used to summarize the acceleration stream
  • Calculated separately for each dimension [x, y, z, m]
• GPS: location label via density-based clustering (1)
• WiFi: (SSID, RSSI) pairs ranked by signal strength (6)
• Applications: bitmap of well-known applications (60 + 1)
• Application traffic pattern: Tx/Rx traffic vectors (120 + 2)
• Step window size: 5 seconds
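A rough Python sketch of per-window motion features. The slide counts 100 motion features without listing them, so the four statistics used here (mean, population standard deviation, min, max) are an assumption, as is reading m as the magnitude sqrt(x² + y² + z²). The 200 ms sampling interval and the 5-second step window come from the deck; the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

SAMPLE_INTERVAL_MS = 200                                    # finest motion sampling
WINDOW_S = 5                                                # step window size
SAMPLES_PER_WINDOW = WINDOW_S * 1000 // SAMPLE_INTERVAL_MS  # 25 samples

Sample = Tuple[float, float, float]


def motion_features(window: Sequence[Sample]) -> Dict[str, float]:
    """Summary statistics for one step window of (x, y, z) acceleration samples,
    computed separately for x, y, z and the (assumed) magnitude m."""
    dims = {
        "x": [s[0] for s in window],
        "y": [s[1] for s in window],
        "z": [s[2] for s in window],
        "m": [math.sqrt(x * x + y * y + z * z) for x, y, z in window],
    }
    feats: Dict[str, float] = {}
    for name, values in dims.items():
        feats[f"{name}_mean"] = statistics.fmean(values)
        feats[f"{name}_std"] = statistics.pstdev(values)
        feats[f"{name}_min"] = min(values)
        feats[f"{name}_max"] = max(values)
    return feats


def step_windows(samples: Sequence[Sample],
                 size: int = SAMPLES_PER_WINDOW) -> List[Sequence[Sample]]:
    """Split a sample stream into consecutive, non-overlapping step windows,
    dropping any incomplete tail."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]
```

A 5-second window holds 25 samples (5000 ms / 200 ms), and each window yields 16 values in this sketch, against the deck's 100.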
19. • User identification test and owner detection test on a randomly selected partial data set (4 users) with a 1:1 training/test split
  • ~99% accuracy
  • Number of leaves: 56, size of tree: 111
• Using only non-motion attributes yields lower accuracy (96%)
  • Significant tree size reduction: number of leaves: 3, size of tree: 5
  • Cross entropy may be significant enough that some users are easily distinguished by a few features
• Using only motion attributes can also distinguish different users
  • ~98% accuracy
  • Very large tree: number of leaves: 267, size of tree: 533
  • May cause performance issues on a mobile platform
20. • Apply a cross-entropy filter to remove users who can be identified easily using a small set of features
  • 12 users with 210k data instances
• User identification: train the RAT model on 66% of the instances and use the rest for testing
[Chart: Accuracy and Size Factor for the All, Non-Motion, and Motion-Only feature sets; accuracy labels 84.8%, 83.5%, 79.3%; size factor labels 7649, 221, 35]
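The slides do not say how the cross-entropy filter is computed. One plausible reading, sketched in Python: for one feature's label stream, compute the cross entropy H(p_u, q) = -Σ p_u(x) log q(x) between user u's label distribution p_u and the distribution q of everyone else's labels, and drop users above a threshold, since a user whose labels others rarely produce is easy to single out. The functions, the epsilon floor, and the threshold are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def distribution(labels: Sequence[str]) -> Dict[str, float]:
    """Empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return {k: c / n for k, c in counts.items()}


def cross_entropy(p: Dict[str, float], q: Dict[str, float], eps: float = 1e-6) -> float:
    """H(p, q) in nats; eps floors q for labels that q never produced."""
    return -sum(px * math.log(max(q.get(x, 0.0), eps)) for x, px in p.items())


def easy_users(user_labels: Dict[str, Sequence[str]], threshold: float) -> List[str]:
    """Users whose labels for one feature are poorly predicted by everyone
    else's (cross entropy above `threshold`). Needs at least two users."""
    flagged = []
    for user, labels in user_labels.items():
        others = [lab for u, ls in user_labels.items() if u != user for lab in ls]
        if cross_entropy(distribution(labels), distribution(others)) > threshold:
            flagged.append(user)
    return flagged
```

For instance, a user whose GPS labels all fall in a cluster no one else visits gets H = -ln(1e-6), about 13.8 nats, and is filtered, while users who share clusters stay near 1.6 nats.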
22. • Experiments detected anomalous usage with ~80% accuracy using only a few days of training data
23. • Extended data set for feature construction
  • TCP and UDP traffic, sound, ambient lighting, battery status, etc.
• Data and modeling
  • Gain more insight into the data, the features, and the factorized relationships among various sensors
  • Try other classification methods and compare results: LR, SVM, Random Forest, etc.
• Enhanced security of SenSec components
  • Integration with the Android security framework and other applications
• Privacy challenges
  • Data collection, model training, privacy policy, etc.
• Energy efficiency
28. • Data collection
  • Running app list
  • Per-app traffic pattern
• IPC interface
  • Certainty score broadcast mechanism
• Offline model push via the Data Exchange API
  • The Risk Analysis Tree can be trained using global data on the MobiSens Server and pushed back to the mobile device
[Figure: MobiSens system architecture with panels (a) MobiSens Mobile Application, (b) Tier 2, (c) Tier 3; labeled components include Application Lifelogger, Profile API, Behavior Modeling, WebService, Data Preprocessor, Upload API, and Widgets]
29. • MobiSens Server
  • Offline clustering
    • K-means package from the Weka Data Mining Toolkit
    • Using aggregated data from all users
  • Offline RAT training
    • Decision tree package from the Weka Data Mining Toolkit
    • Construct the training data set and design the evaluation strategy
• MobiSens Client
  • Retrieve the RAT model from the MobiSens Server
  • On-device n-gram label sequence construction (n = 1, 2, 3; window size = 5 s)
  • RAT inference on the device using the Weka toolkit
  • Status bar notification based on the certainty value
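For reference, the offline clustering step follows the standard k-means (Lloyd's) iteration. This Python sketch is a stand-in for the Weka package, not a description of it: initializing from the first k points is an assumption made for reproducibility, whereas real implementations typically use random or k-means++ initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int,
           iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: returns (centroids, cluster index per point)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assign = [-1] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assign
```

The resulting centroids are what the client would need in order to turn raw readings into cluster labels before building its n-gram sequences.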
30. • Reactive API to Team Access
  • API call from Team Access to SenSec to retrieve the current certainty score for a given context
    getCertaintyScore(SenSecContextType ctx, count)
• Proactive API to Team Access and other equivalent modules
  • Broadcast receiver on the certainty score
    certaintyScore {
      CertaintyScoreType scores[];
      WindowSizeType window_size;
      SenSecContextType ctx;
    }
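A Python sketch of the two access patterns, assuming an in-process service rather than Android's real IPC: `get_certainty_score` mirrors the reactive pull, and `register_receiver` plus `publish` mirror the proactive broadcast. `SenSecService`, `publish`, the string context, the seconds unit for the window, and the 100-score history are hypothetical; on Android the broadcast would travel through an Intent to a BroadcastReceiver, which is not shown.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class CertaintyScore:
    """Mirror of the slide's certaintyScore record."""
    scores: List[float]
    window_size: int      # step window length, assumed to be in seconds
    ctx: str              # SenSec context, modeled here as a plain string


class SenSecService:
    """Hypothetical in-process stand-in for SenSec's two APIs."""

    def __init__(self, window_size: int = 5, history: int = 100):
        self.window_size = window_size
        self._history: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=history))
        self._receivers: List[Callable[[CertaintyScore], None]] = []

    # Reactive API: a caller such as Team Access pulls recent scores on demand.
    def get_certainty_score(self, ctx: str, count: int) -> List[float]:
        """Return up to `count` most recent scores for `ctx`, newest last."""
        recent = list(self._history[ctx])
        return recent[-count:] if count > 0 else []

    # Proactive API: modules register a receiver and get each new score pushed.
    def register_receiver(self, receiver: Callable[[CertaintyScore], None]) -> None:
        self._receivers.append(receiver)

    def publish(self, ctx: str, score: float) -> None:
        """Record a new score and broadcast it to every registered receiver."""
        self._history[ctx].append(score)
        event = CertaintyScore(scores=[score], window_size=self.window_size, ctx=ctx)
        for receiver in self._receivers:
            receiver(event)
```

Pull suits a check made once at app launch; push lets a module such as Team Access react as soon as the score drops, without polling.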