Daniel Roggen
2011
Wearable Computing
Part III
The Activity Recognition Chain (ARC)
© Daniel Roggen www.danielroggen.net droggen@gmail.com
Focus: activity recognition
• Activity is a key element of context!
Examples: fitness coaching, location-based services (iPhone), step counter, Wii, fall detection and alarm, elderly assistant
There is no « Drink Sensor » 
• Simple sensors (e.g. RFID) can provide "binary" information
– Presence (e.g. RFID, Proximity infrared sensors)
– Movement (e.g. ADXL345 accelerometer ‘activity/inactivity pin’)
– Fall (e.g. ADXL345 accelerometer ‘freefall pin’)
• But in general « activity-X sensor » does not exist
– Sensor data must be interpreted
– Multiple sensors must be correlated (data fusion)
– Several factors influence the sensor data
• Drinking while standing: the arm reaches the object then the mouth
• Drinking while walking: the arm moves, and also the whole body
• Context is interpreted from the sensor data with
– Signal processing
– Machine learning
– Reasoning
• Can be integrated into a « sensor node » or « smart sensor »
– Sensor chip + data processing in a device
User Activity Structure
[Figure: hierarchical structure of user activity across time scales: years alternating working and resting; weeks 10–12 with activities such as go to work, read mail, meeting, shopping, go home; a meeting decomposed into enter, give talk, listen, leave; a talk decomposed into walk, show, speak, stand.]
How to detect a presentation?
• Place
– Conference room
– In front of audience
– Generally at the lectern
• Sound
– User speaks
– Maybe short interruptions
– Otherwise silence
• Motion
– Mostly standing, with small walking motion
– Hand motion, pointing
– Typical head motion
Greeting
Sensor placement
– Upper body
– Right wrist
– Left upper leg
Activity
– Person is seated
– Stands up
– Greets somebody
– Sits down again
Greeting
[Figure: acceleration signals recorded during the greeting, plotted on a −2 g to +2 g scale.]
Data recording
[Figure: acceleration [g] over time [s] for the upper-body and wrist sensors, annotated with the phases seated, stand up, standing, sit down, and with hand-on-table, arm-motion and handshake segments.]
The combination of the individual sensor signals is distinctive of the activity!
How to recognize activities?
With sensors on the body, in objects, in the environment, …
1. Activities are represented by typical signal patterns in the sensor data
– Activity = movement → motion sensor
– Activity = sound → microphone
– Examples: drink from a glass, turn pages
2. Recognition: "comparison" between template and sensor data
– "Drink" recognized, "turn page" recognized
Recognition system characteristics
Execution – Offline: The system records the sensor data first. The recognition is performed afterwards. Typically used for non-interactive applications such as activity monitoring for health-related applications.
Execution – Online: The system acquires sensor data and processes it on-the-fly to infer activities. Typically used for activity-based computing and interactive applications (HCI).
Recognition – Continuous: The system "spots" the occurrence of activities or gestures in streaming data. It implements data stream segmentation, classification and null class rejection.
Recognition – Isolated / segmented: The system assumes that the sensor data stream is segmented at the start and end of a gesture by an oracle. It only classifies the sensor data into the activity classes. The oracle can be an external system in a working system (e.g. cross-modality segmentation), or the experimenter when assessing classification performance during design phases.
Activity recognition: learning by demonstration
• 1) Train activity models from sensor data (training data is required)
• 2) Recognition: compare ("=?") incoming sensor data against the trained activity models to infer the activity / context
Activity models
World model – Stateless: The recognition system does not model the state of the world. Activities are recognized by spotting specific sensor signals. This is currently the dominant approach when dealing with the recognition of activity primitives (e.g. reach, grasp).
World model – Stateful: The system uses a model of the world, such as the user's context or an environment map with the location of objects. This enhances activity recognition performance, at the expense of design-time knowledge and a more complex recognition system.
Assumptions
• Constant sensor-signal to activity-class mapping
• Design-time: identify sensor-signal/activity-class mapping
– Sensor setup
– Activity sets
• Run-time: "low"-variability
– Can't displace sensors or modify garments
– Can't change the way activities are done
The activity recognition chain (ARC)
• A standard set of steps followed by most research in activity
recognition (e.g. [1,2,3,4])
• Streaming signal processing
• Machine learning
• Reasoning
[1] J. Ward, P. Lukowicz, G. Tröster, and T. Starner, “Activity recognition of assembly tasks using body-worn microphones and accelerometers,”
IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006.
[2] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” in Pervasive Computing: Proc. of the 2nd Int’l Conference, Apr. 2004,
pp. 1–17.
[3] D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, “Preprocessing techniques for context recognition from accelerometer data,” Personal and
Ubiquitous Computing, vol. 14, no. 7, pp. 645–662, 2010.
[4] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010
[Figure: the activity recognition chain [1]. Run-time (recognition phase), subsymbolic processing: sensor sampling (S0–S4) → preprocessing (P0–P4) → segmentation → feature extraction (F0–F3) → classification (C0–C2) → decision fusion → null class rejection; symbolic processing: reasoning on the recognized activities; the activity-aware application receives a stream of activity events (A1, p1, t1), (A2, p2, t2), (A3, p3, t3), (A4, p4, t4) over time t. Design-time (training phase): sensor data and annotations are used to train and optimize the low-level activity models (primitives) and the high-level activity models.]
[1] Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics & Automation Magazine, 2011
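To make the chain concrete, the following minimal Python sketch strings the run-time stages together. It is not from the slides: the window length, the mean/std features and the nearest-centroid classifier with distance-based null class rejection are illustrative assumptions.

import numpy as np

def sliding_windows(signal, length=32, step=16):
    # Segmentation: cut the streaming signal into overlapping windows
    for start in range(0, len(signal) - length + 1, step):
        yield signal[start:start + length]

def extract_features(window):
    # Feature extraction: mean and standard deviation per axis
    return np.concatenate([window.mean(axis=0), window.std(axis=0)])

def classify(features, centroids, reject_threshold=2.0):
    # Classification + null class rejection: nearest centroid, or None if too far
    distances = {label: np.linalg.norm(features - c) for label, c in centroids.items()}
    best = min(distances, key=distances.get)
    return best if distances[best] < reject_threshold else None

def activity_recognition_chain(signal, centroids):
    # Run-time chain: segmentation -> feature extraction -> classification -> event stream
    # centroids: dict mapping activity label -> feature vector learned at design time
    events = []
    for window in sliding_windows(np.asarray(signal)):
        label = classify(extract_features(window), centroids)
        if label is not None:
            events.append(label)
    return events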
Segmentation
• A major challenge!
• Find the boundaries of activities (e.g. "drink", "turn page") for later classification
• Methods:
– Sliding window segmentation
– Energy-based segmentation
– Rest-position segmentation
– HMM [1], DTW [2,3], SWAB [4]
[1] J. Deng and H. Tsui. An HMM-based approach for gesture segmentation and recognition. In 15th International Conference on Pattern Recognition, volume 3,
pages 679–682. 2000.
[2] M. Ko, G. West, S. Venkatesh, and M. Kumar, “Online context recognition in multisensor systems using dynamic time warping,” in Proc. Int. Conf. on
Intelligent Sensors, Sensor Networks and Information Processing, 2005, pp. 283–288.
[3] Stiefmeier, Wearable Activity Tracking in Car Manufacturing, PCM, 2008
[4] E. Keogh, S. Chu, D. Hart, and M. Pazzani. An online algorithm for segmenting time series. In Proceedings of the IEEE International Conference on Data
Mining, pages 289–96, 2001.
• Classification is undefined between activities
– the classifier is not trained on "no activity"
– the "null class" is hard to model: it can be anything
• Alternative: use "null class rejection" after classification
Segmentation: sliding/jumping window
• Commonly used for audio processing
– E.g. 20 ms windows
• Or for periodic activities
– E.g. walking, with windows of a few seconds
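A minimal sketch of sliding vs. jumping windows follows; the 32 Hz sampling rate, 1 s window and 50% overlap are illustrative values, not taken from the slides.

def window_indices(n_samples, fs=32, window_s=1.0, overlap=0.5):
    # Sliding window: step < length; jumping window: overlap = 0 (step == length)
    length = int(window_s * fs)
    step = max(1, int(length * (1.0 - overlap)))
    return [(start, start + length) for start in range(0, n_samples - length + 1, step)]

# Example: 10 s of data at 32 Hz, 1 s windows with 50% overlap
print(window_indices(320)[:3])   # [(0, 32), (16, 48), (32, 64)]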
Activity characteristics
Activity kinds – Periodic: Activities exhibiting periodicity, such as walking, running, rowing, biking, etc. Sliding window and frequency-domain features are generally used.
Activity kinds – Sporadic: The activity or gesture occurs sporadically, interspersed with other activities or gestures. Segmentation plays a key role in isolating the subset of data containing the gesture.
Activity kinds – Static: The system deals with the detection of static postures or static pointing gestures. Sliding window and time-domain features are generally used.
Segmentation
• Energy-based segmentation [1]
– Between activities the user does not move
– Low energy in the acceleration signal
– E.g. standard deviation of acceleration compared to a threshold
• Rest-position segmentation [1]
– User comes back to a rest position between gestures
– Can be trained
• Challenge:
– Usually no 'pause' or 'rest' between activities!
– Combination of segmentation and null class rejection
– E.g. DTW [2]
[1] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010
[2] Stiefmeier, Wearable Activity Tracking in Car Manufacturing, PCM, 2008
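A minimal sketch of energy-based segmentation as described above: the standard deviation of the acceleration magnitude in short windows is compared to a threshold; the window length and the threshold value are illustrative assumptions.

import numpy as np

def energy_segments(acc_magnitude, fs=32, window_s=0.5, threshold=0.05):
    # Mark each window as "active" when the std of the acceleration exceeds the threshold;
    # contiguous active windows form candidate activity segments (start, end) in samples.
    length = int(window_s * fs)
    active = [np.std(acc_magnitude[i:i + length]) > threshold
              for i in range(0, len(acc_magnitude) - length + 1, length)]
    segments, start = [], None
    for idx, is_active in enumerate(active):
        if is_active and start is None:
            start = idx * length
        elif not is_active and start is not None:
            segments.append((start, idx * length))
            start = None
    if start is not None:
        segments.append((start, len(acc_magnitude)))
    return segments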
Feature extraction
• Compute features on the signal that emphasize the characteristics related to the activities
• Tradeoffs
– Reduce dimensionality
– Computational complexity
– Maximize separation between classes
– Specificity of the features to the classes: robustness, overfitting
• Some common features for acceleration data [1]:
[1] Figo, Diniz, Ferreira, Cardoso. Preprocessing techniques for context recognition from accelerometer data, Pers Ubiquit Comput, 14:645–662, 2010
– E.g. mean: μ = (1/N) Σ xi
– E.g. standard deviation: σ = sqrt( (1/N) Σ (xi − μ)² )
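A sketch of a few common time-domain features from [1], computed on one window of 3-axis acceleration; the exact feature set is an illustrative choice.

import numpy as np

def window_features(window):
    # window: (n_samples, 3) array of x, y, z acceleration
    magnitude = np.linalg.norm(window, axis=1)
    return {
        "mean": window.mean(axis=0),            # mean per axis
        "std": window.std(axis=0),              # standard deviation per axis
        "energy": np.sum(window ** 2, axis=0),  # signal energy per axis
        "mean_magnitude": magnitude.mean(),     # mean of the acceleration magnitude
        # fraction of samples where the centered signal changes sign, per axis
        "mean_crossing_rate": np.mean(
            np.diff(np.sign(window - window.mean(axis=0)), axis=0) != 0, axis=0),
    }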
Car manufacturing activities
Data from Zappi et al, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008
Dataset available at: http://www.wearable.ethz.ch/resources/Dataset
Feature space: car manufacturing activities
Data from Zappi et al., Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008
Dataset available at: http://www.wearable.ethz.ch/resources/Dataset
[Figure: feature-space scatter plots for several feature combinations: (angle X, angle Y, angle Z), (energy X, energy Y, energy Z), (energy, angle X, angle Y), (angle X, angle Y), (energy X, energy Y), (energy, angle X).]
Feature space: modes of locomotion (FS1)
• Mean crossing rate of x, y and z axes, std of magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie
Calatroni et al., Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Feature space: modes of locomotion (FS2)
• Mean value of x, y and z axes, std of magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie
Calatroni et al., Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Feature space: modes of locomotion (FS3)
• Ratio of x and y axes, ratio of y and z axes, std of magnitude
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie
Calatroni et al., Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Feature space: modes of locomotion (FS4)
• Mean value of x, y and z axes, std of x, y and z axes
• Classes: 1 = Stand; 2 = Walk; 3 = Sit; 4 = Lie
Calatroni et al., Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
Features that overlap less across classes yield better accuracies with all classifiers.
Classification accuracy per sensor placement (NCC / 11-NN for each feature set):
        FS1           FS2           FS3           FS4
        NCC   11-NN   NCC   11-NN   NCC   11-NN   NCC   11-NN
Knee    0.64  0.71    0.94  0.95    0.94  0.94    0.95  0.94
Shoe    0.53  0.65    0.68  0.86    0.70  0.86    0.77  0.87
Back    0.60  0.70    0.79  0.81    0.66  0.74    0.78  0.82
RUA     0.53  0.58    0.77  0.84    0.72  0.75    0.73  0.86
RLA     0.45  0.59    0.72  0.81    0.67  0.80    0.61  0.84
LUA     0.55  0.64    0.86  0.85    0.78  0.85    0.75  0.87
LLA     0.60  0.66    0.70  0.82    0.75  0.80    0.68  0.82
Hip     0.57  0.62    0.77  0.81    0.81  0.79    0.77  0.79
kNN performs better than NCC; the difference is more evident for more overlapping features.
Feature extraction
• Ideally: explore as many features as possible
– Not limited to "human design space"
• Evolutionary techniques to search a larger
set of solutions
– E.g. genetic programming
[1] Förster et al., Evolving discriminative features robust to sensor displacement for activity recognition in body area sensor networks, ISSNIP, 2009
[Figure: the human design space is only a small subset of the space of all possible feature designs; illustration of a cross-over genetic operator and an example evolved feature.]
Feature selection
• Select the "best" set of features
• Improve the performance of learning models by:
– Alleviating the effect of the curse of dimensionality.
– Enhancing generalization capability.
– Speeding up learning process.
– Improving model interpretability.
• Tradeoffs
– Select features that correlate strongest to the classification variable
(maximum relevance), ...
– ... and are mutually far away from each other (minimum redundancy)
– Emphasize characteristics of signal related to activity
– Computational complexity (minimize feature number)
– Complementary
– Robustness
[1] Peng et al., Feature selection based on mutual information-criteria of max-dependency max-relevance and min-redundancy, PAMI, 2005
Feature selection
Filter methods
• Do not involve a classifier, but a 'filter' criterion, e.g. mutual information
• Pipeline: set of candidate features → subset selection algorithm → learning algorithm
• +
– Computationally light
– General: good for a larger set of classifiers
• -
– Feature set may not be ideal for all classifiers
– Larger subsets of features
Wrapper methods
• Involve the classifier itself
• Pipeline: set of candidate features → subset evaluation (learning algorithm in the loop) → subset selection algorithm → learning algorithm
• +
– Higher accuracy (exploits the classifier's characteristics)
– Can avoid overfitting with cross-validation
• -
– Computationally expensive
– Features are not general
Sequential forward selection (SFS)
• "Brute force" is not applicable!
– With N candidate features: 2^N feature sets to test
1. Start from an empty feature set Y0 = {Ø}
2. Select the best feature x+ that maximizes an objective function J over the remaining features x: x+ = argmax[ J(Yk + x) ]
3. Update the feature set: Yk+1 = Yk + x+; k = k+1
4. Go to 2
[1] Peng et al., Feature selection based on mutual information-criteria of max-dependency max-relevance and min-redundancy, PAMI, 2005
• Works well with a small number of features
• Objective: measure of “goodness” of the features
– E.g. accuracy
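A sketch of SFS; the objective function J is assumed to be a user-supplied callable returning, for example, the cross-validated accuracy of a classifier trained on the candidate feature subset.

def sequential_forward_selection(candidate_features, J, max_features=None):
    # Greedy wrapper: start from the empty set and, at each step, add the feature
    # that maximizes the objective J(feature_subset).
    selected = []
    remaining = list(candidate_features)
    max_features = max_features or len(remaining)
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda f: J(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected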
Classification
• Map feature vector to a class label
Bayesian classification
• F: sensor reading, features
• C: activity class
Bayes theorem:
P(C|F) = P(F|C) · P(C) / P(F)
P(F|C): conditional probability of observing the features F given the class C (learned from training data)
P(C): prior probability of the class (learned from training data)
P(F): marginal probability (sum over all classes of the probabilities of obtaining F)
P(C|F): posterior probability
• With multiple sensors: assume conditional independence (Naive Bayes)
P(C|F1,...,Fn) = P(F1,...,Fn|C) · P(C) / P(F) = P(F1|C) · ... · P(Fn|C) · P(C) / P(F)
• In practice only the numerator matters (the denominator is constant across classes)
• Classification with a detector: e.g. pick the class with the maximum posterior probability
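A minimal Gaussian naive Bayes sketch following the equations above; modelling each P(Fi|C) as a per-class Gaussian is an assumption, the slides do not prescribe the form of the likelihood.

import numpy as np

def train_naive_bayes(X, y):
    # Estimate the class prior P(C) and per-feature mean/std of P(F|C) from training data
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.std(axis=0) + 1e-6)
    return model

def classify_naive_bayes(x, model):
    # Maximum a posteriori class; the constant denominator P(F) is ignored
    def log_posterior(prior, mean, std):
        return np.log(prior) - np.sum(np.log(std) + 0.5 * ((x - mean) / std) ** 2)
    return max(model, key=lambda c: log_posterior(*model[c]))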
Nearest centroid classifier (NCC)
• One of the simplest classification methods
– No parameters
– Classify to the nearest class center
• Memory: C class centers
• Classification: C comparisons
• Pros:
– Simple implementation
– Online model update: add/remove classes, adapt class centers
– Fast, low memory footprint
• Cons:
– Simple class boundaries
– Suited when classes cluster in the feature space
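A sketch of an NCC with an online update of the class centers; the incremental mean update is one possible realization of the online model update mentioned above.

import numpy as np

class NearestCentroid:
    def __init__(self):
        self.centroids, self.counts = {}, {}

    def update(self, features, label):
        # Online update: move the class center towards the new training instance
        n = self.counts.get(label, 0)
        c = self.centroids.get(label, np.zeros_like(features, dtype=float))
        self.centroids[label] = (c * n + features) / (n + 1)
        self.counts[label] = n + 1

    def predict(self, features):
        # Classify to the nearest class center (Euclidean distance)
        return min(self.centroids, key=lambda l: np.linalg.norm(features - self.centroids[l]))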
k-nearest neighbor (k-NN)
• A simple classification method
– Instance-based learning
– Classify to the class most represented around the test point
– Parameter: k
– k=1: nearest neighbor (overfits)
– k>>1: "smoothes" noise in the training data
• Memory: N training points
• Classification: N comparisons
• Pros:
– Simple implementation
– Online model update (add/remove instances, classes)
– Complex boundaries
• Cons:
– Potentially slow, or lots of memory
• Some faster versions
– GPGPU [1]
– kd-trees to optimize the neighborhood search
[Figure: k-NN example in a 2-D feature space (F1, F2); from http://jakehofman.com/ddm/2009/09/lecture-02/]
[1] Garcia et al., K-nearest neighbor search: fast GPU-based implementations and application to high-dimensional feature matching, ICIP, 2010
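A brute-force k-NN sketch; kd-trees or the GPU implementation of [1] would replace the exhaustive distance computation.

import numpy as np
from collections import Counter

def knn_predict(x, train_X, train_y, k=11):
    # Classify to the most represented class among the k nearest training points
    distances = np.linalg.norm(np.asarray(train_X) - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]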
Decision tree
• A simple classification method
– Programmatic tree of if/else decisions (e.g. thresholds t1, t2 applied to features F1, F2)
– Parameters: decision boundaries
– Training algorithm: e.g. C4.5
• Memory: decision boundaries
• Classification: lightweight if/else comparisons
• Pros:
– Simple implementation
– Handles continuous and discrete values, symbols
• Cons:
– Appropriate when classes separate along the feature dimensions (otherwise apply e.g. PCA first)
– Limit the size of the tree to avoid overfitting
Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993
Null-class rejection
• Continuous activity recognition with sliding-window segmentation
– Gestures are not always present in a segment
– Such segments must be assigned to the "null class"
• Reject a segment when the confidence in the classification result is too low
• Many classifiers can be "calibrated" to provide probabilistic outputs [2]
– Statistical test / likelihood of an activity (see the sketch below)
[Figure: rejection regions for NCC and kNN classifiers [1].]
[1] Calatroni et al., ETHZ Tech Report, 2010
[2] I. Cohen and M. Goldszmidt, “Properties and benefits of calibrated classifiers,” in Proc. Knowledge Discovery in Databases (PKDD), 2004.
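A sketch of null-class rejection on top of a calibrated classifier; the 0.6 confidence threshold is an illustrative value, and how the class probabilities are obtained depends on the classifier and its calibration [2].

def classify_with_rejection(class_probabilities, threshold=0.6):
    # class_probabilities: dict mapping class label -> estimated probability
    label = max(class_probabilities, key=class_probabilities.get)
    if class_probabilities[label] < threshold:
        return "null"          # confidence too low: reject as null class
    return label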
Sliding window and temporal data structure
• Activities where the temporal structure of the data is generally not important:
– Walking, running, rowing, biking...
– Generally periodic activities
• Activities where it is important:
– Open dishwasher: walk, grasp handle up, pull down, walk
– Close dishwasher: walk, grasp handle down, pull up, walk
– Opening or closing a car door
– Generally manipulative gestures
– Complex hierarchical activities
• Problem with some features:
– Different sensor readings can yield identical features: e.g. the window means of activity A and activity B are equal (μ1 = μ2)
Sliding window and temporal data structure
• Time-to-space mapping
• Encode the temporal unfolding in the feature vector
– E.g. subwindows: compute the features in subwindows sw1 and sw2, yielding (μ1,1, μ1,2) for activity A and (μ2,1, μ2,2) for activity B, which are now separable (see the sketch below)
• Other approaches:
– Hidden Markov models
– Dynamic time warping / string matching
– Signal predictors
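A sketch of the time-to-space mapping with subwindows: the window is split and the per-subwindow features are concatenated so that the temporal order is encoded in the feature vector; two subwindows and mean features are illustrative choices.

import numpy as np

def subwindow_features(window, n_subwindows=2):
    # Concatenate the per-subwindow means: the temporal unfolding is encoded in feature space
    parts = np.array_split(window, n_subwindows)
    return np.concatenate([p.mean(axis=0) for p in parts])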
Gesture recognition using neural-network signal predictors [1]
• Signal: 3-D acceleration vector
• Predict the future acceleration vector
• Operation on the raw signal a(t)
• One predictor (CTRNN) per gesture class: each predictor receives the acceleration (ax, ay, az) at time t−1 and outputs a prediction (px, py, pz) for time t; the prediction error is computed for each class
• Predictors are "trained" on their gesture class
• The prediction error is smaller on the trained class: the recognized class is the one with the best prediction
[Figure: raw acceleration between t0 and t1 is fed, with a time delay, to the class-specific CTRNN predictors; the prediction errors for gestures of class 1 and class 2 are compared.]
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Predictor: continuous-time recurrent neural network (CTRNN)
• Continuous-time model neurons
• Fully connected network (neurons γi, γj with connection weights ωij)
• Rich dynamics (non-linear, temporal dynamics)
• Theoretically: can approximate any dynamical system
• Well suited as a universal predictor
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Architecture of the CTRNN predictor
• 5 neurons, fully connected
• 3 inputs: acceleration vector at the previous time step
• "Hidden" neurons
• 3 outputs: acceleration vector at the next time step
• Connections between neurons/inputs
• Parameters:
– state of neuron i at time t
– connection weight between neurons i and j
– connection weight of input k to neuron i
– value of input k (X, Y, Z)
– bias of neuron j
– time constant of neuron i
• Discretization using Forward Euler numerical integration, with a time step of 0.01 s (a reconstruction of the equations follows the reference below)
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
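The CTRNN equations appear as images in the original slides; the following is the standard CTRNN formulation consistent with the parameters listed above (a reconstruction, not copied from the slide):

\tau_i \frac{dy_i}{dt} = -y_i(t) + \sum_j \omega_{ji} \, \sigma\big(y_j(t) + \theta_j\big) + \sum_k w_{ik} I_k(t)

Forward Euler discretization with time step \Delta t = 0.01\,\mathrm{s}:

y_i(t + \Delta t) = y_i(t) + \frac{\Delta t}{\tau_i} \Big( -y_i(t) + \sum_j \omega_{ji} \, \sigma\big(y_j(t) + \theta_j\big) + \sum_k w_{ik} I_k(t) \Big)

where \sigma is the sigmoid activation, y_i the state of neuron i, \omega_{ji} the neuron connection weights, w_{ik} the input weights, I_k the input values, \theta_j the biases and \tau_i the time constants.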
Training of the signal predictors
• Record instances of each gesture class
• Train one predictor for each class
• For each class: minimize prediction error
• Genetic algorithm
– Robust in complex search spaces
– Representation of the parameters by a genetic string (binary string)
• Global optimization of neural network parameters
– Neuron interconnection weights
– Neuron input weights
– Time constant
– Bias
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Genetic algorithm
• Genetic string encoding: neuron weights, input weights, bias and time constant are each encoded with 6 bits; 60 bits per neuron; 300 bits for the full genetic string (5 neurons)
Fitness function
• Minimize the prediction error for a given class
• Measured on the N instances of a training set T (T1...TN)
• Lower is better (smaller prediction error)
GA parameters
• 100 individuals
• Rank selection of the 30 best individuals
• One-point crossover rate: 70%
• Mutation rate: 1% per bit
• Elitism
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Experiments
• 8 gesture classes
• Planar
• Acceleration sensor on wrist
• 20 instances per class (one person)
• "Restricted" setup
– No motion between gestures
– Automatic segmentation (magnitude of the signal >1g indicates gesture)
• "Unconstrained" setup
– Freely moving in an office, typical activities (sitting, walking, reading …)
– Manual segmentation by pressing a button
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Results: unconstrained setup
• Training: 62%-100% (80.5% average); testing: 48%-92% (63.6% average)
• User egomotion
[Figure: results for training (left) and testing (right).]
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Prediction error: gesture of class A
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Prediction error: one instance per class
[1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
Activity segmentation and classification with string matching
[Figure: pipeline from [1]. Sensors → signal processing → trajectories → motion strings (e.g. "becfcca", "aabadca"); activity templates (e.g. "bad", "cfcc") are matched against the string stream by string matching; the spotted segments then go through overlap detection, fusion and filtering for activity spotting.]
[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
Motion encoding
[Figure: the hand trajectory in the x-y plane is converted into direction vectors; each direction vector is quantized with an 8-symbol codebook of directions (a–h), yielding a motion string such as "bbccbbddccbbbb".]
[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
String matching
• An approximate string matching algorithm is used to spot activity occurrences in the motion string
– Based on a distance measure called the Levenshtein or edit distance
– The edit distance involves symbol operations, each associated with a dedicated cost:
• substitution/replacement r
• insertion i
• deletion d
– A crucial algorithm modification finds template occurrences at arbitrary positions within the motion string (see the sketch below)
[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
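A sketch of approximate string matching with the modification mentioned above: initializing the dynamic-programming costs to zero along the stream allows the template to start matching at any position. Unit edit costs are assumed here; [1] uses dedicated costs r, i, d.

def approximate_matching_cost(template, stream):
    # prev[j] = best edit distance of the template against a substring of stream ending at j
    prev = [0] * (len(stream) + 1)           # free start position anywhere in the stream
    for i, t in enumerate(template, 1):
        curr = [i] + [0] * len(stream)       # deleting i template symbols costs i
        for j, s in enumerate(stream, 1):
            curr[j] = min(prev[j - 1] + (t != s),   # substitution (0 if symbols match)
                          prev[j] + 1,              # deletion
                          curr[j - 1] + 1)          # insertion
        prev = curr
    return prev                               # threshold the minima to spot occurrences

# Example: spot the template "cfcc" in the motion string "aabcfccada"
costs = approximate_matching_cost("cfcc", "aabcfccada")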
Approximate string matching
[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
Spotting operation
[Figure: the matching cost C1(t) of a template against the motion string (e.g. "b bd cc bb") is tracked over time t; when it falls below the threshold kthr,1 an activity end point is detected, from which the activity start point and the spotted segment are derived.]
[1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
String matching
• +
– Easily implemented in FPGAs / ASIC
– Lightweight
– Computational complexity scales linearly with number of
templates
– Multiple templates per activity
• -
– Need a string encoding
– Hard to decide how to quantize sensor data
– An online implementation requires "forgetting the past"
Activity Recognition with Hidden Markov model
• Markov chain
– Discrete-time stochastic process
– Describes the state of a system at successive times
– State transitions are probabilistic
– Markov property: state transition depends only on the
current state
– State is visible to the observer
– Only parameter: state transition probabilities
• Hidden Markov model
– Statistical model which assumes the system being
modeled is a Markov chain
– Unknown parameters
– State is NOT visible to the observer
– But variables influenced by the state are visible
(probability distribution for each state)
– Observations generated by HMM give information
about the state sequence
[Figure: a 4-state Markov chain (states 0–3) with transition probabilities a00, a01, a02, a12, a13, a23; in the HMM the states additionally emit the observations Z0, Z1, Z2 with observation probabilities such as b20, b21, b22.]
Hidden Markov model: parameters
• aij: state transition probabilities, A = {aij} (here a 4×4 matrix with entries a00...a33)
• bij: observation probabilities, B = {bij} (here a 4×3 matrix with entries b00...b32)
• Π: initial state probabilities (Π0, Π1, Π2, Π3)
• N: number of states (N = 4)
• M: number of symbols (M = 3)
• X: state space, X = {x1, x2, x3, ...}
• Z: observations, Z = {z1, z2, z3, ...}
• λ(A, B, Π): HMM model
[Figure: the 4-state HMM with transition probabilities a00, a01, a02, a12, a13, a23 and observation probabilities b20, b21, b22 for the observations Z0, Z1, Z2.]
Hidden Markov model: 3 main questions
Find the most likely sequence of states generating Z: {xi}
• Model parameters λ known, output sequence Z known
• Viterbi algorithm
HMM training: find the HMM parameters λ
• (Set of) output sequence(s) known
• Find the observation probabilities, state transition probabilities, ...
• Statistics, expectation maximization: Baum-Welch algorithm
Find the probability of an output sequence: P(Z|λ)
• Model parameters λ known, output sequence Z known
• Forward algorithm
Handraise vs. handshake
• Waving hello (by raising the hand)
– Raising the arm
– Lowering the arm immediately after
• Handshake
– Raising the arm
– Shaking
– Lowering the arm
• Measurement: angular speed α of the lower arm at the elbow
• Only 3 discrete values:
– <0, negative angular speed
– =0, zero angular speed
– >0, positive angular speed
• Example observation sequences:
– Handraise: > > > > < < < = < <
– Handshake: > > = < = = < > < < > < < < < < = < < <
– Which gesture does an unknown sequence ("?") belong to?
Classification with separate HMMs
• Train one HMM per class (HMM0, HMM1, ...) with Baum-Welch
– Each HMM models one gesture
• Classify a sequence of observations
– Compute the probability of the sequence with each HMM
• Forward algorithm: P(Z | HMMi)
– Use this probability as the score for classification
• The recognized class corresponds to the HMM with the highest probability
[Figure: training/testing dataset of gestures 0–3 → one HMM per gesture (HMM0–HMM3) → likelihood estimation P(G=i) → classification with maximum likelihood, C = 0, 1, 2, 3.]
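A sketch of the forward algorithm used to score an observation sequence under one HMM λ = (A, B, Π), and of the maximum-likelihood classification over several class-specific HMMs; the matrix shapes follow the N = 4 states, M = 3 symbols example above.

import numpy as np

def forward_likelihood(A, B, Pi, observations):
    # A: (N, N) transition probabilities, B: (N, M) observation probabilities,
    # Pi: (N,) initial state probabilities, observations: sequence of symbol indices
    alpha = Pi * B[:, observations[0]]
    for z in observations[1:]:
        alpha = (alpha @ A) * B[:, z]
    return alpha.sum()          # P(Z | lambda)

def classify_sequence(observations, hmms):
    # hmms: dict mapping class label -> (A, B, Pi); pick the maximum-likelihood HMM
    return max(hmms, key=lambda c: forward_likelihood(*hmms[c], observations))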
Validation of activity recognition [1]
• Recognition performance
– Confusion matrix
– ROC curve
– Continuous activity recognition measures
– Latency
• User-related measures
– Comfort / user acceptance
– Robustness
– Cost
• Processing-related measures
– Computational complexity, memory
– Energy
• ... application dependent!
[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context
(QuaCon), 2009
Performance measures: Confusion matrix
• Instance based
• Indicates how an instance is classified / what is the true class
• Ideally: diagonal matrix
• TP / TN: True positive / negative
– correctly detected when there is (or isn't) an activity
• FP / FN: False positive / negative
– detected an activity when there isn't, or not detected when there is
• Substitution: correctly detected, but incorrectly classified
[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context
(QuaCon), 2009
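A sketch of building a confusion matrix from ground-truth and predicted instance labels, and of deriving the TP/FP/FN/TN counts for one class of interest; class labels are placeholders.

from collections import Counter

def confusion_matrix(truth, predictions, classes):
    # Rows: true class, columns: predicted class; ideally a diagonal matrix
    counts = Counter(zip(truth, predictions))
    return [[counts[(t, p)] for p in classes] for t in classes]

def per_class_counts(matrix, classes, target):
    # TP, FP, FN, TN for one class of interest
    i = classes.index(target)
    tp = matrix[i][i]
    fn = sum(matrix[i]) - tp
    fp = sum(row[i] for row in matrix) - tp
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return tp, fp, fn, tn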
Performance measures: ROC curve
• Receiver operating characteristic
• Indicates classifier performance when a parameter is varied
– E.g. null class rejection threshold
• True positive rate (TPR) or Sensitivity
– TPR = TP / P = TP / (TP + FN)
• False positive rate (FPR)
– FPR = FP / N = FP / (FP + TN)
– Specificity (true negative rate) = 1 − FPR
[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context
(QuaCon), 2009
Performance measures: online activity recognition
• Problem with previous measures: suited for isolated activity
recognition
– I.e. the activity is perfectly segmented
• Does not reflect performance of online (continuous) recognition
• Ward et al introduce [2]:
– Overfill / underfill: activities detected as longer/shorter than the ground truth
– Insertion / deletions
– Merge / fragmentation / substitutions
[1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context
(QuaCon), 2009
[2] Ward et al., Performance metrics for activity recognition, ACM Transactions on Information Systems and Technology, 2(1), 2011
Validation
• The entire dataset is split into a train set and a test set
• Train set:
– Optimization of the ARC on the train set
– Includes feature selection, classifier training, null class rejection, etc.
• Test set:
– Never seen during training
– Assesses generalization
– Used only once for testing (otherwise one is indirectly optimizing on the test set)
• Cross-validation (e.g. 4-fold cross-validation with folds 1–4)
– Assesses whether the results generalize to an independent dataset
Validation
• Leave-one-out cross-validation:
– Train on all samples minus one
– Test on the left-out sample
• In wearable computing, various goals:
– Robustness to multiple users (user-independent)
– Robustness to multiple sensor placements (placement-independent)
– ...
• Leave out → assess performance:
– Person → user-independent
– Day, week, ... → time-independent (e.g. if the user can change behavior over time)
– Sensor placement → sensor-placement-independent
– Sensor modality → modality-independent
– ...
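A sketch of leave-one-group-out evaluation, where the held-out group can be a person, a day, a sensor placement or a modality as in the table above; train_arc and evaluate_arc are hypothetical stand-ins for training and testing the whole chain.

def leave_one_group_out(data, groups, train_arc, evaluate_arc):
    # groups: one label per instance (e.g. user id); hold each group out in turn
    scores = {}
    for held_out in set(groups):
        train = [d for d, g in zip(data, groups) if g != held_out]
        test = [d for d, g in zip(data, groups) if g == held_out]
        model = train_arc(train)
        scores[held_out] = evaluate_arc(model, test)
    return scores   # e.g. average the per-group scores for a user-independent accuracy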
For further reading
ARC
• Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics&Automation
Magazine, 2011
Activity recognition
• Stiefmeier et al, Wearable Activity Tracking in Car Manufacturing, PCM, 2008
• J. Ward, P. Lukowicz, G. Tröster, and T. Starner, “Activity recognition of assembly tasks using body-worn microphones and accelerometers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006.
• L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” in Pervasive Computing: Proc. of the 2nd Int’l
Conference, Apr. 2004, pp. 1–17.
• D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, “Preprocessing techniques for context recognition from accelerometer data,” Personal and Ubiquitous Computing, vol. 14, no. 7, pp. 645–662, 2010.
• Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor
Networks (BSN), 2010
Classification / Machine learning / Pattern recognition
• Duda, Hart, Stork, Pattern Classification, Wiley Interscience, 2000
• Bishop, Pattern recognition and machine learning, Springer, 2007 (http://research.microsoft.com/en-us/um/people/cmbishop/prml/)
Performance measures
• Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality
of Context (QuaCon), 2009
• Ward et al., Performance metrics for activity recognition, ACM Transactions on Information Systems and Technology, 2(1), 2011
  • 3. © Daniel Roggen www.danielroggen.net droggen@gmail.com There is no « Drink Sensor »  • Simple sensors (e.g. RFID) can provide a "binary" information – Presence (e.g. RFID, Proximity infrared sensors) – Movement (e.g. ADXL345 accelerometer ‘activity/inactivity pin’) – Fall (e.g. ADXL345 accelerometer ‘freefall pin’) • But in general « activity-X sensor » does not exist – Sensor data must be interpreted – Multiple sensors must be correlated (data fusion) – Several factors influence the sensor data • Drinking while standing: the arm reaches the object then the mouth • Drinking while walking: the arm moves, and also the whole body • Context is interpreted from the sensor data with – Signal processing – Machine learning – Reasoning • Can be integrated into a « sensor node » or « smart sensor » – Sensor chip + data processing in a device
  • 4. © Daniel Roggen www.danielroggen.net droggen@gmail.com User Activity Structure Working Resting Working Resting Working Resting Year 1 Year 2 Year 3 Go to work Read mailMeeting Shopping Go home Enter Give talk Listen Leave Walk ShowSpeak Stand SpeakSpeak Week 10 Week 11 Week 12
  • 5. © Daniel Roggen www.danielroggen.net droggen@gmail.com How to detect a presentation? • Place – Conference room – In front of audience – Generally at the lectern • Sound – User speaks – Maybe short interruptions – Otherwise silence • Motion – Mostly standing, with small walking motion – Hand motion, pointing – Typical head motion
  • 6. © Daniel Roggen www.danielroggen.net droggen@gmail.com Greeting Sensorplatzierung  Upper body  Right wrist  Left upper leg Activity  Person is seated  Stands up  Greets somebody  Seats again
  • 7. © Daniel Roggen www.danielroggen.net droggen@gmail.com Greeting -2g +2g 0g -1g +1g /-2g /+2g /0g
  • 8. © Daniel Roggen www.danielroggen.net droggen@gmail.com Data recording Stand up Sit downSeating Standing Seating Upper body Wrist Hand on table Hand on tableArm motion Arm motion Handshake Time [s] Combination from individual data is distinctive of the activity! Acceleration[g]
  • 9. © Daniel Roggen www.danielroggen.net droggen@gmail.com Turn pages:Drink from a glass: How to recognize activities? With sensors on the body, in objects, in the environment, … 1. Activities are represented by typical signal patterns Sensor data 2. Recognition: "comparison" between template and sensor data Drink recognized Turn page recognized Motion sensorActivity = movement Activity = sound Microphone
  • 10. © Daniel Roggen www.danielroggen.net droggen@gmail.com Characteristic Type Description Execution Offline The system records the sensor data first. The recognition is performed afterwards. Typically used for non-interactive applications such as activity monitoring for health-related applications. Online The system acquires sensor data and processes it on-the-fly to infer activities. Typically used for activity- based computing and interactive applications (HCI). Recognition Continuous The system “spots” the occurrence of activities or gestures in streaming data. It implements data stream segmentation, classification and null class rejection. Isolated / segmented The system assumes that the sensor data stream is segmented at the start and end of a gesture by an oracle. It only classifies the sensor data into the activity classes. The oracle can be an external system in a working system (e.g. cross-modality segmentation), or the experimenter when assessing classification performance during design phases. Recognition system characteristics
  • 11. © Daniel Roggen www.danielroggen.net droggen@gmail.com Activity recognition: learning by demonstration • 1) Train activity models from recorded sensor data • 2) Recognition: compare new sensor data against the activity models to infer the activity/context • Training data is required
  • 12. © Daniel Roggen www.danielroggen.net droggen@gmail.com Characteristic Type Description World model Stateless The recognition system does not model the state of the world. Activities are recognized by spotting specific sensor signals. This is currently the dominant approach when dealing with the recognition of activity primitives (e.g. reach, grasp). Stateful The system uses a model of the world, such as the user’s context or an environment map with the location of objects. This enhances activity recognition performance, at the expense of design-time knowledge and a more complex recognition system. Activity models
  • 13. © Daniel Roggen www.danielroggen.net droggen@gmail.com Assumptions • Constant sensor-signal to activity-class mapping • Design-time: identify sensor-signal/activity-class mapping – Sensor setup – Activity sets • Run-time: "low"-variability – Can't displace sensors or modify garments – Can't change the way activities are done
  • 14. © Daniel Roggen www.danielroggen.net droggen@gmail.com The activity recognition chain (ARC) • A standard set of steps followed by most research in activity recognition (e.g. [1,2,3,4]) • Streaming signal processing • Machine learning • Reasoning [1] J. Ward, P. Lukowicz, G. Tröster, and T. Starner, “Activity recognition of assembly tasks using body-worn microphones and accelerometers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006. [2] L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” in Pervasive Computing: Proc. of the 2nd Int’l Conference, Apr. 2004, pp. 1–17. [3] D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, “Preprocessing techniques for context recognition from accelerometer data,” Pervasive and Mobile Computing, vol. 14, no. 7, pp. 645–662, 2010. [4] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010
  • 15. © Daniel Roggen www.danielroggen.net droggen@gmail.com [ARC diagram] Runtime: recognition phase — sensor sampling, preprocessing, segmentation, feature extraction, classification, null class rejection and decision fusion (subsymbolic processing), followed by reasoning / symbolic processing feeding an activity-aware application. Design-time: training phase — low-level activity models (primitives) and high-level activity models are optimized from sensor data and annotations. [1] Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics&Automation Magazine, 2011
  • 16. © Daniel Roggen www.danielroggen.net droggen@gmail.com Segmentation • A major challenge! • Find the boundaries of activities for later classification Classification Drink Turn • Methods: – Sliding window segmentation – Energy-based segmentation – Rest-position segmentation – HMM [1], DTW [2,3], SWAB [4] [1] J. Deng and H. Tsui. An HMM-based approach for gesture segmentation and recognition. In 15th International Conference on Pattern Recognition, volume 3, pages 679–682. 2000. [2] M. Ko, G. West, S. Venkatesh, and M. Kumar, “Online context recognition in multisensor systems using dynamic time warping,” in Proc. Int. Conf. on Intelligent Sensors, Sensor Networks and Information Processing, 2005, pp. 283–288. [3] Stiefmeier, Wearable Activity Tracking in Car Manufacturing, PCM, 2008 [4] E. Keogh, S. Chu, D. Hart, and M. Pazzani. An online algorithm for segmenting time series. In Proceedings of the IEEE International Conference on Data Mining, pages 289–96, 2001. • Classification here undefined – classifier not trained on "no activity" – "null class" hard to model: can be anything • Or use "null class rejection" after classification
  • 17. © Daniel Roggen www.danielroggen.net droggen@gmail.com Segmentation: sliding/jumping window • Commonly used for audio processing – E.g. 20 ms windows • or for periodic activities – E.g. walking, with windows of few seconds
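For illustration, the sliding/jumping window can be written in a few lines. This is a minimal sketch, not code from the lecture; the sampling rate, window length and step are assumptions to be tuned per application:

```python
import numpy as np

def sliding_windows(signal, window_size, step):
    """Yield (start, window) pairs over a signal array.

    window_size and step are in samples; step == window_size gives a
    jumping (non-overlapping) window, step < window_size an overlapping one.
    """
    for start in range(0, len(signal) - window_size + 1, step):
        yield start, signal[start:start + window_size]

# Example: 1 s windows with 50% overlap on a (hypothetical) 32 Hz 3-axis accelerometer
fs = 32
acc = np.random.randn(10 * fs, 3)   # placeholder for real sensor data
for start, win in sliding_windows(acc, window_size=fs, step=fs // 2):
    pass                            # features would be computed on 'win' here
```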
  • 18. © Daniel Roggen www.danielroggen.net droggen@gmail.com Characteristic Type Description Activity kinds Periodic Nature of activities exhibiting periodicity, such as walking, running, rowing, biking, etc. Sliding window and frequency-domain features are generally used. Sporadic The activity or gesture occurs sporadically, interspersed with other activities or gestures. Segmentation plays a key role to isolate the subset of data containing the gesture. Static The system deals with the detection of static postures or static pointing gestures. Sliding window and time-domain features are generally used. Activity characteristics
  • 19. © Daniel Roggen www.danielroggen.net droggen@gmail.com Segmentation • Energy-based segmentation [1] – Between activities the user does not move – Low energy in the acceleration signal – E.g. standard deviation of acceleration compared to a threshold • Rest-position segmentation [1] – User comes back to a rest position between gestures – Can be trained • Challenge: – Usually no 'pause' or 'rest' between activities! – Combination of segmentation and null class rejection – E.g. DTW [2] [1] Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010 [2] Stiefmeier, Wearable Activity Tracking in Car Manufacturing, PCM, 2008
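A minimal sketch of the energy-based idea (standard deviation of the acceleration magnitude compared to a threshold); the window length and the 0.2 g threshold below are illustrative assumptions that would have to be trained or tuned:

```python
import numpy as np

def energy_segments(acc_mag, fs, win_s=0.5, threshold=0.2):
    """Return (start, end) sample indices of regions where the user moves.

    acc_mag: 1-D acceleration magnitude signal (in g), fs: sampling rate (Hz),
    win_s: analysis window in seconds, threshold: std threshold separating
    rest (low energy) from motion.
    """
    n = int(win_s * fs)
    active = [np.std(acc_mag[i:i + n]) > threshold
              for i in range(0, len(acc_mag) - n, n)]
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * n                      # motion starts
        elif not a and start is not None:
            segments.append((start, i * n))    # motion ends at a rest window
            start = None
    if start is not None:
        segments.append((start, len(acc_mag)))
    return segments
```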
  • 20. © Daniel Roggen www.danielroggen.net droggen@gmail.com Feature extraction • Compute features on the signal that emphasize signal characteristics related to the activities • Tradeoffs – Reduce dimensionality – Computational complexity – Maximize separation between classes – Specificity of the features to the classes: robustness, overfitting • Some common features for acceleration data [1]: mean, standard deviation (std) [1] Figo, Diniz, Ferreira, Cardoso. Preprocessing techniques for context recognition from accelerometer data, Pers Ubiquit Comput, 14:645–662, 2010
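As a sketch, a per-window feature extractor computing a few of the usual time-domain features (mean, std, energy, mean-crossing rate, std of the magnitude); the exact feature set is an application-dependent choice:

```python
import numpy as np

def window_features(win):
    """win: (n_samples, 3) acceleration window -> 1-D feature vector."""
    mean = win.mean(axis=0)
    feats = {
        "mean": mean,                                   # per-axis mean (posture)
        "std": win.std(axis=0),                         # per-axis deviation
        "energy": (win ** 2).sum(axis=0),               # per-axis energy
        "mean_crossing_rate":
            ((win[:-1] - mean) * (win[1:] - mean) < 0).mean(axis=0),
        "std_magnitude": np.linalg.norm(win, axis=1).std(),  # motion intensity
    }
    return np.hstack([np.atleast_1d(v) for v in feats.values()])
```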
  • 21. © Daniel Roggen www.danielroggen.net droggen@gmail.com Car manufacturing activities Data from Zappi et al, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008 Dataset available at: http://www.wearable.ethz.ch/resources/Dataset
  • 22. © Daniel Roggen www.danielroggen.net droggen@gmail.com Feature space: car manufacturing activities Data from Zappi et al, Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection, EWSN, 2008 Dataset available at: http://www.wearable.ethz.ch/resources/Dataset Angle X, angle Y, angle Z Energy X, Energy Y, Energy Z Energy, angle X, angle Y Energy X, Energy Y Energy, angle XAngle X, angle Y
  • 23. © Daniel Roggen www.danielroggen.net droggen@gmail.com 1 = Stand; 2= Walk; 3 = Sit; 4 = Lie • Mean crossing rate of x, y and z axes, std of magnitude Feature space: modes of locomotion (FS1) Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
  • 24. © Daniel Roggen www.danielroggen.net droggen@gmail.com • Mean value of x, y and z axes, std of magnitude Feature space: modes of locomotion (FS2) 1 = Stand; 2= Walk; 3 = Sit; 4 = Lie Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
  • 25. © Daniel Roggen www.danielroggen.net droggen@gmail.com • Ratio of x and y axes, ratio of y and z axes, std of magnitude Feature space: modes of locomotion (FS3) 1 = Stand; 2= Walk; 3 = Sit; 4 = Lie Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
  • 26. © Daniel Roggen www.danielroggen.net droggen@gmail.com • Mean value of x, y and z axes, std of x, y and z axes Feature space: modes of locomotion (FS4) 1 = Stand; 2= Walk; 3 = Sit; 4 = Lie Calatroni et al, Transferring Activity Recognition Capabilities between Body-Worn Motion Sensors: How to Train Newcomers to Recognize Modes of Locomotion, INSS, 2011
  • 27. © Daniel Roggen www.danielroggen.net droggen@gmail.com Classification accuracy per sensor placement and feature set (NCC / 11-NN). Less overlapping features yield better accuracies with all classifiers; kNN is better than NCC, and the difference is more evident for more overlapping features.
      Placement  FS1 NCC  FS1 11-NN  FS2 NCC  FS2 11-NN  FS3 NCC  FS3 11-NN  FS4 NCC  FS4 11-NN
      Knee       0.64     0.71       0.94     0.95       0.94     0.94       0.95     0.94
      Shoe       0.53     0.65       0.68     0.86       0.70     0.86       0.77     0.87
      Back       0.60     0.70       0.79     0.81       0.66     0.74       0.78     0.82
      RUA        0.53     0.58       0.77     0.84       0.72     0.75       0.73     0.86
      RLA        0.45     0.59       0.72     0.81       0.67     0.80       0.61     0.84
      LUA        0.55     0.64       0.86     0.85       0.78     0.85       0.75     0.87
      LLA        0.60     0.66       0.70     0.82       0.75     0.80       0.68     0.82
      Hip        0.57     0.62       0.77     0.81       0.81     0.79       0.77     0.79
  • 28. © Daniel Roggen www.danielroggen.net droggen@gmail.com Feature extraction • Ideally: explore as many features as possible – Not limited to "human design space" • Evolutionary techniques to search a larger set of solutions – E.g. genetic programming [1] Förster et al., Evolving discriminative features robust to sensor displacement for activity recognition in body area sensor networks, ISSNIP, 2009 [Figure: space of all possible designs vs. human design space; example evolved feature; cross-over genetic operator]
  • 29. © Daniel Roggen www.danielroggen.net droggen@gmail.com Feature selection • Select the "best" set of features • Improve the performance of learning models by: – Alleviating the effect of the curse of dimensionality. – Enhancing generalization capability. – Speeding up learning process. – Improving model interpretability. • Tradeoffs – Select features that correlate strongest to the classification variable (maximum relevance), ... – ... and are mutually far away from each other (minimum redundancy) – Emphasize characteristics of signal related to activity – Computational complexity (minimize feature number) – Complementary – Robustness F1 F2 F3 F4 F5 F6 F7 F8 F9 [1] Peng et al., Feature selection based on mutual information-criteria of max-dependency max-relevance and min-redundancy, PAMI, 2005
  • 30. © Daniel Roggen www.danielroggen.net droggen@gmail.com Feature selection Filter methods • Does not involve a classifier but a 'filter', e.g. mutual information • + – Computationally light – General: good for a larger set of classifiers • - – Feature set may not be ideal for all classifiers – Larger subsets of features Set of candidate features Subset selection algorithm Learning algorithm Wrapper methods • Involves the classifier • + – Higher accuracy (exploits classifier's characteristics) – Can avoid overfitting with crossvalidation • - – Computationally expensive – Not general features Set of candidate features Subset evaluation Learning algorithm learning algorithm Subset selection algorithm
  • 31. © Daniel Roggen www.danielroggen.net droggen@gmail.com Sequential forward selection (SFS) • "Brute force" is not applicable! – With N candidate features: 2^N feature sets to test 1. Start from an empty feature set Y0={Ø} 2. Select the best feature x+ that maximizes an objective function J(Yk+x+ ): x+ = argmax[J(Yk+x+ )] 3. Update feature set: Yk+1 = Yk + x+ ; k=k+1 4. Go to 2 [1] Peng et al., Feature selection based on mutual information-criteria of max-dependency max-relevance and min-redundancy, PAMI, 2005 • Works well with a small number of features • Objective: measure of “goodness” of the features – E.g. accuracy
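A compact sketch of SFS; the objective function is supplied by the caller (e.g. cross-validated accuracy of a classifier), so it can serve as either a filter- or wrapper-style criterion. Names and the stopping rule are assumptions:

```python
import numpy as np

def sfs(X, y, objective, n_select):
    """Sequential forward selection.

    X: (n_samples, n_features), y: labels,
    objective: callable(X_subset, y) -> score J (higher is better),
    n_select: number of features to keep.
    """
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_select:
        # pick the feature x+ maximizing J(Y_k + x+)
        score, best = max((objective(X[:, selected + [f]], y), f) for f in remaining)
        selected.append(best)
        remaining.remove(best)
    return selected
```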
  • 32. © Daniel Roggen www.danielroggen.net droggen@gmail.com Classification • Map feature vector to a class label
  • 33. © Daniel Roggen www.danielroggen.net droggen@gmail.com Bayesian classification • F: sensor reading, features • C: activity class • Bayes theorem: P(C|F) = P(F|C) · P(C) / P(F) – P(F|C): conditional probability of the sensor reading F given the class C (estimated from training data) – P(C): prior probability of the class – P(F): marginal probability (sum over all classes of the probabilities to obtain F) – P(C|F): posterior probability • With multiple sensors: conditional independence (Naive Bayes) P(C|F1,...,Fn) = P(F1,...,Fn|C) · P(C) / P(F) = P(F1|C) · ... · P(Fn|C) · P(C) / P(F) • In practice only the numerator matters (the denominator is constant) • Classification with a detector: e.g. class with maximum posterior probability
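As an illustration, a Gaussian naive Bayes classifier built directly on these formulas (log domain, constant denominator P(F) dropped). Modelling each P(Fi|C) as a Gaussian estimated from training data is an assumption of this sketch:

```python
import numpy as np

class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.prior = {c: np.mean(y == c) for c in self.classes}          # P(C)
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        self.var = {c: X[y == c].var(axis=0) + 1e-9 for c in self.classes}
        return self

    def predict(self, X):
        scores = []
        for c in self.classes:
            # log P(F1|C) + ... + log P(Fn|C) + log P(C); P(F) omitted (constant)
            log_lik = -0.5 * (np.log(2 * np.pi * self.var[c])
                              + (X - self.mu[c]) ** 2 / self.var[c]).sum(axis=1)
            scores.append(log_lik + np.log(self.prior[c]))
        return self.classes[np.argmax(scores, axis=0)]
```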
  • 34. © Daniel Roggen www.danielroggen.net droggen@gmail.com Nearest centroid classifier (NCC) • Simplest classification method – No parameters – Classify to the nearest class center • Memory: C class centers • Classification: C comparisons • Pros: – Simple implementation – Online model update: add/remove classes, adapt class centers – Fast, low memory footprint • Cons: – Simple class boundaries – Suited when classes cluster in the feature space
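A minimal NCC sketch (Euclidean distance to the class centers assumed). Because the model is just one center per class, adding a class or adapting a center online is a one-line update:

```python
import numpy as np

class NearestCentroid:
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.centers = np.array([X[y == c].mean(axis=0) for c in self.classes])
        return self

    def predict(self, X):
        # distance of every sample to every class center; pick the closest
        d = np.linalg.norm(X[:, None, :] - self.centers[None, :, :], axis=2)
        return self.classes[d.argmin(axis=1)]
```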
  • 35. © Daniel Roggen www.danielroggen.net droggen@gmail.com k-nearest neighbor (k-NN) • Simple classification method – Instance-based learning – Classify to the most represented class around the test point – Parameter: k – k=1: nearest neighbor (overfits) – k>>1: "smoothes" noise in the training data • Memory: N training points • Classification: N comparisons • Pros: – Simple implementation – Online model update (add/remove instances, classes) – Complex boundaries • Cons: – Potentially slow, or lots of memory • Some faster versions – GPGPU [1] – k-d trees to optimize the neighborhood search [1] Garcia et al, K-nearest neighbor search-fast GPU-based implementations and application to high-dimensional feature matching, ICIP, 2010 (figure from http://jakehofman.com/ddm/2009/09/lecture-02/)
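A brute-force k-NN sketch (class labels assumed to be small non-negative integers so that the majority vote can use bincount); the default k=11 matches the 11-NN configuration used in the locomotion results above:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=11):
    """Majority vote among the k nearest training points (Euclidean, brute force)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbours
    return np.array([np.bincount(y_train[idx]).argmax() for idx in nearest])
```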
  • 36. © Daniel Roggen www.danielroggen.net droggen@gmail.com Decision tree • Simple classification method – Programmatic tree – Parameters: decision boundaries • C4.5 [Figure: two-level tree splitting on F1 < t1, then on F2 < t2] • Memory: decision boundaries • Classification: lightweight if/else comparisons • Pros: – Simple implementation – Continuous and discrete values, symbols • Cons: – Only appropriate when classes separate along feature dimensions (or after PCA) – Limit the size of the tree to avoid overfitting Quinlan, J. R. C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, 1993
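The two-level tree on the slide is nothing more than nested threshold tests; t1 and t2 below are hypothetical values that an induction algorithm such as C4.5 would choose from data:

```python
def toy_tree(f1, f2, t1=0.5, t2=1.0):
    """Hand-written equivalent of a two-level decision tree on features F1, F2."""
    if f1 < t1:
        return 0            # class reached by the F1 < t1 branch
    return 1 if f2 < t2 else 2
```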
  • 37. © Daniel Roggen www.danielroggen.net droggen@gmail.com Null-class rejection • Continuous activity recognition with sliding window segmentation – Gestures are not always present in a segment – Such segments must be assigned to the "null class" • Reject when the confidence in the classification result is too low [1] • Many classifiers can be "calibrated" to have probabilistic outputs [2] – Statistical test / likelihood of an activity [1] Calatroni et al., ETHZ Tech Report, 2010 [2] I. Cohen and M. Goldszmidt, “Properties and benefits of calibrated classifiers,” in Proc. Knowledge Discovery in Databases (PKDD), 2004.
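With calibrated probabilistic outputs, null-class rejection reduces to a confidence threshold on the winning class; the 0.6 value below is purely illustrative and would be tuned (for instance along an ROC curve):

```python
import numpy as np

def classify_with_rejection(posteriors, classes, threshold=0.6):
    """posteriors: (n_samples, n_classes) calibrated class probabilities.
    Returns the winning class label, or None (null class) when confidence is low."""
    best = posteriors.argmax(axis=1)
    conf = posteriors.max(axis=1)
    return [classes[b] if c >= threshold else None for b, c in zip(best, conf)]
```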
  • 38. © Daniel Roggen www.danielroggen.net droggen@gmail.com Sliding window and temporal data structure • Activities where temporal data structure generally not important: – Walking, running, rowing, biking... – Generally periodic activities • Activities where it is important: – Open dishwasher: walk, grasp handle up, pull down, walk – Close dishwasher: walk, grasp handle down, pull up, walk – Opening or closing car door – Generally manipulative gestures – Complex hierarchical activities • Problem with some features: – Different sensor readings but identical features: μ1 = μ2 μ1 μ2 Act A Act B
  • 39. © Daniel Roggen www.danielroggen.net droggen@gmail.com Sliding window and temporal data structure • Time to space mapping • Encode the temporal unfolding in the feature vector – E.g. subwindows μ1,1 μ2,1 μ1,2 μ2,2 Act A Act B sw1 sw2 A B • Other approaches: – Hidden Markov models – Dynamic time warping / string matching – Signal predictors
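A sketch of this time-to-space mapping: the window is split into consecutive subwindows and their features are concatenated in order, so that activities differing only in the order of sub-movements no longer map to the same feature vector. The number of subwindows is an assumption:

```python
import numpy as np

def subwindow_features(win, n_sub=4):
    """Concatenate per-subwindow means and stds, preserving temporal order."""
    parts = np.array_split(win, n_sub, axis=0)
    return np.hstack([np.r_[p.mean(axis=0), p.std(axis=0)] for p in parts])
```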
  • 40. © Daniel Roggen www.danielroggen.net droggen@gmail.com Gesture recognition using neural-network signal predictors • Signal: 3-D acceleration vector • Predict the future acceleration vector from the previous one • Operation on the raw signal • One predictor (CTRNN) per gesture class, each producing a prediction error on the incoming signal • Predictors “trained” on gesture classes • Prediction error is smaller on the trained class • Class = predictor with the smallest prediction error [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 41. © Daniel Roggen www.danielroggen.net droggen@gmail.com Predictor: Continuous Time Recurrent Neural Network (CTRNN) Continuous-time recurrent neural network (CTRNN) • Continuous model neurons • Fully connected network • Rich dynamics (non-linear, temporal dynamics) • Theoretically: approximation of any dynamical system • Well suited as universal predictor γi γj ωij [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 42. © Daniel Roggen www.danielroggen.net droggen@gmail.com Architecture of CTRNN Predictor • 5 neurons, fully connected • 3 inputs: acceleration vector at the previous step • “Hidden” neurons • 3 outputs: acceleration vector at the next step • Connections between neurons/inputs • Notation: state of neuron i at time t (y_i); connection weight between neurons i and j (w_ij); connection weight of input k to neuron i (w_ik); value of input k, i.e. X, Y, Z (I_k); bias of neuron j (b_j); time constant of neuron i (τ_i); time step = 0.01 s • Discretization using forward Euler numerical integration [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
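A sketch of one forward-Euler update of a generic fully connected CTRNN with external inputs; this follows the standard CTRNN formulation and is not necessarily the exact equation used by Bailador et al.:

```python
import numpy as np

def ctrnn_step(y, I, W, W_in, bias, tau, dt=0.01):
    """One forward-Euler step of tau_i dy_i/dt = -y_i + sum_j W_ij s(y_j + b_j) + sum_k W_in_ik I_k.

    y: (N,) neuron states, I: (K,) inputs, W: (N, N) neuron-neuron weights,
    W_in: (N, K) input weights, bias: (N,), tau: (N,) time constants, dt: time step in s.
    """
    s = 1.0 / (1.0 + np.exp(-(y + bias)))     # sigmoid firing rates
    dydt = (-y + W @ s + W_in @ I) / tau
    return y + dt * dydt

# Toy use: 5 neurons, 3 inputs (previous acceleration sample), random parameters
rng = np.random.default_rng(0)
N, K = 5, 3
y = np.zeros(N)
params = dict(W=rng.normal(size=(N, N)), W_in=rng.normal(size=(N, K)),
              bias=rng.normal(size=N), tau=np.full(N, 0.1))
y = ctrnn_step(y, I=np.array([0.1, -0.2, 0.98]), **params)
```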
  • 43. © Daniel Roggen www.danielroggen.net droggen@gmail.com Training of the signal predictors • Record instances of each gesture class • Train one predictor for each class • For each class: minimize prediction error • Genetic algorithm – Robust in complex search spaces – Representation of the parameters by a genetic string (binary string) • Global optimization of neural network parameters – Neuron interconnection weights – Neuron input weights – Time constant – Bias [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 44. © Daniel Roggen www.danielroggen.net droggen@gmail.com Genetic algorithm • Genetic encoding: neuron weights, input weights, bias & time constant encoded with 6 bits each; neuron parameters: 60 bits; genetic string (5 neurons): 300 bits Fitness function • Minimize the prediction error for a given class • Measured on N instances of a training set T (T1...TN) • Lower is better (smaller prediction error) GA parameters • 100 individuals • Rank selection of the 30 best individuals • One-point crossover rate: 70% • Mutation rate: 1% per bit • Elitism [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 45. © Daniel Roggen www.danielroggen.net droggen@gmail.com Experiments • 8 gesture classes • Planar • Acceleration sensor on wrist • 20 instances per class (one person) • "Restricted" setup – No motion between gestures – Automatic segmentation (magnitude of the signal >1g indicates gesture) • "Unconstrained" setup – Freely moving in an office, typical activities (sitting, walking, reading …) – Manual segmentation pressing a button [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 46. © Daniel Roggen www.danielroggen.net droggen@gmail.com Results: unconstrained setup • Training: 62%-100% (80.5% average); testing: 48%-92% (63.6% average) • User egomotion Training Testing [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 47. © Daniel Roggen www.danielroggen.net droggen@gmail.com Prediction error: gesture of class A [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 48. © Daniel Roggen www.danielroggen.net droggen@gmail.com Prediction error: one instance per class [1] Bailador et al., Real time gesture recognition using Continuous Time Recurrent Neural Networks, BodyNets, 2007
  • 49. © Daniel Roggen www.danielroggen.net droggen@gmail.com Activity segmentation and classification with string matching [Pipeline: sensors + signal processing → trajectories → strings (e.g. becfcca, aabadca) → string matching against activity templates (e.g. bad, cfcc) → spotted segments → fusion by overlap detection → activity spotting → filtering] [1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
  • 50. © Daniel Roggen www.danielroggen.net droggen@gmail.com Motion encoding [1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008 [Figure: 8-symbol codebook a–h of direction vectors in the x–y plane; an example trajectory encoded as the string bbccbbddccbbbb]
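One possible way to quantize a 2-D trajectory into an 8-symbol codebook: take successive direction vectors and map their angle to one of eight 45° sectors. The assignment of letters to sectors is an assumption; only the principle of quantizing direction vectors matters:

```python
import numpy as np

def encode_trajectory(points, codebook="abcdefgh"):
    """Encode a 2-D trajectory as a string of direction symbols (one per 45-degree sector)."""
    points = np.asarray(points, dtype=float)
    d = np.diff(points, axis=0)                       # successive direction vectors
    angles = np.arctan2(d[:, 1], d[:, 0])             # in (-pi, pi]
    sectors = np.round(angles / (np.pi / 4)).astype(int) % 8
    return "".join(codebook[s] for s in sectors)

print(encode_trajectory([(0, 0), (1, 0), (2, 1), (2, 2)]))   # -> 'abc'
```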
  • 51. © Daniel Roggen www.danielroggen.net droggen@gmail.com String matching • An approximate string matching algorithm is used to spot activity occurrences in the motion string – Based on a distance measure called the Levenshtein or edit distance – The edit distance involves symbol operations associated with dedicated costs • substitution/replacement r • insertion i • deletion d – Crucial algorithm modification to find template occurrences at arbitrary positions within the motion string [1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
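A sketch of the spotting variant of the edit distance: the dynamic-programming matrix is initialized so that a match may start at any position in the stream (first row set to zero), and the last row gives the matching cost of the template ending at every stream position. Unit costs r = i = d = 1 are assumptions, not the trained costs of the cited work:

```python
import numpy as np

def spotting_costs(template, stream, r=1, ins=1, dele=1):
    """Matching cost of 'template' ending at every position of 'stream'.

    Edit-distance DP modified for spotting: row 0 is all zeros, so the match
    may start anywhere. Minima of the returned cost curve below a trained
    threshold indicate candidate activity end points.
    """
    m, n = len(template), len(stream)
    D = np.zeros((m + 1, n + 1))
    D[:, 0] = np.arange(m + 1) * dele            # cost of deleting template symbols
    for a in range(1, m + 1):
        for b in range(1, n + 1):
            sub = D[a - 1, b - 1] + (0 if template[a - 1] == stream[b - 1] else r)
            D[a, b] = min(sub, D[a - 1, b] + dele, D[a, b - 1] + ins)
    return D[m, 1:]                              # best-match cost ending at each symbol

costs = spotting_costs("bbccbb", "aabbdccbbbaad")
```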
  • 52. © Daniel Roggen www.danielroggen.net droggen@gmail.com Approximate string matching [1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
  • 53. © Daniel Roggen www.danielroggen.net droggen@gmail.com Spotting operation [Figure: matching cost C1(t) over time; a minimum below the trained threshold kthr,1 marks the activity end point, the corresponding activity start point is traced back, and the interval in between is the spotted segment] [1] Stiefmeier et al., Wearable Activity Tracking in Car Manufacturing, PCM, 2008
  • 54. © Daniel Roggen www.danielroggen.net droggen@gmail.com String matching • + – Easily implemented in FPGAs / ASIC – Lightweight – Computational complexity scales linearly with number of templates – Multiple templates per activities • - – Need a string encoding – Hard to decide how to quantize sensor data – Online implementation requires to "forget the past"
  • 55. © Daniel Roggen www.danielroggen.net droggen@gmail.com Activity Recognition with Hidden Markov model • Markov chain – Discrete-time stochastic process – Describes the state of a system at successive times – State transitions are probabilistic – Markov property: state transition depends only on the current state – State is visible to the observer – Only parameter: state transition probabilities • Hidden Markov model – Statistical model which assumes the system being modeled is a Markov chain – Unknown parameters – State is NOT visible to the observer – But variables influenced by the state are visible (probability distribution for each state) – Observations generated by the HMM give information about the state sequence [Figures: a 4-state Markov chain with transition probabilities aij; the same chain as an HMM, where only observations Z emitted with probabilities bij are visible]
  • 56. © Daniel Roggen www.danielroggen.net droggen@gmail.com Hidden Markov model: parameters • aij: state transition probabilities, A = {aij} (here a 4×4 matrix) • bij: observation probabilities, B = {bij} (here a 4×3 matrix) • Π = {Π0...Π3}: initial state probabilities • N: number of states (N = 4) • M: number of symbols (M = 3) • X: state space, X = {x1, x2, x3, ...} • Z: observations, Z = {z1, z2, z3, ...} • λ(A, B, Π): HMM model
  • 57. © Daniel Roggen www.danielroggen.net droggen@gmail.com Hidden Markov model: 3 main questions Find most likely sequence of states generating Z: {xi}T • Model parameters λ known, output sequence Z known • Viterbi algorithm HMM training: find the HMM parameters λ • (Set of) Output sequence(s) known • Find the observation prob., state transition prob., .... • Statistics, expectation maximization: Baum-Welch algorithm Find probability of output sequence: P(Z¦ λ) • Model parameters λ known, output sequence Z known • Forward algorithm
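A sketch of the scaled forward algorithm for a discrete HMM; it returns log P(Z|λ), which is the quantity the per-class HMMs on the following slides are compared on (classification with separate HMMs then amounts to an argmax over the class models):

```python
import numpy as np

def forward_log_likelihood(A, B, pi, Z):
    """log P(Z | lambda) for a discrete HMM (scaled to avoid underflow).

    A: (N, N) transition probabilities a_ij, B: (N, M) observation
    probabilities b_ij, pi: (N,) initial state probabilities,
    Z: sequence of observation symbol indices.
    """
    alpha = pi * B[:, Z[0]]
    log_p = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for z in Z[1:]:
        alpha = (alpha @ A) * B[:, z]       # propagate states and weight by observation
        c = alpha.sum()
        log_p += np.log(c)
        alpha /= c
    return log_p
```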
  • 58. © Daniel Roggen www.danielroggen.net droggen@gmail.com Handraise vs. handshake • Waving hello (by raising the hand) – Raising the arm – Lowering the arm immediately after • Handshake – Raising the arm – Shaking – Lowering the arm • Measurements: angular speed α of the lower arm at the elbow • Only 3 discrete values: – <0, negative angular speed – =0, zero angular speed – >0, positive angular speed [Figure: example observation sequences of >, =, < symbols for a handraise, a handshake, and an unknown gesture to classify]
  • 59. © Daniel Roggen www.danielroggen.net droggen@gmail.com Classification with separate HMMs • Train HMM for each class (HMM0, HMM1, ....) with Baum-Welch – HMM models the gestures • Classify a sequence of observations – Compute the probability of the sequence with each HMM • Forward algorithm: P(Z / HMMi). – Consider the HMM probability as the a priori probability for the classification • In general the class corresponds to the HMM with the highest probability C=0,1 Gesture 0 Gesture 1 HMM0 HMM1 P(G=0) P(G=1) MaxGesture Gesture 2 Gesture 3 HMM2 P(G=2) HMM3 P(G=3) C=0,1,2,3 Training / testing dataset Likelihood estimation Classification w/maximum likelihood
  • 60. © Daniel Roggen www.danielroggen.net droggen@gmail.com Validation of activity recognition [1] • Recognition performance – Confusion matrix – ROC curve – Continuous activity recognition measures – Latency • User-related measures – Comfort / user acceptance – Robustness – Cost • Processing-related measures – Computational complexity, memory – Energy • ... application dependent! [1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
  • 61. © Daniel Roggen www.danielroggen.net droggen@gmail.com Performance measures: Confusion matrix • Instance based • Indicates how an instance is classified / what is the true class • Ideally: diagonal matrix • TP / TN: True positive / negative – correctly detected when there is (or isn't) an activity • FP / FN: False positive / negative – detected an activity when there isn't, or not detected when there is • Substitution: correctly detected, but incorrectly classified [1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
  • 62. © Daniel Roggen www.danielroggen.net droggen@gmail.com Performance measures: ROC curve • Receiver operating characteristic • Indicates classifier performance when a parameter is varied – E.g. null class rejection threshold • True positive rate (TPR) or sensitivity – TPR = TP / P = TP / (TP + FN) • False positive rate (FPR) – FPR = FP / N = FP / (FP + TN) – Specificity (true negative rate) = 1 − FPR [1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009
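These definitions map directly to code; sweeping a parameter such as the rejection threshold and plotting the resulting (FPR, TPR) pairs traces the ROC curve:

```python
def detection_rates(tp, fp, tn, fn):
    """True/false positive rates from the confusion counts of one operating point."""
    tpr = tp / (tp + fn)             # sensitivity
    fpr = fp / (fp + tn)             # false positive rate
    return {"TPR": tpr, "FPR": fpr, "specificity": 1.0 - fpr}
```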
  • 63. © Daniel Roggen www.danielroggen.net droggen@gmail.com Performance measures: online activity recognition • Problem with the previous measures: suited for isolated activity recognition – I.e. the activity is perfectly segmented • Does not reflect the performance of online (continuous) recognition • Ward et al. introduce [2]: – Overfill / underfill: activities detected as longer/shorter than the ground truth – Insertions / deletions – Merge / fragmentation / substitutions [1] Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009 [2] Ward et al., Performance metrics for activity recognition, ACM Transactions on Information Systems and Technology, 2(1), 2011 [Figures from [1] and [2]]
  • 64. © Daniel Roggen www.danielroggen.net droggen@gmail.com Validation Entire dataset Training / evaluation Train set Test set • Optimization of the ARC on the train set • Includes feature selection, classifier training, null class rejection, etc • Never seen during training • Assess generalization • Used only once for testing • (otherwise, indirectly optimizing on test set) Cross-validation Fold 1 Fold 2 Fold 3 Fold 4 • 4-fold cross-validation • Assess whether results generalize to independent dataset
  • 65. © Daniel Roggen www.danielroggen.net droggen@gmail.com Validation • Leave-one-out cross-validation: – Train on all samples but one – Test on the left-out sample • In wearable computing, various goals: – Robustness to multiple users (user-independent) – Robustness to multiple sensor placements (placement-independent) – ... • Leave out → performance assessed: Person → user-independent; Day, week, ... → time-independent (e.g. if the user can change behavior over time); Sensor placement → sensor-placement-independent; Sensor modality → modality-independent
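A generic leave-one-group-out loop covering all of these cases (person, day, sensor placement, modality); the fit/predict callables stand for whatever ARC configuration is being validated and are assumptions of this sketch:

```python
import numpy as np

def leave_one_group_out(X, y, groups, fit, predict):
    """Leave one group (person, day, placement, ...) out in turn and report accuracy.

    groups: array of group labels per sample.
    fit(X_train, y_train) -> model ; predict(model, X_test) -> predicted labels.
    """
    accuracies = {}
    for g in np.unique(groups):
        test, train = groups == g, groups != g
        model = fit(X[train], y[train])
        accuracies[g] = float(np.mean(predict(model, X[test]) == y[test]))
    return accuracies
```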
  • 66. © Daniel Roggen www.danielroggen.net droggen@gmail.com For further reading ARC • Roggen et al., Wearable Computing: Designing and Sharing Activity-Recognition Systems Across Platforms, IEEE Robotics&Automation Magazine, 2011 Activity recognition • Stiefmeier et al, Wearable Activity Tracking in Car Manufacturing, PCM, 2008 • J. Ward, P. Lukowicz, G. Tröster, and T. Starner, “Activity recognition of assembly tasks using body-worn microphones and accelerometers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 10, pp. 1553–1567, 2006. • L. Bao and S. S. Intille, “Activity recognition from user-annotated acceleration data,” in Pervasive Computing: Proc. of the 2nd Int’l Conference, Apr. 2004, pp. 1–17. • D. Figo, P. C. Diniz, D. R. Ferreira, and J. M. P. Cardoso, “Preprocessing techniques for context recognition from accelerometer data,” Pervasive and Mobile Computing, vol. 14, no. 7, pp. 645–662, 2010. • Roggen et al., An educational and research kit for activity and context recognition from on-body sensors, Int. Conf. on Body Sensor Networks (BSN), 2010 Classification / Machine learning / Pattern recognition • Duda, Hart, Stork, Pattern Classification, Wiley Interscience, 2000 • Bishop, Pattern recognition and machine learning, Springer, 2007 (http://research.microsoft.com/en-us/um/people/cmbishop/prml/) Performance measures • Villalonga et al., Bringing Quality of Context into Wearable Human Activity Recognition Systems, First International Workshop on Quality of Context (QuaCon), 2009 • Ward et al., Performance metrics for activity recognition, ACM Transactions on Information Systems and Technology, 2(1), 2011
  • 67. © Daniel Roggen www.danielroggen.net droggen@gmail.com

Editor's notes

  1. So, how do we recognize a user's activities? With sensors. We use sensors worn on the body or integrated into clothing. Different sensors can help us recognize activities. For example, activity implies movement, so we can use a motion sensor; maybe some of you have a mobile phone with motion sensors built in. <demo motion sensors> When I wave, you can see that the signal looks characteristic. Activities also often imply typical sounds, e.g. a drill, so a microphone can also provide input for activity recognition. The goal is to recognize, with a sensor, the typical movement or sound of an activity. This works in two steps. First, we find out what an activity looks like in the sensor data: we find its sensor-data pattern (e.g. here, two activities recorded with a motion sensor). Second, we must find these patterns in the sensor data. Now that I know the patterns, I read a lot of data from the sensors; it could look like this. The question is whether I performed one of the activities. How do we do that? We compare the pattern with the sensor data, and we see that here the pattern and the sensor data are very similar. This means we have found that the user drank from a glass here. We do the same with the other pattern, and here we find a second activity.
  2. On the following slides we will have a closer look at the activity spotting method. First we’ll walk through the processing chain to get an overview. The on-body sensors in the Motion Jacket deliver orientation data for the body limbs. From this data body trajectories are computed in Cartesian space. These trajectories are encoded in a discrete string representation. Strings are input to the spotting operation that uses string matching techniques. Activity-specific template strings are matched with the continuous motion string from the sensors. This is done in parallel for each activity class and trajectory. Retrieved segments of the SAME class from DIFFERENT trajectories are fused using a temporal overlap detection scheme.
  3. The string matching operation is based on the approximate string matching algorithm. Its basis is the Levenshtein or edit distance. This distance comprises three symbol-level operations that allow two strings to be compared and their distance expressed as a scalar. A key modification of the standard algorithm allows template occurrences to be found at arbitrary positions within the motion string. (The approach requires one template string per activity class.)
  4. This is an example of the matching operation with two activity class templates. For each incoming symbol in the motion string, the approximate string matching algorithm is executed to compute the current matching cost value. This is done for both template strings. Template String 1 as example: one substitution and one deletion operation on symbol level determine the costs at t0: r+d
  5. This plot shows the matching costs over time for one template string which corresponds to a dedicated activity class. Minima in the time series of costs that fall below a trained spotting threshold k are inferred as END-POINTS of activity occurrences. The START-POINTS are found by tracing back the cost time series to a suitable previous maximum. The retrieved segment is implicitly tied to the activity class whose template string was used during matching.