3. Mining Public Transport Usage for
Personalised Intelligent Transport Systems
Neal Lathia¹, Jon Froehlich², Licia Capra¹
¹ Dept of Computer Science, University College London
² Computer Science and Engineering, University of Washington
IEEE ICDM 2010, Sydney, Australia
@neal_lathia
n.lathia@cs.ucl.ac.uk
4. mobility and sustainable transport
are both aided and encouraged by
information systems:
why are they not personalised?
traveller information systems
5. why personalise?
a wide range of people, with different
needs, preferences, constraints
only 46-62% of the travel time is spent
sitting on trains
majority of notifications, updates, &
events are irrelevant to travellers
7. using to infer
dataset: <user, origin, destination, date, start, end>
what can we learn about user preferences from
fare collection systems?
what sort of personalised systems can be built?
what prediction/ranking algorithms can we use?
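The dataset tuple above can be modelled as a small record type. This is a minimal sketch: the class name `Trip`, the field types, and the sample values are illustrative assumptions, not the schema of the actual fare collection system.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class Trip:
    """One record: <user, origin, destination, date, start, end>."""
    user: str
    origin: str
    destination: str
    day: date
    start: time  # entry time; assumed to fall on the same day as `end`
    end: time    # exit time

    def duration_minutes(self) -> float:
        """Travel time in minutes, from entry to exit."""
        t0 = datetime.combine(self.day, self.start)
        t1 = datetime.combine(self.day, self.end)
        return (t1 - t0).total_seconds() / 60.0


# Made-up example values, for illustration only.
trip = Trip("u1", "Euston", "Bank", date(2010, 3, 1), time(8, 15), time(8, 37))
print(trip.duration_minutes())
```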
8. 2 x ~300,000 travellers (5%), ~7,000,000
tube trips: aggregate
10. transport research focuses on
what this data tells us about the
system:
demand modelling
service reliability measurements
average travel time estimation
station transfer analysis
what does it tell us about the
travellers?
14. 2 x ~300,000 travellers (5%),
~7,000,000 tube trips
hierarchical clustering
the data shows:
a huge diversity of travellers
measurable ranges of habits and
preferences: when & where to
travel, how long travel takes...
next step?
15. using to build
what applications?
personalised travel time: how long
will it take me to get there?
personalised notifications: which
stations' events are relevant to me?
...and more in our future work
17. personalised trip time – 3 methods
self-similarity: implicitly capture route
choices, walk time
(weighted geometric mean of traveller's history)
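A weighted geometric mean of past trip times can be sketched as below. The slide does not say how the weights are chosen; the recency-style weights in the example are a hypothetical choice made only for illustration.

```python
import math


def weighted_geometric_mean(values, weights):
    """Return exp(sum(w_i * ln(x_i)) / sum(w_i)) for positive values x_i."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


# A traveller's past times (minutes) on one origin-destination pair.
history = [20.0, 22.0, 25.0]
weights = [1.0, 2.0, 3.0]  # hypothetical: later trips count more
prediction = weighted_geometric_mean(history, weights)
print(round(prediction, 2))
```

The result always lies between the shortest and longest past trip, and a single unusually long trip pulls it up less than it would pull an arithmetic mean.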
18. personalised trip time – 3 methods
familiarity: similar users
(neighbourhood model of travellers who are
similarly familiar)
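One way to read the neighbourhood model is sketched below. It assumes that familiarity is measured by how many times a traveller has made the same origin-destination trip, and that the prediction is the plain average over the `k` travellers with the closest counts. Both are assumptions for illustration; the slide does not define either.

```python
def familiarity_prediction(target_count, others, k=2):
    """Predict a trip time from travellers with similar familiarity.

    target_count: how often the target traveller has made this trip.
    others: list of (trip_count, mean_trip_minutes) pairs for other
            travellers on the same origin-destination pair.
    k: neighbourhood size (hypothetical parameter).
    """
    if not others:
        raise ValueError("need at least one other traveller")
    # Neighbours are the travellers whose trip counts are closest to the target's.
    nearest = sorted(others, key=lambda o: abs(o[0] - target_count))[:k]
    return sum(minutes for _, minutes in nearest) / len(nearest)


# Made-up data: (times travelled, mean minutes) per traveller.
others = [(1, 30.0), (2, 28.0), (20, 22.0), (25, 21.0)]
print(familiarity_prediction(22, others, k=2))  # frequent traveller
print(familiarity_prediction(1, others, k=2))   # first-time traveller
```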
19. personalised trip time – 3 methods
context: implicitly capture
historical trends in current trip
(two-sided sliding window moving average
model)
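A two-sided sliding window can be sketched as follows, assuming the window is centred on the trip's start time of day and covers historical trips on the same origin-destination pair. The window width and the use of start time as the context variable are assumptions, not details given on the slide.

```python
def context_prediction(start_minute, history, half_window=30):
    """Average the durations of trips that started near `start_minute`.

    start_minute: start time of the trip to predict, in minutes since midnight.
    history: list of (start_minute, duration_minutes) for the same
             origin-destination pair.
    half_window: the window covers start_minute +/- half_window (hypothetical).
    Returns None when no historical trip falls inside the window.
    """
    in_window = [d for s, d in history if abs(s - start_minute) <= half_window]
    if not in_window:
        return None
    return sum(in_window) / len(in_window)


# Made-up history: three morning-peak trips and one early-afternoon trip.
history = [(480, 25.0), (495, 27.0), (510, 26.0), (780, 20.0)]
print(context_prediction(500, history))   # uses only the three morning trips
print(context_prediction(1200, history))  # None: nothing within the window
```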
20. personalised trip time – evaluation
split data:
74 day training set (90%)
9 day test set (10%)
metrics:
mean absolute error (MAE)
mean absolute percentage error (MAPE)
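Both metrics are standard; a minimal sketch with made-up trip times in minutes:

```python
def mean_absolute_error(actual, predicted):
    """MAE: average absolute difference, in the same units as the data."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE: average absolute difference relative to the actual value, in %."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and equal length")
    # Trip times are positive, so dividing by the actual value is safe here.
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


actual = [20.0, 40.0]     # observed trip times
predicted = [22.0, 36.0]  # model output
print(mean_absolute_error(actual, predicted))
print(mean_absolute_percentage_error(actual, predicted))
```

MAE reports the error in minutes, while MAPE normalises by trip length, so a 2-minute miss on a 20-minute trip counts the same as a 4-minute miss on a 40-minute trip.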
21. personalised trip time – evaluation (MAE)
global mean – 11.45 mins
zone mean – 8.56 mins
journey planner – 6 mins (preliminary)
trip mean – 3.109 mins
familiarity – 2.989 mins
context – 2.986 mins
self-similarity – 2.924 mins
combined – 2.922 mins
observations:
error ~ trip time
error highest for people who only travel on weekends
more trips reduce error
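The two simplest baselines in the results can be sketched as below. The sketch assumes "global mean" is the average over all training trips and "trip mean" is the average for the same origin-destination pair, falling back to the global mean for unseen pairs. These readings, and the function names, are assumptions.

```python
from collections import defaultdict


def fit_baselines(trips):
    """trips: iterable of (origin, destination, minutes) training records."""
    total, count = 0.0, 0
    per_pair = defaultdict(lambda: [0.0, 0])  # (origin, dest) -> [sum, n]
    for origin, destination, minutes in trips:
        total += minutes
        count += 1
        entry = per_pair[(origin, destination)]
        entry[0] += minutes
        entry[1] += 1
    global_mean = total / count
    trip_means = {pair: s / n for pair, (s, n) in per_pair.items()}
    return global_mean, trip_means


def predict_trip_mean(origin, destination, global_mean, trip_means):
    """Per-pair mean if the pair was seen in training, else the global mean."""
    return trip_means.get((origin, destination), global_mean)


# Made-up training data with placeholder station names.
training = [("A", "B", 10.0), ("A", "B", 14.0), ("C", "D", 30.0)]
global_mean, trip_means = fit_baselines(training)
print(global_mean)
print(predict_trip_mean("A", "B", global_mean, trip_means))
print(predict_trip_mean("X", "Y", global_mean, trip_means))
```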
24. station interest ranking
predict (and rank) the stations that
travellers will visit in their future trips
for personalised notifications
current system: free travel alerts –
manually set up by traveller
25. station interest ranking
can we automate this?
baseline: rank by visit popularity
proposal: station similarity
neighbourhood (visit co-occurrence)
and traveller trip history
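The baseline and the co-occurrence idea can be sketched as follows. The sketch assumes popularity is the number of distinct travellers who visited a station, and that two stations co-occur when they appear in the same traveller's history. Both definitions are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def popularity_ranking(visits):
    """visits: dict traveller -> set of visited stations.

    Returns stations ordered by number of visitors, most popular first
    (ties broken alphabetically).
    """
    counts = Counter(s for stations in visits.values() for s in stations)
    return [s for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def cooccurrence(visits):
    """Count, for each ordered station pair, the travellers who visited both."""
    co = Counter()
    for stations in visits.values():
        for a, b in combinations(sorted(stations), 2):
            co[(a, b)] += 1
            co[(b, a)] += 1  # keep the matrix symmetric
    return co


# Made-up visit histories with placeholder station names.
visits = {"u1": {"A", "B"}, "u2": {"A", "C"}, "u3": {"A", "B"}}
print(popularity_ranking(visits))
co = cooccurrence(visits)
print(co[("A", "B")], co[("B", "C")])
```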
26. station interest ranking
1. begin with baseline ranking
2. add proportional weighting for
stations user has visited in the past
3. transform dataset into station-
station co-occurrence matrix, increase
weight of similar stations
metric: percentile ranking
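The three steps and the metric can be sketched as below. The weights `alpha` and `beta`, the normalisations, and the convention that percentile rank runs from 0% (top of the list) to 100% (bottom) are all assumptions; the slide names the steps and the metric but gives no formulas.

```python
from collections import Counter
from itertools import permutations


def rank_stations(user_history, all_visits, alpha=1.0, beta=0.5):
    """Return all known stations for one traveller, best-first.

    user_history: Counter station -> visits by this traveller.
    all_visits:   dict traveller -> Counter of station visits.
    alpha, beta:  hypothetical weights for steps 2 and 3.
    """
    # Step 1: baseline score = each station's share of all recorded visits.
    totals = Counter()
    for counts in all_visits.values():
        totals.update(counts)
    grand_total = sum(totals.values())
    scores = {s: n / grand_total for s, n in totals.items()}

    # Step 2: boost stations in proportion to the traveller's own visits.
    own_total = sum(user_history.values())
    for station, n in user_history.items():
        scores[station] = scores.get(station, 0.0) + alpha * n / own_total

    # Step 3: build station-station co-occurrence counts, then boost stations
    # that co-occur with stations the traveller has already visited.
    co = Counter()
    for counts in all_visits.values():
        for a, b in permutations(counts.keys(), 2):
            co[(a, b)] += 1
    n_travellers = len(all_visits)
    for (a, b), n in co.items():
        if a in user_history:
            scores[b] += beta * n / n_travellers

    return sorted(scores, key=lambda s: (-scores[s], s))


def percentile_rank(ranking, station):
    """Position of `station` in `ranking`: 0.0 = top, 100.0 = bottom."""
    if len(ranking) < 2:
        return 0.0
    return 100.0 * ranking.index(station) / (len(ranking) - 1)


# Made-up data with placeholder station names.
all_visits = {
    "u1": Counter({"A": 5, "B": 3}),
    "u2": Counter({"A": 4, "C": 2}),
    "u3": Counter({"B": 1, "C": 6, "D": 1}),
}
ranking = rank_stations(Counter({"D": 3}), all_visits)
print(ranking)
print(percentile_rank(ranking, "C"))
```

Under this convention a lower average percentile rank for the stations a traveller later visits means a better ranking.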
28. without knowing who travellers are, the
network topology, train schedule,
disruptions and closures, we designed
personalised information services
for intelligent transport systems
29. 2 x ~300,000 travellers (5%),
~7,000,000 tube trips
hierarchical clustering
what next?
larger, multi-modal datasets to
investigate and improve the
algorithms we evaluate here;
implementations for mobile
devices to study these
applications in the field;
examine other facets of travel
behaviour (e.g., ticket purchasing)
30. Mining Public Transport Usage for
Personalised Intelligent Transport Systems
We are hiring! interested? get in touch!
@neal_lathia
n.lathia@cs.ucl.ac.uk