TomTom’s mission is to create a world free of congestion with a better driving experience. To do that, we need to understand the driving behaviour of end users while optimizing the operational costs of our services. However, because of the large scale of our vehicle probe data, providing insights and performing advanced analytics can be quite challenging.
During this discussion I will showcase two use cases where Databricks, Delta Lake and MLflow have enabled us to accelerate innovation. The first one is the IQ Maps use case. IQ Maps is a system designed specifically for in-dash systems: it takes the same up-to-date user experience you expect from navigation apps and brings it to reliable, in-car navigation. IQ Maps learns the driver’s driving patterns and updates the map regions that are most relevant to the user, using Wi-Fi or 4G. However, minimizing mobile data consumption, which can be costly, while preserving the best driving experience by keeping the map updated, requires complex simulations using millions of location traces from vehicles. Apache Spark has been our key instrument for finding the best balance in this trade-off. The second use case is Destination Prediction. For many years, we have offered a personalized feature in our navigation products that predicts the driver’s next destination with high accuracy. With the exponential increase in the availability of data and access to more sophisticated machine learning models, we have revisited this feature to take it to the next level. Both use cases take advantage of the latest frameworks and tools available on Databricks. With MLflow and Delta we have been able to find the best models that predict the destination for each individual driver, and to track each one of the KPIs.
2. Sergio Ballesteros, TomTom
Kia Eisinga, TomTom
Driver Location Intelligence at Scale using Apache Spark, Delta Lake and MLflow on Databricks
#UnifiedDataAnalytics #SparkAISummit
12. In-dash systems are outperformed by smartphones
The embedded system is expected to be up to date, with no user interaction. And the most visible component of it is a map.
Use case 1: IQ Maps analytics
16. 98% OF TRIPS ARE DRIVEN WITHIN 150 KM RADIUS
99.8% OF TRIPS ARE DRIVEN WITHIN 1000 KM RADIUS
17. When radius is 0 km
• User drives within 2 regions every weekday
• Radius of 0 km.
• Download and install just home regions
• Cellular data usage kept to a minimum
18. When radius is 150 km
• User drives within 2 update regions every weekday
• Radius of 150 km.
• Home region: 6 update regions.
• Cellular data usage increased
21. Real results using 0.5M trips
“This insight has led me to the conclusion that a default radius of 150 km is unnecessary, and a small radius of ~10 km would already satisfy most drivers while keeping cellular data usage low for OEMs.”
- Rolf Dorland, PM at TomTom
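The radius comparison behind this conclusion can be illustrated with a small sketch. It assumes that "within a radius" means the great-circle distance from a single home location to each trip destination; the coordinates, the function names and the candidate radii below are hypothetical and are not taken from the talk's data or code.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def share_of_trips_within(home, destinations, radius_km):
    """Fraction of trip destinations that lie within radius_km of home."""
    if not destinations:
        return 0.0
    inside = sum(
        1 for lat, lon in destinations
        if haversine_km(home[0], home[1], lat, lon) <= radius_km
    )
    return inside / len(destinations)


# Hypothetical data: one home location and four trip destinations.
home = (52.3676, 4.9041)
destinations = [
    (52.3700, 4.8950),  # under 1 km from home
    (52.0907, 5.1214),  # tens of km away
    (51.9244, 4.4777),  # tens of km away
    (48.8566, 2.3522),  # several hundred km away
]

# Share of trips covered for each candidate radius (in km).
coverage = {r: share_of_trips_within(home, destinations, r) for r in (10, 150, 1000)}
```

On real probe data the same calculation would run per device over millions of traces, which is where a distributed engine such as Apache Spark becomes useful.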
22. Going on holidays
• User goes on holiday (less frequently updated region)
• Once user starts driving, updates for all update regions the route goes through are downloaded and installed.
26. Data
• Original trace data from 1 source: 227K device serials
• Filtering out invalid trips: 143K device serials
• Users with at least 50 trips: 3.6K device serials
• Devices feasible for modelling: 2.5K device serials
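The first two filtering steps can be sketched as follows. This is only an illustration: the trip records, the `device` and `valid` field names and the helper function are hypothetical, and the slide does not say how a trip is judged invalid or how "feasible for modelling" is decided.

```python
from collections import Counter

MIN_TRIPS = 50  # threshold taken from the slide


def devices_with_enough_trips(trips, min_trips=MIN_TRIPS):
    """Return the device serials that have at least min_trips valid trips."""
    counts = Counter(t["device"] for t in trips if t["valid"])
    return {device for device, n in counts.items() if n >= min_trips}


# Hypothetical trips: A has 60 valid trips, B has 40 valid and 20 invalid, C has 10 valid.
trips = (
    [{"device": "A", "valid": True}] * 60
    + [{"device": "B", "valid": True}] * 40
    + [{"device": "B", "valid": False}] * 20
    + [{"device": "C", "valid": True}] * 10
)

eligible = devices_with_enough_trips(trips)
```

Only device A remains here, because invalid trips are dropped before the trip count is checked.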
27. Features
For each trip, we have the following information:
• Where did the trip start?
• At what speed were you driving when the trip started?
• What was the time of day (morning/afternoon/evening) when the trip started?
• Was it rush hour when the trip started?
• What day of the week was it?
• Was it a weekend day?
• What was the season?
• Which driver profile do you belong to?
Historical information:
• Which destination did you go to on your last trip? And the one before that? And the one before that?
• If it is, let's say, a Monday, where did you go the last Monday you made a trip? (do this for every weekday)
To predict: To which destination are you going?
What do we use in the end?
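As a rough illustration, the time-based and historical features could be derived from a trip's start timestamp and the driver's earlier trips as shown below. The hour boundaries for time of day and rush hour, the northern-hemisphere season mapping and all names are assumptions made for this sketch; the talk does not specify them. Start location, start speed and driver profile are left out because they would come directly from the trace data.

```python
from datetime import datetime

# Assumed mapping: meteorological seasons, northern hemisphere.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def time_of_day(hour):
    """Bucket an hour into morning/afternoon/evening (boundaries are assumed)."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"


def trip_features(start, history):
    """Build features for a trip starting at `start`.

    history: list of (start datetime, destination label), oldest first.
    """
    weekday = start.weekday()  # Monday is 0
    is_weekend = weekday >= 5
    # Assumed rush-hour windows on working days only.
    is_rush_hour = (not is_weekend) and (7 <= start.hour < 9 or 16 <= start.hour < 19)

    # Last three destinations, most recent first, padded with None.
    recent = [dest for _, dest in reversed(history)][:3]
    recent += [None] * (3 - len(recent))

    # Most recent destination seen on each weekday (later trips overwrite earlier ones).
    last_on_weekday = {}
    for ts, dest in history:
        last_on_weekday[ts.weekday()] = dest

    return {
        "time_of_day": time_of_day(start.hour),
        "is_rush_hour": is_rush_hour,
        "day_of_week": weekday,
        "is_weekend": is_weekend,
        "season": SEASONS[start.month],
        "last_destinations": recent,
        "last_destination_by_weekday": {d: last_on_weekday.get(d) for d in range(7)},
    }


# Hypothetical history for one driver, oldest first.
history = [
    (datetime(2019, 10, 7, 8, 0), "work"),           # Monday
    (datetime(2019, 10, 11, 18, 30), "gym"),         # Friday
    (datetime(2019, 10, 12, 11, 0), "supermarket"),  # Saturday
    (datetime(2019, 10, 13, 15, 0), "home"),         # Sunday
]
features = trip_features(datetime(2019, 10, 14, 8, 15), history)  # a Monday morning
```

The destination labels in `history` stand for cluster identifiers rather than raw coordinates, so the same place always maps to the same label.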
28. Labels
• We are given the latitude and longitude of the destination of a trip.
• In order to find out which latitudes and longitudes belong to the same destination, we apply a clustering algorithm called DBSCAN.
• DBSCAN clusters together destinations that are within 500 meters of each other. We should have at least 5 trips to a destination in order to call it a cluster.
How do we define where you are going?
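The labelling step can be illustrated with a minimal, dependency-free DBSCAN over great-circle distances. This is a sketch rather than the implementation used in the talk: it reads the 500 meters as the DBSCAN neighbourhood radius and the 5 trips as the minimum neighbourhood size (counting the point itself), and the coordinates and function names are hypothetical. A production pipeline would more likely rely on a library implementation.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
NOISE = -1


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points, eps_m=500.0, min_pts=5):
    """Return one cluster label per point; NOISE (-1) marks unclustered points."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force neighbourhood query; includes the point itself.
        return [j for j in range(len(points)) if haversine_m(points[i], points[j]) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Hypothetical trip destinations: five near one place, five near another, two elsewhere.
points = (
    [(52.3700 + 0.0001 * k, 4.8900) for k in range(5)]
    + [(52.3400 + 0.0001 * k, 4.9500) for k in range(5)]
    + [(52.4000, 4.8000), (52.4001, 4.8000)]
)
labels = dbscan(points)
```

The two places with five trips each become destinations 0 and 1, while the place visited only twice is labelled as noise and would not count as a destination.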