Geospatial Analytics and Spatial Capabilities on Big Data Systems
By: Ahmed Jawad (PhD)
Agenda
• Why Analyze Telematics
• Analysis of Movement Data
• Analytical Assets for Telematics
• Operational Requirements on Telematics
• Data flow on big data platforms
• Analytical Challenges and Applications Solved through
Machine Learning
• Snap to road
• Unifying trajectories to patterns of movements and routines
• Traffic event detection
Why Analyze Telematics
• We are being recorded everywhere
• Provides great insights into the customer routines and
movement
• Key players competing in the market
Analyzing Movement Data
• Trajectory: an object in motion (time and space)
• Raw trajectories: coordinate-based recording
• Symbolic trajectories: discretization into streets, locations, or events
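The step from a raw to a symbolic trajectory can be sketched as follows. This is a minimal illustration that assumes a regular grid as the alphabet (the slides also name streets, locations, and events as possible symbols); the function name `toSymbolicTrajectory`, the cell size, and the sample coordinates are hypothetical.

```javascript
// Map each raw fix to a grid-cell symbol and collapse consecutive repeats.
// Assumption: a regular lon/lat grid is used as the symbol alphabet.
function toSymbolicTrajectory(rawFixes, cellSizeDeg) {
  const symbols = [];
  for (const { lon, lat } of rawFixes) {
    const col = Math.floor(lon / cellSizeDeg);
    const row = Math.floor(lat / cellSizeDeg);
    const symbol = `c${col}_${row}`;
    // Only record a symbol when the object enters a new cell.
    if (symbols[symbols.length - 1] !== symbol) symbols.push(symbol);
  }
  return symbols;
}

// Hypothetical raw trajectory (coordinate-based recording).
const raw = [
  { lon: 8.651, lat: 49.871 },
  { lon: 8.653, lat: 49.872 },
  { lon: 8.662, lat: 49.874 },
  { lon: 8.671, lat: 49.879 },
];
console.log(toSymbolicTrajectory(raw, 0.01));
```

The first two fixes fall into the same cell, so the four raw points collapse into three symbols.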
Traditional Operational Requirements In The World Of Geographic Information Systems (GIS)
• Traditional use cases: cartography, geo-algebra (display of statistical events, hotspots, co-locations on the map)
• Databases used: PostgreSQL, SQL Server
• Mostly static data sources
• Relatively small data sets
• Moderate geometric accuracy
• Offline processing acceptable
• Complex geometric datatypes support
Operational Requirements and Design Considerations for Telematics
• Real-time ingestion and analytics on sensor data, distance queries, snap-to-road
• Data at the scale of 100 TB to petabytes
• High variation in geospatial queries (range queries, etc.) and in the throughput of CRUD operations (insertion / deletion / read)
• The processing flow, the map applications, and the nature of the relationships in the data determine the storage technology, together with the indexing techniques and their implications.
Telematics and Geospatial Data Types
• Spatial data structures:
• Raster: geographically referenced matrix of uniform size
• Vector: features on the earth's surface are represented as geographically referenced vector objects
• Hierarchical nature of objects
• Points: different types: entity, label, area, node
• Lines: lines, polylines, arc, link, etc.
• Polygons: area, polygon, complex polygon
• Requirements: the ability to manipulate geospatial data
• Databases and libraries are required to manipulate these objects at distributed scale (Spark and Scala, MongoDB, or any other NoSQL database)
Analytical Assets for Telematics
• The analytical assets for telematics broadly relate to:
• Snap-to-road
• Analysis of User Activities (Clustering)
• Traffic Event Detection (Classification)
• Real-time location search
• Set operations on geometry objects and geoalgebra (layering geospatial information atop each other and applying algebraic operations to the layers)
Conceptual dataflow and geospatial processing in Telematics
[Figure: conceptual data flow. Components: PDA; event capture (Kafka); event processing and delivery decision (stream processing engine); persistence layer (Mongo / HBase, Cassandra / Elastic, on top of Hadoop) with PDA geodata, critical events, and risk areas; data feed client (preloads risk areas and traffic info); Tomcat app (optional raster processing with GeoTrellis); client (D3 / Ajax / Leaflet). Connections are labeled push (REST API, WebSocket), pull, and stream.]
Persistence layer: it should be scalable and support storage and querying of spatiotemporal objects (points, polygons, lines, line strings; for reference see MongoDB's 2dsphere indexing and geospatial querying). The following low-level queries shall be supported: (1) nearest-neighbor query: given a point (lat, long), find all the line strings within a radius of x meters; (2) containment query: return all the points within a polygon, or, given a point, find all the polygons containing it.
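To make the containment query concrete, here is a minimal sketch of the point-in-polygon test that such a query has to answer for every candidate. It uses the classic ray-casting rule on planar coordinates, which is an assumption made for brevity; a geospatial store evaluates the same predicate on spherical geometry and with an index. The function names `pointInRing` and `pointsWithin` are hypothetical.

```javascript
// Ray casting: a point is inside if a horizontal ray from it crosses
// the polygon boundary an odd number of times.
// ring is an array of [x, y] vertices (planar coordinates, closed or open).
function pointInRing(point, ring) {
  const [px, py] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    // The edge straddles the ray's y level and lies to the right of the point.
    const crosses =
      (yi > py) !== (yj > py) &&
      px < ((xj - xi) * (py - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// Containment query over an in-memory list: all points within the polygon.
function pointsWithin(points, ring) {
  return points.filter((p) => pointInRing(p, ring));
}

const ring = [[0, 0], [3, 6], [6, 1], [0, 0]];
console.log(pointsWithin([[3, 2], [5, 5]], ring)); // [ [ 3, 2 ] ]
```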
Client browser (e.g. fleet manager): in the current scheme, all the intelligence is deferred to the client, i.e. the raster processing, the map display, and the different layers along with map algebra are handled on the client side, for example with Leaflet. An alternative strategy is to use geotrellis.io as a geoprocessing engine for the raster operations and to use the client only for displaying the map.
Stream processing queries: (1) instantaneous speed / angular momentum of the PDA; (2) distance to a traffic event pulled from Bing; (3) running aggregates, e.g. how long the vehicle has spent at the current location.
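The stream queries listed above reduce to a few small computations per incoming fix. The sketch below assumes fixes of the form `{ lat, lon, t }` with `t` in seconds, ordered by time, and a spherical Earth (haversine distance); the function names and the dwell radius are hypothetical, and a real stream engine would keep this state incrementally rather than rescanning a buffer.

```javascript
const EARTH_RADIUS_M = 6371000; // mean Earth radius, spherical approximation

// Great-circle distance in meters between two { lat, lon } fixes.
function haversineMeters(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// (1) Speed in m/s between two consecutive fixes.
function instantaneousSpeed(prev, curr) {
  const dt = curr.t - prev.t;
  return dt > 0 ? haversineMeters(prev, curr) / dt : 0;
}

// (3) Running aggregate: seconds spent within radiusM of the latest fix,
// found by scanning the buffered fixes backwards.
function dwellSeconds(fixes, radiusM) {
  if (fixes.length === 0) return 0;
  const last = fixes[fixes.length - 1];
  let start = last.t;
  for (let i = fixes.length - 2; i >= 0; i--) {
    if (haversineMeters(fixes[i], last) > radiusM) break;
    start = fixes[i].t;
  }
  return last.t - start;
}

const buffer = [
  { lat: 1, lon: 1, t: 0 },
  { lat: 0, lon: 0, t: 60 },
  { lat: 0, lon: 0, t: 120 },
];
console.log(dwellSeconds(buffer, 50)); // 60
```

Query (2), the distance to a traffic event, is a direct call to `haversineMeters` with the event location as the second argument.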
External services: geocoding service; OSM / real-time traffic API
Analytics Cluster GIS capabilities
[Figure: analytics cluster layers. Hadoop cluster; data storage: NoSQL database (MongoDB / HBase / Elastic); provisioning layer; processing layer: Spark, Scala + RStudio Server and RMR.]
Data Storage - Persistence layer
| Name | Index strategy | Geometry | Query types | Ease of use / integration | Scalability / speed | Comments |
|---|---|---|---|---|---|---|
| Elasticsearch | Geohash | Point | Bbox, radius | Good, 3 stars | 10s of TBs | Average writes; reads and search extremely fast |
| Neo4j | R-tree | Point / line / polygon | Bbox, radius | Moderately good, 2 stars | 10s of TBs | Too granular |
| HBase | Build your own index | - | - | Moderately good, 2 stars | Petabytes | Writes are fast, reads as well; needs specialization |
| Cassandra | Build your own index | - | - | Good, 3 stars | Petabytes | Same as HBase |
| MongoDB / Couchbase | Geohash | Point / line / polygon | 1) geo-within 2) near 3) intersect | Excellent, 5 stars; GeoJSON / Leaflet / OSM | 10s of TBs, average throughput | Best integration with GeoJSON in all cases |
Proposed solutions:
• Short term: MongoDB
• Long term: Elasticsearch as the indexing engine and HBase / Cassandra as the storage technology on top of Hadoop
Analytical Services on Telematics Cluster
1) Geocoding and reverse geocoding service on the cluster
2) Weather and traffic API (real-time and historical) to support the use cases related to weather and traffic analytics
3) Street maps (OpenStreetMap at the start, then better map providers in the longer run)
• Required for the following analytics: regular trips, snap-to-road, mode of transport, identification of risky roads, impact of POI (e.g. school) on events, enables Location based
Analytical Operations/Procedures Useful For Spatial Analysis
(RStudio Server With R Packages)
• Having an RStudio Server on the cluster would be useful.
• GitHub repository (already established)
• R packages for dealing with vector data (rgdal, rgeos, geojson_io, SpatialTransforms)
• Point pattern analysis: dbscan, glm, gbm
• Describing and analyzing fields, statistical analysis of fields / spatial interpolation: kriging, tps
• Network analysis, snap-to-road, frequent routes, etc. (igraph, sna)
• Visualization of the data: leaflet, shiny
Geospatial processing layer on top of persistence
• The geospatial processing layer integrates map geometry and map algebra to display the information on the map. On a small scale, this can be done in JavaScript (Leaflet / D3).
• The following operations are required:
1) Vector operations
2) Map algebra
• On a larger scale, a software engineering layer for distributed geospatial processing is required, for example Scala, Spark, and GeoTrellis.
• http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/allixender/5676830073
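As an illustration of what map algebra means at the small, client-side scale, the sketch below applies a local (cell-by-cell) operation to two aligned rasters. The layer names `riskArea` and `trafficDensity`, their values, and the helper `localOp` are hypothetical; a distributed engine would apply the same idea tile by tile.

```javascript
// Local map algebra: combine two aligned rasters cell by cell.
// Rasters are arrays of rows; both must have the same dimensions.
function localOp(rasterA, rasterB, fn) {
  if (
    rasterA.length !== rasterB.length ||
    rasterA.some((row, r) => row.length !== rasterB[r].length)
  ) {
    throw new Error("rasters must have the same dimensions");
  }
  return rasterA.map((row, r) => row.map((value, c) => fn(value, rasterB[r][c])));
}

// Hypothetical layers: a 0/1 risk-area mask and a traffic density grid.
const riskArea = [
  [0, 1, 1],
  [0, 0, 1],
];
const trafficDensity = [
  [5, 2, 0],
  [1, 4, 3],
];

// Keep the traffic density only inside risk areas.
const exposure = localOp(riskArea, trafficDensity, (risk, density) => risk * density);
console.log(exposure); // [ [ 0, 2, 0 ], [ 0, 0, 3 ] ]
```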
Analytical Challenges in Movement Data
• Basic challenges in movement data
• Matching (Snap-to-road, street network matching)
• Similarity measures
• Trajectory clustering
• Event detection (classification)
Example Applications Solved through Machine Learning
• For raw trajectories
• Snap-to-road
• For symbolic trajectories
• Analysis of user activities
• Traffic event detection
Snap-to-road
• Given a trajectory T and a street network G
• Find the path in G that best matches T, i.e. its real (ground-truth) path
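For orientation, the simplest possible baseline for this problem snaps every point independently to the closest street segment. The sketch below shows that baseline only; it is not the kernelized method developed in the following slides, it ignores path connectivity, and it assumes planar coordinates. The edge ids and coordinates are hypothetical.

```javascript
// Closest point on segment [a, b] to p, in planar coordinates.
function projectOntoSegment(p, a, b) {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const lenSq = dx * dx + dy * dy;
  // Clamp the projection parameter so the result stays on the segment.
  const t = lenSq === 0
    ? 0
    : Math.max(0, Math.min(1, ((p.x - a.x) * dx + (p.y - a.y) * dy) / lenSq));
  const x = a.x + t * dx;
  const y = a.y + t * dy;
  return { x, y, dist: Math.hypot(p.x - x, p.y - y) };
}

// Naive snap-to-road: each trajectory point goes to its nearest edge of G.
function snapToRoad(trajectory, edges) {
  return trajectory.map((p) => {
    let best = null;
    for (const edge of edges) {
      const candidate = projectOntoSegment(p, edge.from, edge.to);
      if (best === null || candidate.dist < best.dist) {
        best = { ...candidate, edgeId: edge.id };
      }
    }
    return best;
  });
}

// Hypothetical street network G with two edges, and a noisy trajectory T.
const edges = [
  { id: "main-st", from: { x: 0, y: 0 }, to: { x: 10, y: 0 } },
  { id: "side-st", from: { x: 10, y: 0 }, to: { x: 10, y: 10 } },
];
const noisy = [{ x: 2, y: 0.4 }, { x: 6, y: -0.3 }, { x: 9.7, y: 4 }];
console.log(snapToRoad(noisy, edges).map((s) => s.edgeId));
// [ 'main-st', 'main-st', 'side-st' ]
```

Because each point is treated in isolation, this baseline tends to fail in the difficult cases a structure-preserving approach is meant to handle, such as noisy fixes near parallel roads or junctions.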
Snap-to-road: Analytical Modeling
• Multi-regression view:
• Task = estimate a noise-free function f from T that preserves the structural information
• Preserving structural correlations in the output:
• Try a kernelized embedding with a kernel for raw trajectories
Snap-to-road
• An important problem in organizations like Here, IBM and Microsoft.
• Positioning error between 10 and 100 meters (Wi-Fi, vehicle navigation, mobile devices)
• Deteriorated sampling rates and sparse GPS data
• Difficult at roundabouts and tunnels
Solution:
• Basic steps:
• Embed the trajectory by kernel methods but ignore map constraints
• Benefits:
• Noise reduction
• Capture multi-output, non-linear dependencies
• 'Round' the resulting 'relaxed assignment' to the street map
Snap-to-road Algorithm
Snap-to-road: Does it Work?
• Performance over challenging real tasks
Grouping Of Trajectories/Stops In Similar Routines
Grouping requires similarity measures for trajectories.
Unroll a trajectory by defining a mapping
Similarity Measures For Trajectories -- Symbolic Trajectories
• Formed by discretization of the curve through a measurement process or algorithms.
• Snap-to-road
• Stay points
• Regional division
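Of the three discretizations above, stay points are the easiest to sketch in a few lines. The version below follows the common distance-and-duration rule (a stay is a run of fixes that remains within a distance threshold of its first fix for at least a minimum duration); the slides do not say which variant was used, so this rule, the planar `{ x, y, t }` fix format in meters and seconds, and the thresholds are all assumptions.

```javascript
// Detect stay points in a time-ordered list of fixes { x, y, t }.
function detectStayPoints(fixes, maxDistM, minDurationS) {
  const stays = [];
  let i = 0;
  while (i < fixes.length) {
    // Extend the run while the fixes remain close to the anchor fix i.
    let j = i + 1;
    while (
      j < fixes.length &&
      Math.hypot(fixes[j].x - fixes[i].x, fixes[j].y - fixes[i].y) <= maxDistM
    ) {
      j++;
    }
    const duration = fixes[j - 1].t - fixes[i].t;
    if (duration >= minDurationS) {
      const group = fixes.slice(i, j);
      stays.push({
        // Represent the stay by the centroid of its fixes.
        x: group.reduce((sum, f) => sum + f.x, 0) / group.length,
        y: group.reduce((sum, f) => sum + f.y, 0) / group.length,
        arrive: fixes[i].t,
        leave: fixes[j - 1].t,
      });
      i = j; // continue after the stay
    } else {
      i++;
    }
  }
  return stays;
}

// Hypothetical fixes: 15 minutes around the origin, then movement.
const fixes = [
  { x: 0, y: 0, t: 0 },
  { x: 5, y: 0, t: 300 },
  { x: 3, y: 4, t: 900 },
  { x: 500, y: 500, t: 1000 },
  { x: 1000, y: 1000, t: 1100 },
];
console.log(detectStayPoints(fixes, 50, 600));
```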
Clustering of Staypoints to find Homezones
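A density-based clustering such as DBSCAN (listed among the point pattern tools earlier and in the references) is one natural way to group stay points into zones. The following is a minimal sketch under that assumption, using planar coordinates in meters and a brute-force neighbor search; the sample points, `eps`, and `minPts` are hypothetical.

```javascript
// Minimal DBSCAN. Returns one label per point: 0, 1, ... for clusters, -1 for noise.
function dbscan(points, eps, minPts) {
  const labels = new Array(points.length).fill(undefined); // undefined = unvisited
  // Brute-force neighborhood query (includes the point itself).
  const neighbors = (i) =>
    points
      .map((_, j) => j)
      .filter((j) => Math.hypot(points[i].x - points[j].x, points[i].y - points[j].y) <= eps);

  let cluster = -1;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== undefined) continue;
    const seeds = neighbors(i);
    if (seeds.length < minPts) {
      labels[i] = -1; // noise for now; may later become a border point
      continue;
    }
    cluster++;
    labels[i] = cluster;
    const queue = seeds.filter((j) => j !== i);
    while (queue.length > 0) {
      const j = queue.shift();
      if (labels[j] === -1) labels[j] = cluster; // former noise becomes a border point
      if (labels[j] !== undefined) continue;
      labels[j] = cluster;
      const reachable = neighbors(j);
      if (reachable.length >= minPts) queue.push(...reachable); // j is a core point
    }
  }
  return labels;
}

// Hypothetical stay points: two dense zones and one isolated stop.
const stayPoints = [
  { x: 0, y: 0 }, { x: 1, y: 0 }, { x: 0, y: 1 }, { x: 1, y: 1 },
  { x: 100, y: 100 }, { x: 101, y: 100 }, { x: 100, y: 101 },
  { x: 50, y: 50 },
];
console.log(dbscan(stayPoints, 2, 3)); // [ 0, 0, 0, 0, 1, 1, 1, -1 ]
```

The brute-force neighbor search is quadratic in the number of points; at the data volumes discussed earlier a spatial index would replace it.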
Applications for Symbolic Trajectories: Clustering and Event Detection
• Trajectory clustering
• User activity analysis
• Traffic event detection
• Classification of events from non-event data
• Rerouting of traffic during baseball games
• Detection of conferences in auditoriums
Applications for Symbolic Trajectories
• Exploit sequence analysis (in particular biological
sequence analysis)
1. Discretize the raw trajectories with an appropriate alphabet
2. Use an alignment kernel with traffic symbol similarity in order to translate traffic invariances to the biological domain
3. Exploit sequence analysis to find discrete sequential patterns
(Where Traffic Meets DNA, Best Poster Award, ACM GIS 2011, Ahmed Jawad)
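The core of step 2 is a sequence alignment with a symbol similarity function. The sketch below computes a plain global alignment score (Needleman-Wunsch dynamic programming) between two symbolic trajectories. It illustrates the idea only: it is not the alignment kernel of the cited work, the score is not guaranteed to be a valid kernel, and the similarity values, gap penalty, and example sequences are hypothetical.

```javascript
// Global alignment score between two symbol sequences.
// similarity(a, b) scores a pair of symbols; gapPenalty (negative) scores a gap.
function alignmentScore(seqA, seqB, similarity, gapPenalty) {
  const rows = seqA.length + 1;
  const cols = seqB.length + 1;
  const score = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 1; i < rows; i++) score[i][0] = i * gapPenalty;
  for (let j = 1; j < cols; j++) score[0][j] = j * gapPenalty;
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      score[i][j] = Math.max(
        score[i - 1][j - 1] + similarity(seqA[i - 1], seqB[j - 1]), // align symbols
        score[i - 1][j] + gapPenalty, // gap in seqB
        score[i][j - 1] + gapPenalty  // gap in seqA
      );
    }
  }
  return score[rows - 1][cols - 1];
}

// Hypothetical symbol similarity: reward matches, penalize mismatches.
const similarity = (a, b) => (a === b ? 2 : -1);

const weekday = ["home", "work", "sports", "home"];
const friday = ["home", "work", "home"];
console.log(alignmentScore(weekday, friday, similarity, -1)); // 5
```

A domain-specific `similarity` (for example, scoring two adjacent road segments as nearly equal) is where traffic knowledge would enter the comparison.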
Trajectory Clustering
[Figure: space-time plot of a trajectory over X, Y, and time (0:00 to 24:00), with stops labeled Home, Work, and Sports. Source: http://iapg.jade-hs.de/personen/brinkhoff/generator/]
Trajectory Clustering: Analysis of User Activities
• Analysis of user activities
• Frequent routes in trajectories
• Clustering at map-matched level
• Frequent routines in trajectories
• Clustering at stay point level
• Visualization of variability in routines (sequence logos)
Trajectory Clustering: Map Matched Discretization
Trajectory Clustering: Comparison to State-of-the-Art
Trajectory Clustering: Routine analysis
Application for Symbolic Trajectories: Traffic Event Detection
• Using biological sequence methods to model event persistence
• Analysis of Dodgers baseball games from highway sensor data
• Detecting the presence of a baseball game
• Visualization
• Analysis of events at the Caltech auditorium entrance
• Detecting conferences in the auditorium
Traffic Event Detection
• Normalization based classifier
[Figure: readings from a traffic sensor]
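The slide does not spell out the classifier, so the following is only one plausible reading of "normalization based": each sensor reading is normalized against the historical mean and standard deviation of the same time slot, and readings with a large z-score are flagged as events. The function name, the data layout (one array of counts per day), the sample counts, and the threshold are all assumptions.

```javascript
// Flag time slots whose count deviates strongly from the historical baseline.
// history: array of past days, each an array of counts per time slot.
// today: array of counts per time slot for the day being classified.
function detectEvents(history, today, zThreshold) {
  return today.map((count, slot) => {
    const past = history.map((day) => day[slot]);
    const mean = past.reduce((sum, v) => sum + v, 0) / past.length;
    const variance = past.reduce((sum, v) => sum + (v - mean) ** 2, 0) / past.length;
    const std = Math.sqrt(variance);
    // Normalize the reading; a flat baseline yields z = 0.
    const z = std > 0 ? (count - mean) / std : 0;
    return { slot, z, isEvent: z > zThreshold };
  });
}

// Hypothetical sensor counts for three past days and today.
const history = [
  [10, 20, 30],
  [12, 22, 28],
  [11, 18, 32],
];
const today = [11, 21, 60];
console.log(detectEvents(history, today, 3).filter((r) => r.isEvent).map((r) => r.slot));
// [ 2 ]
```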
Traffic Event Detection: Sequence Analysis
Summary and Conclusions
• Structural information analysis is the connection
between machine learning and GIS
• Still, a lot of data engineering and task-specific tricks are needed, e.g., regularization and normalization
Active Directions being pursued
• In Snap-to-road
• Fisher kernels for Sparse GPS data
• Testing KMM with real world system
• In clustering and event detection
• User profiles and diaries
• Label sequence graph kernels
• In structural information
• Can discarding the latitude/longitude pairs and keeping only the structural information help with privacy issues?
Q & A
References (1)
• Thomas Brinkhoff, Generating Network-Based Moving Objects, Proceedings of the 12th International Conference on Scientific and
Statistical Database Management, p.253, July 26-28, 2000
• C. Körner, M. May, S. Wrobel. Spatiotemporal Modeling and Analysis - Introduction and Overview. KI, 2012.
• Yi Guo , Junbin Gao , Paul W. Kwan, Twin Kernel Embedding, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.30
n.8, p.1490-1495, August 2008  
• Julian J. McAuley, Teofilo de Campos, and Tiberio S. Caetano. Unified graph matching in euclidean spaces. In CVPR, 2010.
• Tom Mitchell. Mining our reality. Science, 326(5960):1644--1645, 2009.
• Paul Newson , John Krumm, Hidden Markov map matching through noise and sparseness, Proceedings of the 17th ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems, November 04-06, 2009, Seattle, Washington
• Novi Quadrianto, Le Song, and Alex Smola. Kernelized sorting. In NIPS 21, pages 1289--1296. 2009.
• Mohammed A. Quddus, Washington Y. Ochieng, and Robert B. Noland. Current map-matching algorithms for transport
applications: State-of-the art and future research directions. Transportation Research Part C: Emerging Technologies, 15(5):312--
328, 2007.
• A. Abbott. A primer on sequence methods. Organization Science, 1(4):375--392, 1990.
• Gennady Andrienko , Natalia Andrienko , Stefan Wrobel, Visual analytics tools for analysis of movement data, ACM SIGKDD
Explorations Newsletter, v.9 n.2, December 2007  
• Mihael Ankerst , Markus M. Breunig , Hans-Peter Kriegel , Jörg Sander, OPTICS: ordering points to identify the clustering structure,
Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.49-60, May 31-June 03, 1999,
Philadelphia, Pennsylvania, United States  
• Gerben de Vries , Maarten van Someren, Clustering vessel trajectories with alignment kernels under trajectory compression,
Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I, September 20-
24, 2010, Barcelona, Spain
• R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.
• M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise.
In KDD, pages 226--231, 1996.
References (2)
• Alexander Ihler , Jon Hutchins , Padhraic Smyth, Adaptive event detection with time-varying poisson processes, Proceedings of
the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia,
PA, USA
• Ahmed Jawad , Kristian Kersting, Kernelized map matching, Proceedings of the 18th SIGSPATIAL International Conference on
Advances in Geographic Information Systems, November 02-05, 2010, San Jose, California
• C. Joh, T. A. Arentze, and H. J. P. Timmermans. Multidimensional sequence alignment methods for activity-travel pattern
analysis: A comparison of dynamic programming and genetic algorithms. Geographical Analysis, 33(3):247--270, 2001.
• John A. Lee , Michel Verleysen, Nonlinear Dimensionality Reduction, Springer Publishing Company, Incorporated, 2007
• Yanchi Liu , Zhongmou Li , Hui Xiong , Xuedong Gao , Junjie Wu, Understanding of Internal Clustering Validation Measures,
Proceedings of the 2010 IEEE International Conference on Data Mining, p.911-916, December 13-17, 2010
• T. Mitchell. Mining our reality. Science, 326(5960):1644--1645, 2009.
• Salvatore Rinzivillo , Dino Pedreschi , Mirco Nanni , Fosca Giannotti , Natalia Andrienko , Gennady Andrienko, Visually driven
analysis of movement data by progressive clustering, Information Visualization, v.7 n.3, p.225-239, June 2008
• Albrecht Schmidt , Marc Langheinrich , Kristian Kersting, Perception beyond the Here and Now, Computer, v.44 n.2, p.86-88,
February 2011
• S. Schonfelder and K. W. Axhausen. Urban Rhythms and Travel Behavior: Spatial and Temporal Phenomena of Daily Travel
(Transport and Society). Ashgate, 2010.
• N. Shoval and M. Isaacson. Sequence alignment as a method for human activity analysis in space and time. Annals of the
Association of American Geographers, 97(2):282--297, 2007.
• C. Wilson. Analysis of travel behavior using sequence alignment methods. Journal of the Transportation Research Board, 1645(-
1):52--59, 1998.
References (3)
• T. Gärtner. Kernels for structured data. World Scientific, Hackensack, N.J., 2008.
• T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Proceedings of
Conference on Learning Theory (COLT), pages 129--143, 2003.
• T. Gärtner, T. Horvath, Q. V. Le, A. J. Smola, and S. Wrobel. Kernel methods for graphs. In Mining Graph Data, pages
253--282. John Wiley and Sons, Inc, 2006.
• Intelligence (PAMI), 31(5):944--952, 2009.
• R. O. Duda, D. G. Stork, and P. E. Hart. Pattern classification. Wiley, New York; Chichester, 2nd edition, 2000.
• R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998.
• M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial
databases with noise. In Proceedings of ACM International Conference on Knowledge Discovery and Data Mining
(SIGKDD), pages 226--231, 1996.
• D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello. Bayesian filtering for location estimation. IEEE Pervasive
Computing, 2(3):24--33, 2003.
• S. J. Gaffney, A. W. Robertson, P. Smyth, S. J. Camargo, and M. Ghil. Probabilistic clustering of extratropical cyclones
using regression mixture models. Climate Dynamics, 29(4):423--440, 2006.
• M. Gariel, A. N. Srivastava, and E. Feron. Trajectory clustering and an application to airspace monitoring. IEEE
Transactions on Intelligent Transportation Systems (TITS), 12(4):1511--1524, 2006.
Appendix: persistence options
• Neo4j Spatial :
• Utilities for importing from ESRI Shapefile as well as Open Street Map
files
• Support for all the common geometry types
• An RTree index for fast searches on geometries
• Support for topology operations during the search (contains, within,
intersects, covers, disjoint, etc.)
• The possibility to enable spatial operations on any graph of data,
regardless of the way the spatial data is stored, as long as an adapter is
provided to map from the graph to the geometries.
• Ability to split a single layer or dataset into multiple sub-layers or views
with pre-configured filters
Appendix: persistence options
HBase/Cassandra - Build your own index
• Perform geohashing yourself or use Elasticsearch as a hashing / search engine
• Libraries are available to connect Elasticsearch with Cassandra / HBase
• Besides, geohashing is easy to program
• http://thenewstack.io/building-streaming-data-hub-elasticsearch-kafka-cassandra/
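To back up the claim that geohashing is easy to program, here is a minimal sketch of the standard geohash encoding: longitude and latitude are bisected alternately, and every five bits become one base-32 character. The function name `geohashEncode` is hypothetical; the resulting string could serve as a row-key prefix in a build-your-own index.

```javascript
const BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"; // geohash alphabet (no a, i, l, o)

// Encode a WGS84 coordinate as a geohash of the given length.
function geohashEncode(lat, lon, precision) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = "";
  let bits = 0;
  let value = 0;
  let evenBit = true; // even bits refine longitude, odd bits refine latitude
  while (hash.length < precision) {
    const range = evenBit ? lonRange : latRange;
    const coord = evenBit ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (coord >= mid) {
      value = value * 2 + 1; // upper half: emit a 1 bit
      range[0] = mid;
    } else {
      value = value * 2; // lower half: emit a 0 bit
      range[1] = mid;
    }
    evenBit = !evenBit;
    if (++bits === 5) {
      hash += BASE32[value];
      bits = 0;
      value = 0;
    }
  }
  return hash;
}

console.log(geohashEncode(57.64911, 10.40744, 5)); // u4pru
```

Nearby locations usually share a common prefix, which is what makes range scans on the hash useful for bounding-box style lookups. Cells on opposite sides of a cell boundary are the known exception and need neighbor-cell lookups.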
Appendix: persistence options
MongoDB Geospatial
• Store your location data as GeoJSON objects with this
coordinate-axis order: longitude, latitude. The
coordinate reference system for GeoJSON uses the
WGS84 datum.
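Because the longitude-first order is easy to get wrong, a small helper that builds the GeoJSON objects from named fields can be useful. The sketch below is illustrative only: the helper names and the sample document are hypothetical, and the `loc` field name simply mirrors the query example in this appendix.

```javascript
// Build a GeoJSON Point from latitude/longitude, enforcing [longitude, latitude] order.
function toGeoJsonPoint(lat, lon) {
  if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
    throw new RangeError(`coordinate out of range: lat=${lat}, lon=${lon}`);
  }
  return { type: "Point", coordinates: [lon, lat] };
}

// Build a GeoJSON LineString from a list of { lat, lon } fixes.
function toGeoJsonLineString(fixes) {
  return {
    type: "LineString",
    coordinates: fixes.map((f) => toGeoJsonPoint(f.lat, f.lon).coordinates),
  };
}

// Hypothetical document as it could be stored in a collection.
const place = { name: "depot", loc: toGeoJsonPoint(49.87, 8.65) };
console.log(JSON.stringify(place));
```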
MongoDB: Querying Data
db.<collection>.find({
  <location field>: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [ <coordinates> ]
      }
    }
  }
})

db.places.find({
  loc: {
    $geoWithin: {
      $geometry: {
        type: "Polygon",
        coordinates: [ [ [0, 0], [3, 6], [6, 1], [0, 0] ] ]
      }
    }
  }
})

Making sense of the Graph Revolution
 
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram SriharshaMagellen: Geospatial Analytics on Spark by Ram Sriharsha
Magellen: Geospatial Analytics on Spark by Ram Sriharsha
 
Data Science At Zillow
Data Science At ZillowData Science At Zillow
Data Science At Zillow
 

Dernier

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Boston Institute of Analytics
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxSimranPal17
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxHimangsuNath
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxTasha Penwell
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...Jack Cole
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfblazblazml
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaManalVerma4
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 

Dernier (20)

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 

Geospatial Analytics and Spatial Capabilities on Big Data Systems

  • 1. Geospatial Analytics and Spatial Capabilities on Big Data Systems By: Ahmed Jawad (PhD)
  • 2. Agenda • Why Analyze Telematics • Analysis of Movement Data • Analytical Assets for Telematics • Operational Requirements on Telematics • Data flow on big data platforms • Analytical Challenges and Applications Solved through Machine Learning • Snap-to-road • Unifying trajectories into patterns of movement and routines • Traffic event detection
  • 3. Why Analyze Telematics • We are being recorded everywhere • Provides great insights into customer routines and movement • Key players competing in the market
  • 4. Analyzing Movement Data • Trajectory: object in motion (time – space) • Raw trajectories: coordinate-based recording • Discretization • Symbolic trajectories: streets, locations, or events
  • 5. Traditional Operational Requirements in the World of Geographic Information Systems (GIS) • Traditional use cases: cartography, geo-algebra (display of statistical events, hotspots, co-locations on the map) • Databases used: PostgreSQL, SQL Server • Mostly static data sources • Relatively small data sets • Moderate geometric accuracy • Offline processing acceptable • Support for complex geometric data types
  • 6. Operational Requirements and Design Considerations for Telematics • Real-time ingestion and analytics on sensor data, distance queries, snap-to-road • Data at the 100 TB to petabyte scale • High variation in geospatial queries (range queries, etc.) and in the throughput of CRUD operations (insert, delete, read) • The processing flow, the map applications, and the nature of the relationships in the data determine the storage technology, the indexing techniques, and their implications.
  • 7. Telematics and Geospatial Data Types • Spatial data structures: • Raster: geographically referenced matrix of uniform size • Vector: features on the earth’s surface are represented as geographically referenced vector objects • Hierarchical nature of objects • Points: entity, label, area, node • Lines: line, polyline, arc, link, etc. • Polygons: area, polygon, complex polygon • Requirement: the ability to manipulate geospatial data • Databases and libraries are required to manipulate these objects at distributed scale (Spark with Scala, MongoDB, or any other NoSQL database)
  • 8. Analytical Assets for Telematics • The analytical assets for telematics broadly cover: • Snap-to-road • Analysis of user activities (clustering) • Traffic event detection (classification) • Real-time location search • Set operations on geometry objects and geo-algebra (layering geospatial information and applying algebraic operations to the layers)
  • 9. Conceptual dataflow and geospatial processing in Telematics
      • Diagram labels: PDA (event capture); Kafka (event processing and delivery); stream processing engine (decision); persistence layer holding PDA geodata and critical events (Mongo / HBase, Cassandra / Elastic, on top of Hadoop); risk area; Tomcat app (optional raster processing with GeoTrellis); datafeed client (preload risk area, preload traffic info); client (D3 / Ajax / Leaflet API); geocoding service; OSM / real-time traffic API. Connections are labelled push (REST), push (WebSocket), push stream, and pull.
      • Persistence layer: should be scalable and support storage and querying of spatiotemporal objects (points, polygons, lines, line strings; for reference see MongoDB's 2dsphere indexing and geospatial querying). The following low-level queries shall be supported: (1) nearest-neighbor query: given a point (lat, long), find all the line strings within a radius of x meters; (2) containment query: find all the points within a polygon, or, given a point, find all the polygons containing it.
      • Client browser (e.g. fleet manager): in the current scheme, all the intelligence is deferred to the client, i.e. the raster processing, the map display, and the different layers along with map algebra are handled on the client side. Leaflet is one such example. An alternative strategy is to use geotrellis.io as a geoprocessing engine for the raster operations and to use the client only for displaying the map.
      • Stream processing queries: (1) instantaneous speed / angular momentum of the PDA; (2) distance to a traffic event pulled from Bing; (3) running aggregates, e.g. how long the vehicle has spent at the current location.
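The stream processing queries on this slide can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline from the talk: each GPS fix is assumed to be a `(timestamp_s, lat, lon)` tuple, distances use the haversine formula with a mean Earth radius, and the function names and the 50 m dwell radius are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def instantaneous_speed_mps(fix_a, fix_b):
    """Speed in m/s between two consecutive fixes (timestamp_s, lat, lon)."""
    dt = fix_b[0] - fix_a[0]
    if dt <= 0:
        raise ValueError("fixes must be in strictly increasing time order")
    return haversine_m(fix_a[1], fix_a[2], fix_b[1], fix_b[2]) / dt


def dwell_time_s(fixes, radius_m=50.0):
    """Running aggregate: how long the device has stayed within radius_m of its latest fix."""
    if not fixes:
        return 0.0
    t_last, lat_last, lon_last = fixes[-1]
    start = t_last
    # Walk backwards until the device was farther away than radius_m.
    for t, lat, lon in reversed(fixes):
        if haversine_m(lat, lon, lat_last, lon_last) > radius_m:
            break
        start = t
    return t_last - start
```

The distance to a traffic event is then just `haversine_m` between the latest fix and the event location. In a real deployment these functions would run inside the stream processing engine over a sliding window of fixes per device.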
  • 10. Analytics Cluster GIS capabilities • Data storage provisioning layer: Hadoop cluster with a NoSQL database (MongoDB / HBase / Elastic) • Processing layer: Spark with Scala, plus RStudio Server and RMR • Client browser (e.g. fleet manager): same scheme as on slide 9, with either all intelligence on the client (e.g. Leaflet) or geotrellis.io handling the raster operations
  • 11. Data Storage - Persistence layer (columns: name; index strategy; geometry; query types; ease of use/integration; scalability/speed; comments)
      • Elasticsearch: geohash; point; bbox, radius; good (3 stars); 10s of TBs, average writes; reads and search extremely fast
      • Neo4j: R-tree; point/line/polygon; bbox, radius; moderately good (2 stars); 10s of TBs; too granular
      • HBase: build your own index; -; -; moderately good (2 stars); petabytes; writes are fast, reads as well, needs specialization
      • Cassandra: build your own index; -; -; good (3 stars); petabytes; same as HBase
      • MongoDB / Couchbase: geohash; point/line/polygon; (1) geo-within, (2) near, (3) intersect; excellent (5 stars), GeoJSON / Leaflet / OSM; 10s of TBs, average throughput; best integration with GeoJSON in all cases
      • Proposed solutions: short term, MongoDB; long term, Elasticsearch as the indexing engine and HBase / Cassandra as the storage technology on top of Hadoop
  • 12. Analytical Services on Telematics Cluster 1) Geocoding and reverse geocoding service on the cluster 2) Weather and traffic API (real time and history) to support the use cases for weather and traffic analytics 3) Street maps (OpenStreetMap at the start, then better map providers in the longer run) • Required for the following analytics: regular trips, snap-to-road, mode of transport, identification of risky roads, impact of POI (e.g. school) on events, enables Location based
  • 13. Analytical Operations/Procedures Useful For Spatial Analysis (RStudio Server With R Packages) • Having an RStudio Server on the cluster would be useful. • GitHub repository (already established) • R packages for dealing with vector data (rgdal, rgeos, geojson_io, SpatialTransforms) • Point pattern analysis: dbscan, glm, gbm • Describing and analyzing fields, statistical analysis of fields / spatial interpolation: kriging, tps • Network analysis, snap-to-road, frequent routes, etc. (igraph, sna) • Visualization of the data: leaflet, shiny
  • 14. Geospatial processing layer on top of persistence • The geospatial processing layer integrates map geometry and map algebra to display the information on the map. On a small scale, this can be done in JavaScript (Leaflet / D3). • The following operations are required: 1) Vector operations 2) Map algebra • On a larger scale, a software engineering layer for distributed geospatial processing is required, for example Scala, Spark, and GeoTrellis. • http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/allixender/5676830073
  • 15. Analytical Challenges in Movement Data • Basic challenges in movement data • Matching (snap-to-road, street network matching) • Similarity measures • Trajectory clustering • Event detection (classification)
  • 16. Example Applications Solved through Machine Learning • For raw trajectories • Snap-to-road • For symbolic trajectories • Analysis of user activities • Traffic event detection
  • 17. Snap-to-road • Given a trajectory T and a street network G • Find the path in G that matches the real (ground-truth) path of T
  • 18. Snap-to-road: Analytical Modeling • Multi-regression view: • Task: estimate a noise-free function f from T that preserves the structural information • Preserving structural correlations in the output: • Try a kernelized embedding with a kernel for raw trajectories
  • 19. Snap-to-road • An important problem in organizations like HERE, IBM, and Microsoft • Error between 10 and 100 meters (Wi-Fi, vehicle navigation, mobile devices) • Deteriorated sampling rate and sparse GPS data • Difficult at roundabouts and in tunnels
  • 20. Solution: • Basic steps: • Embed the trajectory by kernel methods but ignore map constraints • Benefits: • Noise reduction • Capture multi-output, non-linear dependencies • ‘Round’ the resulting ‘relaxed assignment’ to the street map
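The final "rounding" step, where positions that ignore the map are assigned back to the street network, can be illustrated with a naive geometric baseline. This sketch is not the kernelized method from the talk: it simply projects each point onto the nearest street segment, assumes planar coordinates (for example, already projected to meters), and uses hypothetical function names.

```python
def project_to_segment(p, a, b):
    """Closest point to p on the segment a-b, all given as planar (x, y) tuples."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return (ax, ay)  # degenerate segment: both endpoints coincide
    # Parameter of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap_point(p, segments):
    """Return (segment_index, snapped_point) for the nearest segment to p."""
    best = None
    for idx, (a, b) in enumerate(segments):
        q = project_to_segment(p, a, b)
        d2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if best is None or d2 < best[0]:
            best = (d2, idx, q)
    return best[1], best[2]


def snap_trajectory(points, segments):
    """Snap every point independently; ignores connectivity between segments."""
    return [snap_point(p, segments) for p in points]
```

Because each point is snapped independently, this baseline can jump between parallel roads; that weakness is one reason for first denoising the whole trajectory and only then rounding it to the map.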
  • 22. Snap-to-road: Does it Work? • Performance over challenging real tasks
  • 23. Grouping of Trajectories/Stops in Similar Routines • Basically requires similarity measures for trajectories • Unroll a trajectory by defining a mapping
  • 24. Similarity Measures for Trajectories: Symbolic Trajectories • Formed by discretization of the curve through the measurement process or algorithms • Snap-to-road • Stay points • Regional division
  • 25. Grouping of Trajectories/Stops in Similar Routines • Clustering of stay points to find home zones
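Before stay points can be clustered into home zones they have to be extracted from the raw trajectory. The sketch below uses a common distance-and-time threshold heuristic; it is not necessarily the procedure used in the talk. It assumes fixes are `(timestamp_s, x, y)` tuples in planar meters, and the thresholds and function name are hypothetical.

```python
import math


def detect_stay_points(fixes, max_dist_m=50.0, min_duration_s=300.0):
    """Return a list of (x_mean, y_mean, t_arrive, t_leave) stay points.

    A stay point is a maximal run of consecutive fixes that all lie within
    max_dist_m of the first fix in the run and that spans at least
    min_duration_s.
    """
    stay_points = []
    i, n = 0, len(fixes)
    while i < n:
        t_i, x_i, y_i = fixes[i]
        j = i + 1
        # Extend the run while fixes stay close to the anchor fix i.
        while j < n and math.hypot(fixes[j][1] - x_i, fixes[j][2] - y_i) <= max_dist_m:
            j += 1
        duration = fixes[j - 1][0] - t_i
        if duration >= min_duration_s:
            run = fixes[i:j]
            x_mean = sum(f[1] for f in run) / len(run)
            y_mean = sum(f[2] for f in run) / len(run)
            stay_points.append((x_mean, y_mean, t_i, fixes[j - 1][0]))
            i = j  # continue after the run
        else:
            i += 1
    return stay_points
```

The resulting stay points can then be fed to a density-based clustering method such as DBSCAN, which the deck lists for point pattern analysis, to obtain candidate home zones.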
  • 26. Applications for Symbolic Trajectories: Clustering and Event Detection • Trajectory clustering • User activity analysis • Traffic event detection • Classification of events from non-event data • Rerouting of traffic during baseball games • Detection of conferences in auditoriums
  • 27. Applications for Symbolic Trajectories • Exploit sequence analysis (in particular biological sequence analysis) 1. Discretize the raw trajectories with an appropriate alphabet 2. Use an alignment kernel with traffic symbol similarity in order to translate traffic invariances to the biological domain 3. Exploit sequence analysis to find discrete sequential patterns (Where Traffic Meets DNA, Best Poster Award, ACM GIS 2011, Ahmed Jawad)
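Steps 1 and 2 can be illustrated with a small example. The sketch below discretizes speed readings into a three-letter alphabet and compares two symbolic sequences with a plain Needleman-Wunsch global alignment score. It is a stand-in, not the alignment kernel from the cited poster: the alphabet (`F` free flow, `S` slow, `J` jam), the speed thresholds, the match/mismatch scores, and the gap penalty are all hypothetical.

```python
def discretize(speeds_kmh, slow_below=60.0, jam_below=20.0):
    """Map speed readings to symbols: F (free flow), S (slow), J (jam)."""
    symbols = []
    for v in speeds_kmh:
        if v < jam_below:
            symbols.append("J")
        elif v < slow_below:
            symbols.append("S")
        else:
            symbols.append("F")
    return "".join(symbols)


def symbol_similarity(a, b):
    """Toy substitution score; a real system would encode traffic-specific similarity."""
    return 1.0 if a == b else -1.0


def alignment_score(seq_a, seq_b, similarity=symbol_similarity, gap=-1.0):
    """Needleman-Wunsch global alignment score via dynamic programming."""
    n, m = len(seq_a), len(seq_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(seq_a[i - 1], seq_b[j - 1]),  # align two symbols
                score[i - 1][j] + gap,  # gap in seq_b
                score[i][j - 1] + gap,  # gap in seq_a
            )
    return score[n][m]
```

Gaps let two trajectories that show the same pattern at slightly different speeds or durations still score as similar, which is the invariance that alignment-based methods bring over position-by-position comparison.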
  • 29. Trajectory Clustering: Analysis of User Activities • Analysis of user activities • Frequent routes in trajectories • Clustering at the map-matched level • Frequent routines in trajectories • Clustering at the stay-point level • Visualization of variability in routines (sequence logos)
  • 33. Application for Symbolic Trajectories: Traffic Event Detection • Using biological sequence methods to model event persistence • Analysis of Dodgers baseball games from highway sensor data • Detecting the presence of a baseball game • Visualization • Analysis of events at the Caltech auditorium entrance • Detecting conferences in the auditorium
  • 34. Traffic Event Detection • Normalization-based classifier • Readings from a traffic sensor
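The slide does not spell out the classifier, so the following is only one plausible reading of "normalization-based": normalize each sensor reading against the historical mean and standard deviation of its time slot and flag readings with a large positive z-score. The slot keys, the threshold of 3, and the function names are assumptions for illustration.

```python
from statistics import mean, pstdev


def fit_baseline(history):
    """history maps a time slot (e.g. 'sat_18') to a list of past sensor counts."""
    return {slot: (mean(counts), pstdev(counts)) for slot, counts in history.items()}


def z_score(slot, count, baseline):
    """Normalize a reading against the baseline of its time slot."""
    mu, sigma = baseline[slot]
    if sigma == 0:
        return 0.0 if count == mu else float("inf")
    return (count - mu) / sigma


def is_event(slot, count, baseline, z_threshold=3.0):
    """Flag a reading as an event when it is unusually high for its slot."""
    return z_score(slot, count, baseline) > z_threshold
```

Normalizing per time slot removes the regular daily and weekly rhythm of traffic, so that only excess volume, such as the crowd leaving a game, stands out.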
  • 36. Summary and Conclusions • Structural information analysis is the connection between machine learning and GIS • Still, a lot of data engineering and task-specific tricks are needed, e.g. regularization and normalization
  • 37. Active Directions Being Pursued • In snap-to-road • Fisher kernels for sparse GPS data • Testing KMM with a real-world system • In clustering and event detection • User profiles and diaries • Label sequence graph kernels • In structural information • Can discarding the latitude/longitude pairs and keeping only the structural information help with privacy issues?
  • 38. Q & A
  • 39. References (1) • Thomas Brinkhoff, Generating Network-Based Moving Objects, Proceedings of the 12th International Conference on Scientific and Statistical Database Management, p.253, July 26-28, 2000 • C. Körner, M. May, S. Wrobel. Spatiotemporal Modeling and Analysis - Introduction and Overview. KI, 2012. • Yi Guo, Junbin Gao, Paul W. Kwan, Twin Kernel Embedding, IEEE Transactions on Pattern Analysis and Machine Intelligence, v.30 n.8, p.1490-1495, August 2008 • Julian J. McAuley, Teofilo de Campos, and Tiberio S. Caetano. Unified graph matching in Euclidean spaces. In CVPR, 2010. • Tom Mitchell. Mining our reality. Science, 326(5960):1644--1645, 2009. • Paul Newson, John Krumm, Hidden Markov map matching through noise and sparseness, Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, November 04-06, 2009, Seattle, Washington • Novi Quadrianto, Le Song, and Alex Smola. Kernelized sorting. In NIPS 21, pages 1289--1296. 2009. • Mohammed A. Quddus, Washington Y. Ochieng, and Robert B. Noland. Current map-matching algorithms for transport applications: State-of-the art and future research directions. Transportation Research Part C: Emerging Technologies, 15(5):312--328, 2007. • A. Abbott. A primer on sequence methods. Organization Science, 1(4):375--392, 1990. • Gennady Andrienko, Natalia Andrienko, Stefan Wrobel, Visual analytics tools for analysis of movement data, ACM SIGKDD Explorations Newsletter, v.9 n.2, December 2007 • Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, OPTICS: ordering points to identify the clustering structure, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.49-60, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States • Gerben de Vries, Maarten van Someren, Clustering vessel trajectories with alignment kernels under trajectory compression, Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part I, September 20-24, 2010, Barcelona, Spain • R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998. • M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226--231, 1996.
  • 40. References (2) • Alexander Ihler, Jon Hutchins, Padhraic Smyth, Adaptive event detection with time-varying Poisson processes, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, August 20-23, 2006, Philadelphia, PA, USA • Ahmed Jawad, Kristian Kersting, Kernelized map matching, Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, November 02-05, 2010, San Jose, California • C. Joh, T. A. Arentze, and H. J. P. Timmermans. Multidimensional sequence alignment methods for activity-travel pattern analysis: A comparison of dynamic programming and genetic algorithms. Geographical Analysis, 33(3):247--270, 2001. • John A. Lee, Michel Verleysen, Nonlinear Dimensionality Reduction, Springer Publishing Company, Incorporated, 2007 • Yanchi Liu, Zhongmou Li, Hui Xiong, Xuedong Gao, Junjie Wu, Understanding of Internal Clustering Validation Measures, Proceedings of the 2010 IEEE International Conference on Data Mining, p.911-916, December 13-17, 2010 • T. Mitchell. Mining our reality. Science, 326(5960):1644--1645, 2009. • Salvatore Rinzivillo, Dino Pedreschi, Mirco Nanni, Fosca Giannotti, Natalia Andrienko, Gennady Andrienko, Visually driven analysis of movement data by progressive clustering, Information Visualization, v.7 n.3, p.225-239, June 2008 • Albrecht Schmidt, Marc Langheinrich, Kristian Kersting, Perception beyond the Here and Now, Computer, v.44 n.2, p.86-88, February 2011 • S. Schonfelder and K. W. Axhausen. Urban Rhythms and Travel Behavior: Spatial and Temporal Phenomena of Daily Travel (Transport and Society). Ashgate, 2010. • N. Shoval and M. Isaacson. Sequence alignment as a method for human activity analysis in space and time. Annals of the Association of American Geographers, 97(2):282--297, 2007. • C. Wilson. Analysis of travel behavior using sequence alignment methods. Journal of the Transportation Research Board, 1645(-1):52--59, 1998.
  • 41. References (3) • T. Gärtner. Kernels for structured data. World Scientific, Hackensack, N.J., 2008. • T. Gärtner, P. A. Flach, and S. Wrobel. On graph kernels: Hardness results and efficient alternatives. In Proceedings of Conference on Learning Theory (COLT), pages 129--143, 2003. • T. Gärtner, T. Horvath, Q. V. Le, A. J. Smola, and S. Wrobel. Kernel methods for graphs. In Mining Graph Data, pages 253--282. John Wiley and Sons, Inc, 2006. • Intelligence (PAMI), 31(5):944--952, 2009. • R. O. Duda, D. G. Stork, and P. E. Hart. Pattern classification. Wiley, New York; Chichester, 2nd edition, 2000. • R. Durbin, S. R. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis. Cambridge University Press, 1998. • M. Ester, H. P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pages 226--231, 1996. • D. Fox, J. Hightower, L. Liao, D. Schulz, and G. Borriello. Bayesian filtering for location estimation. IEEE Pervasive Computing, 2(3):24--33, 2003. • S. J. Gaffney, A. W. Robertson, P. Smyth, S. J. Camargo, and M. Ghil. Probabilistic clustering of extratropical cyclones using regression mixture models. Climate Dynamics, 29(4):423--440, 2006. • M. Gariel, A. N. Srivastava, and E. Feron. Trajectory clustering and an application to airspace monitoring. IEEE Transactions on Intelligent Transportation Systems (TITS), 12(4):1511--1524, 2006.
  • 42. Appendix: persistence options • Neo4j Spatial : • Utilities for importing from ESRI Shapefile as well as Open Street Map files • Support for all the common geometry types • An RTree index for fast searches on geometries • Support for topology operations during the search (contains, within, intersects, covers, disjoint, etc.) • The possibility to enable spatial operations on any graph of data, regardless of the way the spatial data is stored, as long as an adapter is provided to map from the graph to the geometries. • Ability to split a single layer or dataset into multiple sub-layers or views with pre-configured filters
  • 43. Appendix: persistence options • HBase / Cassandra: build your own index • Perform geohashing yourself or use Elasticsearch as a hashing / search engine • Libraries are available to connect Elasticsearch with Cassandra / HBase • Besides, geohashing is easy to program • http://thenewstack.io/building-streaming-data-hub-elasticsearch-kafka-cassandra/
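To illustrate the claim that geohashing is easy to program, here is a minimal sketch of the standard geohash encoding: bits alternate between longitude and latitude, starting with longitude, and every five bits become one base-32 character. The function name and default precision are illustrative choices.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a WGS84 (lat, lon) pair in degrees as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # even-numbered bits refine longitude, odd ones latitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)
```

Because nearby points usually share a prefix, the geohash string can serve as (part of) a row key in HBase or Cassandra, so that a prefix scan approximates a bounding-box query. Points close to a cell boundary can have different prefixes, so neighboring cells normally have to be scanned as well.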
  • 44. Appendix: persistence options Mongodb Geospatial • Store your location data as GeoJSON objects with this coordinate-axis order: longitude, latitude. The coordinate reference system for GeoJSON uses the WGS84 datum.
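The longitude-first axis order is a frequent source of bugs, because most GPS feeds and user interfaces report latitude first. A small helper, sketched here with a hypothetical name, can make the conversion explicit and reject out-of-range values before a document is stored.

```python
def to_geojson_point(lat, lon):
    """Build a GeoJSON Point from a (lat, lon) pair in degrees.

    GeoJSON stores coordinates as [longitude, latitude], so the order is
    swapped relative to the usual 'lat, lon' convention.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {"type": "Point", "coordinates": [lon, lat]}
```

For example, `to_geojson_point(52.52, 13.405)` yields a point whose `coordinates` list starts with the longitude `13.405`.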
  • 45. MongoDB: Querying Data
      Syntax: db.<collection>.find( { <location field> : { $geoWithin : { $geometry : { type : "Polygon" , coordinates : [ <coordinates> ] } } } } )
      Example: db.places.find( { loc : { $geoWithin : { $geometry : { type : "Polygon" , coordinates : [ [ [ 0 , 0 ] , [ 3 , 6 ] , [ 6 , 1 ] , [ 0 , 0 ] ] ] } } } } )
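To make the containment test behind such a query concrete, the sketch below checks whether a point lies inside the example polygon using the ray-casting rule. It is a planar approximation for illustration only: MongoDB evaluates `$geoWithin` with `$geometry` on spherical geometry, and the function name here is hypothetical.

```python
def point_in_polygon(point, ring):
    """Ray casting: True if point (x, y) lies inside the closed ring.

    ring is a list of [x, y] vertices whose last vertex repeats the first,
    as in a GeoJSON polygon ring. Points exactly on an edge are not handled
    specially.
    """
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Outer ring of the polygon used in the example query (x = longitude, y = latitude).
EXAMPLE_RING = [[0, 0], [3, 6], [6, 1], [0, 0]]
```

With this ring, `point_in_polygon((3, 2), EXAMPLE_RING)` is true while `point_in_polygon((5, 5), EXAMPLE_RING)` is false, which mirrors which `loc` values the query would be expected to match for small, near-planar polygons.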