Geo Data Analytics
@dmarcous
● DBA (@IDF)
● Big Data Professional (@IDF)
● Data Wizard - Magic with Data (@Google - Waze)
● Pure professional
● Best practices
● Tools
● Tips & Tricks
● Free Advice!
Agenda
● Why?
● Common Language
● Problems at scale
● Solutions at scale
● Tips & Tricks for scientists (/Wizards)
● Art
● Keep an eye out for…
● Dog Pictures
Why Does Geo Data Matter?
● C/C++, GEOS: http://trac.osgeo.org/geos
● C#, NTS: http://code.google.com/p/nettopologysuite/
● Java, JTS:
○ http://tsusiatsoftware.net/jts/main.html
○ http://www.vividsolutions.com/jts/JTSHome.htm
● Python, shapely: https://github.com/Toblerity/Shapely
● Ruby, ffi-geos: https://github.com/dark-panda/ffi-geos
● JavaScript, JSTS: http://github.com/bjornharrtell/jsts
Geometry Object Model
Geospatial Operations
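To make "geospatial operations" concrete: the libraries listed above all implement predicates such as contains. As a rough, dependency-free sketch (not how JTS or GEOS implement it), here is an even-odd ray-casting point-in-polygon test in Scala. It assumes a single ring with no holes and treats longitude/latitude as planar x/y, which is only reasonable for small polygons away from the poles and the antimeridian. The ring is the office polygon used in the Formats examples; with JTS you would call Geometry.contains instead.

```scala
object PointInPolygon {
  // A vertex as (x = longitude, y = latitude), treated as planar coordinates.
  type Pt = (Double, Double)

  // Ray casting (even-odd rule): cast a ray from p towards +x and count edge crossings.
  // An odd count means p is inside. Points lying exactly on an edge may go either way.
  def contains(ring: Seq[Pt], p: Pt): Boolean = {
    val (px, py) = p
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val (xi, yi) = ring(i)
      val (xj, yj) = ring(j)
      // The edge j -> i straddles the horizontal line y = py ...
      if ((yi > py) != (yj > py)) {
        // ... and crosses it to the right of p.
        val xCross = xi + (py - yi) * (xj - xi) / (yj - yi)
        if (px < xCross) inside = !inside
      }
      j = i
    }
    inside
  }

  // The office polygon from the WKT / GeoJSON examples (closed ring: first point repeated).
  val office: Seq[Pt] = Seq(
    (34.807841777801514, 32.164333053441936),
    (34.81168270111084, 32.164859820966136),
    (34.81337785720825, 32.1613540349589),
    (34.80865716934204, 32.16046394346568),
    (34.807841777801514, 32.164333053441936)
  )
}
```

The map centre used in the geojson.io link (34.81061, 32.16267) falls inside the ring, while a point a kilometre to the west does not.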
Formats
● WKT / WKB - Well-Known Text / Well-Known Binary, the OGC encodings for geometries
○ POLYGON((34.807841777801514 32.164333053441936,34.81168270111084 32.164859820966136,34.81337785720825 32.1613540349589,34.80865716934204 32.16046394346568,34.807841777801514 32.164333053441936))
○ http://arthur-e.github.io/Wicket/sandbox-gmaps3.html
● GeoJSON
○ { "type": "FeatureCollection", "features": [{ "type": "Feature",
  "properties": { "Name": "Verint", "Guest": "dmarcous", "Accommodations": "Beer; Pizza" },
  "geometry": { "type": "Polygon", "coordinates": [[
    [34.807841777801514, 32.164333053441936], [34.81168270111084, 32.164859820966136],
    [34.81337785720825, 32.1613540349589], [34.80865716934204, 32.16046394346568],
    [34.807841777801514, 32.164333053441936]]]}}]}
○ http://geojson.io/#map=17/32.16267/34.81061
● Shapefiles - Esri vector format
● GML - The Geography Markup Language (GML) is an XML grammar for expressing geographical features.
● Raster - A grid of cells (pixels), each holding a value, georeferenced to coordinates
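A minimal sketch of reading the WKT polygon above into coordinate pairs. It deliberately handles only a single-ring POLYGON (no holes, MULTI* types, Z/M values or EMPTY); for real data a full reader such as JTS's WKTReader is the safer choice.

```scala
object WktPolygon {
  // Parses "POLYGON((x1 y1,x2 y2,...))" into (x, y) pairs. Toy parser: single outer ring only.
  def parse(wkt: String): Vector[(Double, Double)] = {
    val trimmed = wkt.trim
    require(trimmed.toUpperCase.startsWith("POLYGON"), s"not a POLYGON: $wkt")
    val open = trimmed.indexOf("((")
    val close = trimmed.lastIndexOf("))")
    require(open >= 0 && close > open, s"expected POLYGON((...)): $wkt")
    val body = trimmed.substring(open + 2, close)
    // A polygon with holes would leave "),(" inside the body.
    require(!body.contains("("), "polygons with holes are not supported by this sketch")
    body.split(",").toVector.map { pair =>
      pair.trim.split("\\s+") match {
        case Array(x, y) => (x.toDouble, y.toDouble)
        case other => throw new IllegalArgumentException(s"bad coordinate: ${other.mkString(" ")}")
      }
    }
  }
}
```

Applied to the slide's polygon it returns five pairs, with the first and last identical because WKT rings are explicitly closed.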
Databases
● RDBMS
○ Postgres (PostGIS)
○ MS-SQL / DB2 / Oracle
● NoSQL
○ MongoDB
○ IBM Cloudant
○ Lucene spatial module (Elasticsearch / Solr)
● Pure Geospatial Database
○ CartoDB (open source / hosted)
○ GeoMesa (Accumulo)
■ GeoTrellis - Scala framework for processing raster data
GIS Systems
List of the most popular ones:
http://en.wikipedia.org/wiki/List_of_geographic_information_systems_software
QGIS, TileMill, GRASS
Problem?
● Non-scalar data types
○ Aggregating
○ Sharding
○ Unordered
● Speed & Accuracy
○ The physical world is non-Euclidean
http://www.jandrewrogers.com/2015/03/02/geospatial-databases-are-hard/
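Why "non-Euclidean" matters, sketched in Scala under a spherical-Earth assumption (mean radius 6371 km, itself an approximation of the ellipsoid): the haversine great-circle distance versus the tempting "Euclidean distance on raw degrees". At the equator the two agree, but at 60°N one degree of longitude is only about half as long as the naive formula claims.

```scala
object GreatCircle {
  // Mean Earth radius in km (spherical model; expect small errors versus true geodesic distance).
  val EarthRadiusKm = 6371.0

  // Haversine great-circle distance between two (lat, lng) points given in degrees.
  def haversineKm(lat1: Double, lng1: Double, lat2: Double, lng2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLng = math.toRadians(lng2 - lng1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) * math.pow(math.sin(dLng / 2), 2)
    // Clamp to guard against rounding pushing a slightly above 1 for near-antipodal points.
    2 * EarthRadiusKm * math.asin(math.sqrt(math.min(1.0, a)))
  }

  // The naive approach: Euclidean distance on degrees, scaled by km per degree of a great circle.
  def naiveDegreesKm(lat1: Double, lng1: Double, lat2: Double, lng2: Double): Double = {
    val kmPerDegree = 2 * math.Pi * EarthRadiusKm / 360
    math.hypot(lat2 - lat1, lng2 - lng1) * kmPerDegree
  }
}
```

Both give about 111.19 km for one degree of longitude at the equator; at 60°N haversine gives about 55.6 km while the naive formula still says 111.19 km.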
Solution
Data Structures
● R-Tree (PostGIS, actually R+Tree)
● Quad Tree (DB2)
● Hyperdimensional Hashing
● Space Filling Curves
○ Z Order Curve (MS-SQL)
○ Hilbert Curve
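A sketch of the two space-filling curves above on an n x n integer grid (for example after quantising longitude/latitude into cells). Morton (Z-order) interleaves the bits of x and y; the Hilbert index uses the well-known iterative xy2d formulation. This is illustrative code, not what any of the listed databases ship; the maxStep helper is a hypothetical utility that walks the grid in curve order to expose the locality difference.

```scala
object SpaceFillingCurves {
  // Z-order (Morton) index: bits of x go to even positions, bits of y to odd positions.
  def morton(x: Int, y: Int): Long = {
    var z = 0L
    for (i <- 0 until 16) {
      z |= ((x >> i) & 1L) << (2 * i)
      z |= ((y >> i) & 1L) << (2 * i + 1)
    }
    z
  }

  // Hilbert index of cell (x, y) on an n x n grid, n a power of two (classic xy2d).
  def hilbert(n: Int, x0: Int, y0: Int): Long = {
    var x = x0
    var y = y0
    var d = 0L
    var s = n / 2
    while (s > 0) {
      val rx = if ((x & s) > 0) 1 else 0
      val ry = if ((y & s) > 0) 1 else 0
      d += s.toLong * s * ((3 * rx) ^ ry)
      // Rotate/flip the quadrant so the sub-curve has the canonical orientation.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x
          y = n - 1 - y
        }
        val t = x; x = y; y = t
      }
      s /= 2
    }
    d
  }

  // Largest Manhattan jump between consecutive cells in curve order (1 = never jumps).
  def maxStep(n: Int, index: (Int, Int) => Long): Int = {
    val cells = for (x <- 0 until n; y <- 0 until n) yield (x, y)
    val ordered = cells.sortBy { case (x, y) => index(x, y) }
    ordered.zip(ordered.tail).map { case ((x1, y1), (x2, y2)) =>
      math.abs(x1 - x2) + math.abs(y1 - y2)
    }.max
  }
}
```

On an 8 x 8 grid, consecutive Hilbert indices are always neighbouring cells (step 1), while the Z-order curve makes long jumps between quadrants; that locality is the usual argument for Hilbert ordering.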
The Curse of Dimensionality
Dimension Reduction
● GeoHash - The mainstream way
○ Linear (non-tangent) projection, up to 5x difference in cell area
○ Same prefix means close areas (sort of…: nearby points on opposite sides of a cell edge may share no prefix at all)
○ http://geohash.org/
○ https://github.com/google/open-location-code/blob/master/docs/comparison.adoc
● S2 - The Google way
○ Quadratic projection; cells of the same level have roughly similar area
○ The sphere is projected onto the faces of a cube; each face is divided by a quadtree into levels, and cells on a face are ordered along a Hilbert curve
○ https://code.google.com/p/s2-geometry-library/
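A sketch of GeoHash encoding, written from the standard algorithm (alternate longitude/latitude bisection, longitude first, 5 bits per base-32 character) rather than taken from any particular library. commonPrefix is a small hypothetical helper for comparing cells.

```scala
object Geohash {
  private val Base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

  // Encode (lat, lng) in degrees as a geohash of `precision` characters.
  def encode(lat: Double, lng: Double, precision: Int = 9): String = {
    var latLo = -90.0; var latHi = 90.0
    var lngLo = -180.0; var lngHi = 180.0
    val sb = new StringBuilder
    var isLng = true
    var bits = 0
    var ch = 0
    while (sb.length < precision) {
      // Each bit halves the current interval of one axis.
      if (isLng) {
        val mid = (lngLo + lngHi) / 2
        if (lng >= mid) { ch = (ch << 1) | 1; lngLo = mid } else { ch = ch << 1; lngHi = mid }
      } else {
        val mid = (latLo + latHi) / 2
        if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid } else { ch = ch << 1; latHi = mid }
      }
      isLng = !isLng
      bits += 1
      if (bits == 5) { sb.append(Base32(ch)); bits = 0; ch = 0 }
    }
    sb.toString
  }

  // Length of the shared prefix of two geohashes (longer prefix = same, smaller cell).
  def commonPrefix(a: String, b: String): Int =
    a.zip(b).takeWhile { case (x, y) => x == y }.length
}
```

For example, encode(42.6, -5.6, 5) gives "ezs42". Two points roughly 30 m apart on either side of the equator and the prime meridian, (0.0001, 0.0001) and (-0.0001, -0.0001), share no prefix at all, which is the "sort of" in "same prefix, close areas".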
Near Real Time Answers
● MongoDB Geospatial Indexing
● Elasticsearch / Solr spatial indexing
● GeoMesa
● Build your own - Store the bytes in a fast key-value store with reduced keys (HBase / Cassandra)
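The "build your own" option relies on one property: if keys are geohash-like strings, every point in a cell shares a key prefix, and an ordered store keeps those keys contiguous, so a cell lookup becomes a range scan. In this sketch java.util.TreeMap stands in for the sorted row keys of HBase or Cassandra; the keys are hypothetical geohashes and none of this is either database's client API.

```scala
import java.util.TreeMap

object PrefixScan {
  // Returns all (key, value) pairs whose key starts with `prefix`, in key order.
  def scanPrefix(store: TreeMap[String, String], prefix: String): List[(String, String)] = {
    val it = store.tailMap(prefix, true).entrySet().iterator()
    val out = List.newBuilder[(String, String)]
    var done = false
    while (!done && it.hasNext) {
      val e = it.next()
      // Sorted order keeps every key with this prefix contiguous: stop at the first miss.
      if (e.getKey.startsWith(prefix)) out += (e.getKey -> e.getValue) else done = true
    }
    out.result()
  }
}
```

Shortening the prefix widens the area scanned (a coarser cell), without touching keys outside it.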
Big Processing - It’s a UDF World
● ESRI - Hive UDFs - https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Documentation
● Pigeon - Pig UDFs - https://github.com/aseldawy/pigeon
● Spark
○ SpatialSpark
○ GeoTrellis
Graph Representation
● Use Cases
○ Routing
○ Supply Chains
○ User networks
● Tools
○ GraphX (Spark!) / Giraph (MR)
○ Dato SGraph (formerly known as GraphLab)
○ Gephi (on small parts, for exploration)
● Algorithms
○ Shortest Path - Dijkstra / A*
○ Communities - Triangle Counting
○ Importance - Centrality / PageRank
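A compact, single-machine sketch of Dijkstra's shortest path over a small, hypothetical road graph (adjacency map with non-negative edge costs, e.g. km). A* is the same loop with the heap ordered by cost plus an admissible heuristic, such as the great-circle distance to the target. At scale the tools above (GraphX, Giraph) would replace this.

```scala
import scala.collection.mutable

object ShortestPath {
  // Directed, weighted graph: node -> List((neighbour, cost)).
  type Graph = Map[String, List[(String, Double)]]

  // Cheapest path cost from `source` to every reachable node. Costs must be non-negative.
  def dijkstra(g: Graph, source: String): Map[String, Double] = {
    // PriorityQueue is a max-heap, so reverse the ordering to pop the smallest distance first.
    implicit val byDistance: Ordering[(Double, String)] =
      Ordering.by[(Double, String), Double](_._1).reverse
    val dist = mutable.Map(source -> 0.0)
    val heap = mutable.PriorityQueue((0.0, source))
    val settled = mutable.Set.empty[String]
    while (heap.nonEmpty) {
      val (d, u) = heap.dequeue()
      // Skip stale heap entries for nodes already finalised.
      if (settled.add(u)) {
        for ((v, w) <- g.getOrElse(u, Nil)) {
          val nd = d + w
          if (nd < dist.getOrElse(v, Double.PositiveInfinity)) {
            dist(v) = nd
            heap.enqueue((nd, v))
          }
        }
      }
    }
    dist.toMap
  }
}
```

On a four-junction graph where A-C-B-D is cheaper than the direct A-B edge, the result reflects the detour.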
Tips & Tricks
Approximation
Timezones
● tz_world
○ http://efele.net/maps/tz/world/
○ What do we do with shapefiles?
● APIs
○ Geonames
○ http://www.earthtools.org/
○ Google Timezone API
● UDFs?
○ Hive - from_utc_timestamp(timestamp, string timezone)
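Once a point has been mapped to a zone name (e.g. with the tz_world polygons), converting UTC timestamps to local time on the JVM is a one-liner with java.time. The sketch below is a rough equivalent of Hive's from_utc_timestamp for IANA zone IDs; TzConvert is a hypothetical name and Asia/Jerusalem is just an example zone.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object TzConvert {
  // Interpret `utc` as an instant and render it as wall-clock time in the IANA zone `tzid`.
  // Daylight saving rules come from the JVM's bundled tz database.
  def fromUtc(utc: Instant, tzid: String): LocalDateTime =
    LocalDateTime.ofInstant(utc, ZoneId.of(tzid))
}
```

The same instant of noon UTC maps to 15:00 in an Israeli summer and 14:00 in winter, which is exactly the kind of detail a fixed-offset shortcut gets wrong.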
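Good Old Word Count
// Word Count (sc is an existing SparkContext)
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
// Take that from a library! (S2 Java port, package com.google.common.geometry)
import com.google.common.geometry.{S2CellId, S2LatLng}
// S2 cell ids are 64-bit, so return a Long; S2LatLng takes latitude first
def coord2S2Cell(longitude: Double, latitude: Double, lvl: Int = 14): Long =
  S2CellId.fromLatLng(S2LatLng.fromDegrees(latitude, longitude)).parent(lvl).id()
// Modified Word Count: count points per S2 cell
// (assumes CSV lines with longitude in column 1 and latitude in column 2)
val points = sc.textFile("hdfs://...")
val cellCounts = points.map(line => line.split(","))
  .map(cols => (coord2S2Cell(cols(1).toDouble, cols(2).toDouble), 1))
  .reduceByKey(_ + _)
cellCounts.saveAsTextFile("hdfs://...")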
Advanced - Precision is of the Essence
● Density Based Clustering
○ DBSCAN
■ Minimum cluster size (anything sparser is noise)
■ Epsilon (spatial radius)
○ R - MASS - kde2d
■ RGoogleMaps for the map
■ http://www.everydayanalytics.ca/2014/04/heatmap-of-toronto-traffic-signals.html
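A brute-force DBSCAN sketch in Scala (O(n²), planar distance) showing how the two parameters above interact: minPts, the minimum neighbourhood size counting the point itself, and eps, the spatial radius. This is not fpc's implementation; for longitude/latitude you would swap in a great-circle distance and replace the neighbour search with a spatial index.

```scala
import scala.collection.mutable

object Dbscan {
  type Pt = (Double, Double)
  val Noise: Int = -1
  private val Unvisited = -2

  // Returns one label per point: a cluster id (0, 1, ...) or Noise.
  def run(points: IndexedSeq[Pt], eps: Double, minPts: Int): Array[Int] = {
    val labels = Array.fill(points.length)(Unvisited)
    // Brute-force eps-neighbourhood, including the point itself.
    def neighbours(i: Int): IndexedSeq[Int] = points.indices.filter { j =>
      math.hypot(points(i)._1 - points(j)._1, points(i)._2 - points(j)._2) <= eps
    }
    var cluster = 0
    for (i <- points.indices) {
      if (labels(i) == Unvisited) {
        val seeds = neighbours(i)
        if (seeds.length < minPts) {
          labels(i) = Noise // may later be claimed as a border point
        } else {
          labels(i) = cluster
          val queue = mutable.Queue.empty[Int]
          queue ++= seeds
          while (queue.nonEmpty) {
            val j = queue.dequeue()
            if (labels(j) == Noise) labels(j) = cluster // border point: joins, is not expanded
            else if (labels(j) == Unvisited) {
              labels(j) = cluster
              val more = neighbours(j)
              if (more.length >= minPts) queue ++= more // core point: keep growing
            }
          }
          cluster += 1
        }
      }
    }
    labels
  }
}
```

With eps = 0.5 and minPts = 3, two tight triples of points become clusters 0 and 1 and an isolated point is labelled noise (-1).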
rJava
● Wrap geospatial functions of your choice
● Call them from R
● Use apply on an entire data frame!
● Use as features!
● Visualize??? (in 5 minutes)
R Packages for Geospatial Analysis
● geonames
○ Timezone
○ Weather
○ Nearby places
● RGoogleMaps
○ Download and plot maps
○ getGeoCode
● sp / maps / maptools
○ OGC object abstractions
○ Manipulate / display geo data
● rgdal - spTransform
○ Convert formats / coordinate systems
● geosphere - distances / circles / centroids
● fpc - DBSCAN
● Coverage -
○ http://cran.r-project.org/web/views/Spatial.html
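As one example of the coordinate-system conversions that rgdal's spTransform performs (via PROJ), here are the closed-form formulas for a common target: spherical "Web" Mercator (EPSG:3857), used by web map tiles. This sketch covers that single projection only; general reprojection between datums should go through PROJ or a library built on it. Latitudes are usually clipped to about ±85.05° because y diverges at the poles.

```scala
object WebMercator {
  // EPSG:3857 uses the WGS84 semi-major axis as the radius of a sphere.
  val R = 6378137.0

  // WGS84 degrees -> Web Mercator metres.
  def forward(lat: Double, lng: Double): (Double, Double) = {
    val x = R * math.toRadians(lng)
    val y = R * math.log(math.tan(math.Pi / 4 + math.toRadians(lat) / 2))
    (x, y)
  }

  // Web Mercator metres -> WGS84 degrees (inverse Gudermannian for latitude).
  def inverse(x: Double, y: Double): (Double, Double) = {
    val lng = math.toDegrees(x / R)
    val lat = math.toDegrees(2 * math.atan(math.exp(y / R)) - math.Pi / 2)
    (lat, lng)
  }
}
```

Longitude 180° maps to x ≈ 20,037,508.34 m, the familiar edge of the Web Mercator world, and a forward/inverse round trip recovers the input.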
Engineered Geo features
● Local (in the point's own timezone)
○ time
○ is_early / is_late
○ day of week
○ is_workday / is_weekend
○ is_day_light (sunrise / sunset, timezone from tz_world)
● Weather
○ Temperature
○ is_rain / is_fog / is_hail / is_snow
● Grid-square (S2 cell / geohash) statistics
○ Probability of X among users in the square, as a feature for predicting X
● Address - is_residence / is_business
● News - GDELT
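A sketch of the grid-square statistics feature: group observations by cell and compute, per cell, the fraction where the target X was true. The cell key is a hypothetical equal-angle lat/lng grid standing in for an S2 cell id or geohash prefix, and the observations in the example are made up.

```scala
object CellStats {
  // Hypothetical cell key: an equal-angle grid of `sizeDeg` degrees (stand-in for S2 / geohash).
  def cellOf(lat: Double, lng: Double, sizeDeg: Double): (Int, Int) =
    (math.floor(lat / sizeDeg).toInt, math.floor(lng / sizeDeg).toInt)

  // Per cell, the fraction of observations (lat, lng, x) where x was true.
  // Compute it on training data only, or the feature leaks the label.
  def positiveRate(obs: Seq[(Double, Double, Boolean)], sizeDeg: Double): Map[(Int, Int), Double] =
    obs.groupBy { case (lat, lng, _) => cellOf(lat, lng, sizeDeg) }
      .map { case (cell, rows) => cell -> rows.count(_._3).toDouble / rows.size }
}
```

Each point then gets its cell's rate as a feature; sparse cells are noisy, so a minimum count or smoothing is worth adding in practice.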
WOW!
Data Art
Google Sheets
Frontend = Javascript?
● Google Maps API
○ https://developers.google.com/maps/documentation/javascript/examples/layer-heatmap
● Leaflet
R for Visualisation
● ggplot2 + geospatial packs
○ http://uce.uniovi.es/mundor/howtoplotashapemap.html
○ http://stackoverflow.com/questions/9558040/ggplot-map-with-l
○ http://spatial.ly/2012/02/great-maps-ggplot2/
● RGoogleMaps
○ http://rforwork.info/tag/rgooglemaps/
R For Interactive
● Shiny
○ Leaflet
■ http://rstudio.github.io/leaflet/
■ http://shiny.rstudio.com/gallery/superzip-example.html
■ http://shiny.rstudio.com/gallery/bus-dashboard.html
○ Globe
■ https://github.com/trestletech/shinyGlobe
R Animation
● http://rmaps.github.io/blog/posts/animated-choropleths/
@aaronkoblin
Keep an Eye Out!
https://locationtech.org/list-of-projects
Contact
● Daniel Marcous
● dmarcous@gmail.com

 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

Geo data analytics

  • 2. @dmarcous
    ● DBA (@IDF)
    ● Big Data Professional (@IDF)
    ● Data Wizard - Magic with Data (@Google - Waze)
  • 3.
    ● Pure professional
    ● Best practices
    ● Tools
    ● Tips & Tricks
    ● Free Advice!
  • 4. Agenda
    ● Why?
    ● Common Language
    ● Problems at scale
    ● Solutions at scale
    ● Tips & Tricks for scientists (/Wizards)
    ● Art
    ● Keep an eye out for…
    ● Dog Pictures
  • 5. Why Does Geo Data Matter?
  • 8.
    ● C/C++, GEOS: http://trac.osgeo.org/geos
    ● C#, NTS: http://code.google.com/p/nettopologysuite/
    ● Java, JTS:
      ○ http://tsusiatsoftware.net/jts/main.html
      ○ http://www.vividsolutions.com/jts/JTSHome.htm
    ● Python, shapely: https://github.com/Toblerity/Shapely
    ● Ruby, ffi-geos: https://github.com/dark-panda/ffi-geos
    ● JavaScript, JSTS: http://github.com/bjornharrtell/jsts
  • 11. Formats
    ● WKT / WKB - Well-Known Text / Well-Known Binary, the OGC text and binary encodings of geometries
      ○ POLYGON((34.807841777801514 32.164333053441936,34.81168270111084 32.164859820966136,34.81337785720825 32.1613540349589,34.80865716934204 32.16046394346568,34.807841777801514 32.164333053441936))
      ○ http://arthur-e.github.io/Wicket/sandbox-gmaps3.html
    ● GeoJSON
      ○ { "type": "FeatureCollection", "features": [{ "type": "Feature", "properties": { "Name": "Verint", "Guest": "dmarcous", "Accomodations": "Beer; Pizza" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 34.807841777801514, 32.164333053441936 ], [ 34.81168270111084, 32.164859820966136 ], [ 34.81337785720825, 32.1613540349589 ], [ 34.80865716934204, 32.16046394346568 ], [ 34.807841777801514, 32.164333053441936 ]]]}}]}
      ○ http://geojson.io/#map=17/32.16267/34.81061
    ● Shapefiles - Esri vector format
    ● GML - The Geography Markup Language (GML) is an XML grammar for expressing geographical features.
    ● Raster - a grid of cells (pixels), each holding a value, georeferenced to map coordinates
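To make the WKT example concrete, here is a minimal sketch in Scala (the language of the word-count slide) that parses a single-ring WKT POLYGON such as the one above and tests whether a point lies inside it with the even-odd ray-casting rule. It assumes planar lon/lat coordinates and no holes or multi-polygons. The names `WktSketch`, `parsePolygon` and `contains` are hypothetical; in practice a geometry library such as JTS (its `WKTReader` plus `Geometry.contains`) handles the general case.

```scala
object WktSketch {
  // A vertex as (x, y) = (longitude, latitude); WKT lists x before y.
  type Point = (Double, Double)

  // Parse "POLYGON((x y,x y,...))" with a single outer ring (no holes).
  def parsePolygon(wkt: String): Vector[Point] = {
    val trimmed = wkt.trim
    require(trimmed.toUpperCase.startsWith("POLYGON"), s"not a POLYGON: $wkt")
    val inner = trimmed.substring(trimmed.indexOf("((") + 2, trimmed.lastIndexOf("))"))
    inner.split(",").toVector.map { pair =>
      val xy = pair.trim.split("\\s+")
      (xy(0).toDouble, xy(1).toDouble)
    }
  }

  // Even-odd rule: cast a ray towards +x and flip `inside` at every edge it crosses.
  // Coordinates are treated as planar, which is acceptable for small polygons.
  def contains(ring: Vector[Point], p: Point): Boolean = {
    val (px, py) = p
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val (xi, yi) = ring(i)
      val (xj, yj) = ring(j)
      // The edge straddles the ray's y, and the crossing lies to the right of the point.
      if ((yi > py) != (yj > py) && px < xi + (py - yi) * (xj - xi) / (yj - yi))
        inside = !inside
      j = i
    }
    inside
  }
}
```

With the slide's polygon, the geojson.io map center (34.81061, 32.16267) tests as inside, while (34.80, 32.16), just south-west of the polygon, tests as outside.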
  • 12. Databases
    ● RDBMS
      ○ Postgres (PostGIS)
      ○ MS-SQL / DB2 / Oracle
    ● NoSQL
      ○ MongoDB
      ○ IBM Cloudant
      ○ Lucene spatial module (Elasticsearch / Solr)
    ● Pure Geospatial Database
      ○ CartoDB (open source / hosted)
      ○ GeoMesa (Accumulo)
        ■ GeoTrellis - Scala framework for processing raster data
  • 13. GIS Systems
    ● List of the most popular ones - http://en.wikipedia.org/wiki/List_of_geographic_information_systems_software
    ● QGIS / TileMill / GRASS
  • 15. Problem?
    ● Non-scalar data types
      ○ Aggregating
      ○ Sharding
      ○ Unordered (no natural sort order)
    ● Speed & Accuracy
      ○ The physical world is non-Euclidean
    http://www.jandrewrogers.com/2015/03/02/geospatial-databases-are-hard/
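Why "non-Euclidean" matters in practice: a degree of longitude shrinks with the cosine of latitude, so Euclidean distance on raw lon/lat degrees is off by a latitude-dependent factor. A minimal sketch comparing the haversine great-circle distance (a spherical Earth with a 6371 km mean radius is assumed, itself an approximation of the ellipsoid) with a naive flat estimate; `DistanceSketch` and its methods are hypothetical names.

```scala
object DistanceSketch {
  // Mean Earth radius; treating the Earth as a sphere is itself an approximation.
  val EarthRadiusKm = 6371.0

  // Great-circle distance on a sphere via the haversine formula.
  def haversineKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val dLat = math.toRadians(lat2 - lat1)
    val dLon = math.toRadians(lon2 - lon1)
    val a = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(lat1)) * math.cos(math.toRadians(lat2)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * EarthRadiusKm * math.asin(math.sqrt(a))
  }

  // Naive "flat" distance: one degree of latitude or longitude is ~111.2 km everywhere.
  def naiveKm(lat1: Double, lon1: Double, lat2: Double, lon2: Double): Double = {
    val kmPerDegree = 2 * math.Pi * EarthRadiusKm / 360
    kmPerDegree * math.hypot(lat2 - lat1, lon2 - lon1)
  }
}
```

One degree of longitude comes out at about 111.2 km on the equator with both methods, but at 60° N the haversine distance is roughly half the naive one.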
  • 17. Data Structures
    ● R-Tree (PostGIS, actually R+Tree)
    ● Quad Tree (DB2)
    ● Hyperdimensional Hashing
    ● Space Filling Curves
      ○ Z Order Curve (MS-SQL)
      ○ Hilbert Curve
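A minimal sketch of a Z-order (Morton) key: quantize lon/lat to 16-bit integers on a plain lat/lng grid, then interleave their bits so that points close in space tend to share the high bits of the key and therefore sort near each other. The grid and bit width are assumptions for illustration, not how MS-SQL tessellates space, and `ZOrderSketch` is a hypothetical name. A Hilbert curve applies the same idea with a different visiting order in which consecutive cells are always adjacent, avoiding Z-order's long jumps.

```scala
object ZOrderSketch {
  // Spread the low 16 bits of v so that a zero bit sits between each original bit.
  private def spreadBits(v: Int): Long = {
    var x = v.toLong & 0xFFFFL
    x = (x | (x << 8)) & 0x00FF00FFL
    x = (x | (x << 4)) & 0x0F0F0F0FL
    x = (x | (x << 2)) & 0x33333333L
    x = (x | (x << 1)) & 0x55555555L
    x
  }

  // Interleave x (even bit positions) and y (odd bit positions) into a 32-bit Morton code.
  def interleave(x: Int, y: Int): Long = spreadBits(x) | (spreadBits(y) << 1)

  // Map a coordinate in [lo, hi] onto the integer range 0..65535.
  private def quantize(v: Double, lo: Double, hi: Double): Int =
    math.min(65535, ((v - lo) / (hi - lo) * 65536).toInt)

  // Z-order key of a point on a simple lat/lng grid (longitude is x, latitude is y).
  def zKey(lat: Double, lon: Double): Long =
    interleave(quantize(lon, -180, 180), quantize(lat, -90, 90))
}
```

Two points about 15 m apart near the polygon from the Formats slide share the top 12 bits of their keys, so a range scan over that key prefix finds both.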
  • 18. The Curse of Dimensionality
  • 19. Dimension Reduction
    ● GeoHash - The mainstream way
      ○ Linear (non-tangent), up to x5 difference in cell area
      ○ Same prefix - close areas (usually; cells adjacent across a boundary can have different prefixes)
      ○ http://geohash.org/
      ○ https://github.com/google/open-location-code/blob/master/docs/comparison.adoc
    ● S2 - The Google way
      ○ Quadratic projection; cells at the same level have similar area
      ○ The sphere is projected onto the six faces of a cube, each face is divided into levels by a quadtree, and cells are ordered along a Hilbert curve
      ○ https://code.google.com/p/s2-geometry-library/
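A minimal GeoHash encoder following the published algorithm: repeatedly bisect the longitude and latitude ranges (alternating, longitude first), emit one bit per step, and map each 5-bit group to the GeoHash base-32 alphabet. Each extra character refines the cell, which is why a shared prefix usually means nearby points; the reverse does not hold, since neighbors on either side of a cell boundary can have different prefixes. S2 needs its own library and is not reproduced here; `GeoHashSketch` is a hypothetical name.

```scala
object GeoHashSketch {
  // GeoHash base-32 alphabet (no a, i, l, o).
  private val Base32 = "0123456789bcdefghjkmnpqrstuvwxyz"

  def encode(lat: Double, lon: Double, precision: Int = 9): String = {
    var (latMin, latMax) = (-90.0, 90.0)
    var (lonMin, lonMax) = (-180.0, 180.0)
    val sb = new StringBuilder
    var isLon = true // even bits encode longitude, odd bits latitude
    var bits = 0     // bits collected for the current character
    var ch = 0       // value of the current 5-bit group
    while (sb.length < precision) {
      if (isLon) {
        val mid = (lonMin + lonMax) / 2
        if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid } else { ch = ch << 1; lonMax = mid }
      } else {
        val mid = (latMin + latMax) / 2
        if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid } else { ch = ch << 1; latMax = mid }
      }
      isLon = !isLon
      bits += 1
      if (bits == 5) { sb.append(Base32(ch)); bits = 0; ch = 0 }
    }
    sb.toString
  }
}
```

For example, `encode(0.0, 0.0, 5)` gives `s0000`, and a longer hash of a point always extends a shorter hash of the same point.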
  • 20. Near Real Time Answers
    ● MongoDB geospatial indexing
    ● Elasticsearch / Solr spatial indexing
    ● GeoMesa
    ● Build your own - store the bytes in a fast key-value store (HBase / Cassandra), keyed by dimension-reduced keys (GeoHash / S2 cell IDs)
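The "build your own" option works because a dimension-reduced key sorts nearby points together, so an area query becomes a key-range scan. A minimal in-memory sketch, using Scala's sorted `TreeMap` as a stand-in for an HBase/Cassandra row-key range scan. The `cellKey:pointId` row-key layout, the names `PrefixScanSketch`, `rowKey` and `scanPrefix`, and the made-up keys in the usage are all assumptions for illustration.

```scala
import scala.collection.immutable.TreeMap

object PrefixScanSketch {
  // Hypothetical row key "<geohash>:<pointId>": all points in one cell are contiguous.
  def rowKey(geohash: String, pointId: String): String = s"$geohash:$pointId"

  // Every row whose key starts with `prefix`, via the range [prefix, prefix + U+FFFF).
  // A shorter prefix is a bigger cell, hence a coarser, wider area query.
  def scanPrefix[V](store: TreeMap[String, V], prefix: String): List[(String, V)] =
    store.range(prefix, prefix + Char.MaxValue).toList
}
```

Only the rows in the requested key range are touched, which is what makes the approach fast on a sorted key-value store.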
  • 21. Big Processing - It’s a UDF World
    ● ESRI - Hive UDFs - https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Documentation
    ● Pigeon - Pig UDFs - https://github.com/aseldawy/pigeon
    ● Spark
      ○ SpatialSpark
      ○ GeoTrellis
  • 22. Graph Representation
    ● Use Cases
      ○ Routing
      ○ Supply chains
      ○ User networks
    ● Tools
      ○ GraphX (Spark!) / Giraph (MR)
      ○ Dato SGraph (Dato was formerly known as GraphLab)
      ○ Gephi (on small subsets, for exploration)
    ● Algorithms
      ○ Shortest path - Dijkstra / A*
      ○ Communities - triangle counting
      ○ Importance - centrality / PageRank
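For the routing use case, a minimal Dijkstra sketch over a small weighted road graph held in memory. The adjacency-map representation, the node names and `DijkstraSketch` are hypothetical; at scale the same relaxation step runs iteratively in GraphX or Giraph, and A* differs by adding a heuristic (for example straight-line distance to the target) to the queue priority.

```scala
import scala.collection.mutable

object DijkstraSketch {
  // graph: node -> list of (neighbor, edge weight); weights must be non-negative.
  def shortestDistances(graph: Map[String, List[(String, Double)]],
                        source: String): Map[String, Double] = {
    val dist = mutable.Map(source -> 0.0)
    // PriorityQueue is a max-heap, so reverse the ordering to pop the smallest distance.
    val queue = mutable.PriorityQueue((0.0, source))(
      Ordering.by[(Double, String), Double](_._1).reverse)
    val done = mutable.Set.empty[String]
    while (queue.nonEmpty) {
      val (d, node) = queue.dequeue()
      if (!done.contains(node)) { // skip stale queue entries
        done += node
        for ((next, w) <- graph.getOrElse(node, Nil)) {
          val candidate = d + w
          if (candidate < dist.getOrElse(next, Double.PositiveInfinity)) {
            dist(next) = candidate // found a shorter path: relax the edge
            queue.enqueue((candidate, next))
          }
        }
      }
    }
    dist.toMap
  }
}
```

On a four-node example where the direct road A→B costs 4 but A→C→B costs 3, the sketch returns 3 for B.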
  • 26. Timezones
    ● tz_world
      ○ http://efele.net/maps/tz/world/
      ○ What do we do with shapefiles?
    ● APIs
      ○ Geonames
      ○ http://www.earthtools.org/
      ○ Google Timezone API
    ● UDFs?
      ○ Hive - from_utc_timestamp(timestamp, string timezone)
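Once a point has been mapped to an IANA zone ID (for example by a point-in-polygon lookup against the tz_world shapefile, assumed to have happened already), converting UTC to local time is a standard-library call. A minimal sketch with `java.time` that does the same job as Hive's `from_utc_timestamp`; `TimezoneSketch` and `fromUtc` are hypothetical names.

```scala
import java.time.{Instant, LocalDateTime, ZoneId}

object TimezoneSketch {
  // Wall-clock time in the given IANA zone for a UTC instant; daylight saving is applied.
  def fromUtc(utc: Instant, zoneId: String): LocalDateTime =
    LocalDateTime.ofInstant(utc, ZoneId.of(zoneId))
}
```

Midnight UTC maps to 02:00 in `Asia/Jerusalem` in January and to 03:00 in July, when summer time is in effect.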
  • 28. Good Old Word Count
    // Word Count (sc is the SparkContext, as in spark-shell)
    val textFile = sc.textFile("hdfs://...")
    val counts = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://...")

    // Take that from a library! Here: the S2 geometry library for Java
    import com.google.common.geometry.{S2CellId, S2LatLng}
    def coord2S2Cell(longitude: Double, latitude: Double, lvl: Int = 14): Long =
      S2CellId.fromLatLng(S2LatLng.fromDegrees(latitude, longitude)).parent(lvl).id()

    // Modified Word Count: points per S2 cell (CSV column 1 = longitude, column 2 = latitude)
    val pointLines = sc.textFile("hdfs://...")
    val cellCounts = pointLines.map(line => line.split(","))
      .map(point => (coord2S2Cell(point(1).toDouble, point(2).toDouble), 1))
      .reduceByKey(_ + _)
    cellCounts.saveAsTextFile("hdfs://...")
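The same map/reduce shape can be tried without a cluster. A minimal sketch on plain Scala collections: parse `id,longitude,latitude` lines (the column layout of the modified word count above), map each point to a cell key, and count per key. To stay dependency-free it swaps the S2 cell for a simple fixed-size lat/lng grid; `GridCountSketch`, `gridCell` and `countPerCell` are hypothetical, and unlike S2 the grid's cells shrink in area towards the poles.

```scala
object GridCountSketch {
  // Hypothetical stand-in for an S2 cell: floor lon/lat onto a grid of `cellDeg` degrees.
  def gridCell(longitude: Double, latitude: Double, cellDeg: Double = 0.01): (Long, Long) =
    (math.floor(longitude / cellDeg).toLong, math.floor(latitude / cellDeg).toLong)

  // "Word count" over points, where each line looks like "id,longitude,latitude".
  def countPerCell(lines: Seq[String]): Map[(Long, Long), Int] =
    lines
      .map(_.split(","))
      .map(point => (gridCell(point(1).toDouble, point(2).toDouble), 1))
      .groupBy(_._1)                                            // plays the role of the shuffle
      .map { case (cell, ones) => cell -> ones.map(_._2).sum }  // plays the role of reduceByKey(_ + _)
}
```

Two points a few tens of meters apart land in the same 0.01° cell, and a third about 1 km east lands in the next one.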
  • 29. Advanced - Precision is of the Essence
    ● Density-based clustering
      ○ DBSCAN
        ■ Minimum cluster size (minPts); points in sparser regions are labeled noise
        ■ Epsilon (spatial radius of a point's neighborhood)
      ○ R - MASS - kde2d (2-D kernel density estimation)
        ■ RgoogleMaps for the map
        ■ http://www.everydayanalytics.ca/2014/04/heatmap-of-toronto-traffic-signals.html
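A minimal DBSCAN sketch showing how the two parameters interact: `epsKm` is the neighborhood radius in kilometers (haversine distance on a spherical Earth, assumed) and `minPts` is the minimum neighborhood size, counting the point itself, for a point to be a core point. Points reachable from no core point are labeled noise (-1). `DbscanSketch` and its members are hypothetical names; the pairwise search is O(n²), fine for exploration, whereas `fpc::dbscan` in R or a spatial index is the practical route for large data.

```scala
import scala.collection.mutable

object DbscanSketch {
  final case class Pt(lat: Double, lon: Double)

  val Noise = -1

  // Great-circle distance in km, assuming a spherical Earth of radius 6371 km.
  def haversineKm(a: Pt, b: Pt): Double = {
    val dLat = math.toRadians(b.lat - a.lat)
    val dLon = math.toRadians(b.lon - a.lon)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.lat)) * math.cos(math.toRadians(b.lat)) *
        math.pow(math.sin(dLon / 2), 2)
    2 * 6371.0 * math.asin(math.sqrt(h))
  }

  // Returns a cluster label per input point (0, 1, 2, ...) or Noise.
  def cluster(points: IndexedSeq[Pt], epsKm: Double, minPts: Int): IndexedSeq[Int] = {
    val Unvisited = -2
    val labels = Array.fill(points.length)(Unvisited)
    // All points within epsKm of point i, including i itself.
    def neighbors(i: Int): IndexedSeq[Int] =
      points.indices.filter(j => haversineKm(points(i), points(j)) <= epsKm)
    var clusterId = 0
    for (i <- points.indices if labels(i) == Unvisited) {
      val seeds = neighbors(i)
      if (seeds.length < minPts) {
        labels(i) = Noise // may later be relabeled as a border point
      } else {
        labels(i) = clusterId
        val queue = mutable.Queue.empty[Int]
        queue ++= seeds
        while (queue.nonEmpty) {
          val j = queue.dequeue()
          if (labels(j) == Noise) labels(j) = clusterId // border point: joins, not expanded
          else if (labels(j) == Unvisited) {
            labels(j) = clusterId
            val js = neighbors(j)
            if (js.length >= minPts) queue ++= js // j is a core point: keep expanding
          }
        }
        clusterId += 1
      }
    }
    labels.toIndexedSeq
  }
}
```

On three points near the polygon from the Formats slide, three near a second location about 9 km away, and one distant outlier, `epsKm = 0.1` with `minPts = 3` yields two clusters and one noise point; raising `minPts` to 4 turns every point into noise.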
  • 30. rJava
    ● Wrap geospatial functions of your choice (from a Java library)
    ● Call them from R
    ● Use apply on an entire data frame!
    ● Use the results as features!
    ● Visualize??? (in 5 minutes)
  • 31. R Packs for Geospatial Analysis
    ● geonames
      ○ Timezone
      ○ Weather
      ○ Nearby places
    ● RgoogleMaps
      ○ Download and paint maps
      ○ getGeoCode
    ● sp / maps / maptools
      ○ OGC object abstractions
      ○ Manipulate / display geo data
    ● rgdal - spTransform
      ○ Convert formats / coordinate systems
    ● geosphere - distances / circles / centroids
    ● fpc - DBSCAN
    ● Full coverage - CRAN Task View: Spatial
      ○ http://cran.r-project.org/web/views/Spatial.html
  • 33. Engineered Geo Features
    ● Local
      ○ time
      ○ is_early / is_late
      ○ day of week
      ○ is_workday / is_weekend
      ○ is_day_light (sunrise / sunset, in the zone found via tz_world)
    ● Weather
      ○ Temperature
      ○ is_rain / is_fog / is_hail / is_snow
    ● Squared (S2 cell / GeoHash) statistics
      ○ Per-cell probability of X among the users in that cell, used to predict X
    ● Address - is_residence / is_business
    ● News - GDELT
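A minimal sketch of the "Local" features, computed from a UTC timestamp and the point's IANA zone (assumed already resolved, e.g. via tz_world). `LocalTimeFeatures`, `Features` and `extract` are hypothetical names, and the early/late hour thresholds and weekend days are hypothetical defaults: weekends differ by country (Friday/Saturday in Israel, for example), and holidays are ignored.

```scala
import java.time.{DayOfWeek, Instant, ZoneId}

object LocalTimeFeatures {
  final case class Features(hour: Int, dayOfWeek: DayOfWeek, isEarly: Boolean,
                            isLate: Boolean, isWeekend: Boolean, isWorkday: Boolean)

  def extract(utc: Instant, zoneId: String,
              weekend: Set[DayOfWeek] = Set(DayOfWeek.SATURDAY, DayOfWeek.SUNDAY),
              earlyBefore: Int = 7, lateFrom: Int = 22): Features = {
    val local = utc.atZone(ZoneId.of(zoneId)) // wall-clock time at the point's location
    val hour = local.getHour
    val day = local.getDayOfWeek
    val isWeekend = weekend.contains(day)
    // isWorkday is simply "not weekend" here; a holiday calendar would refine it.
    Features(hour, day, hour < earlyBefore, hour >= lateFrom, isWeekend, !isWeekend)
  }
}
```

For 2015-01-02T20:30Z in `Asia/Jerusalem` this gives 22:30 on a Friday: `isLate` is true, and `isWeekend` depends on the weekend set passed in.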
  • 34. WOW!
  • 37. Frontend = JavaScript?
    ● Google Maps API
      ○ https://developers.google.com/maps/documentation/javascript/examples/layer-heatmap
    ● Leaflet
  • 38. R for Visualisation
    ● ggplot2 + geospatial packs
      ○ http://uce.uniovi.es/mundor/howtoplotashapemap.html
      ○ http://stackoverflow.com/questions/9558040/ggplot-map-with-l
      ○ http://spatial.ly/2012/02/great-maps-ggplot2/
    ● RgoogleMaps
      ○ http://rforwork.info/tag/rgooglemaps/
  • 43. R for Interactive Maps
    ● Shiny
      ○ Leaflet
        ■ http://rstudio.github.io/leaflet/
        ■ http://shiny.rstudio.com/gallery/superzip-example.html
        ■ http://shiny.rstudio.com/gallery/bus-dashboard.html
      ○ Globe
        ■ https://github.com/trestletech/shinyGlobe
  • 47. Keep an Eye Out! https://locationtech.org/list-of-projects
  • 50. Contact
    ● Daniel Marcous
    ● dmarcous@gmail.com