Deep Dive content by Hortonworks, Inc. is licensed under a
Creative Commons Attribution-ShareAlike 3.0 Unported License.
Spatial Analytics with Hive
Hive Meetup – July 24, 2013
@cshanklin
Why Spatial Analytics?
• Amount of spatial data has exploded due to mobile device
ubiquity and more reliance on sensors.
• Proliferation of consumer-oriented mapping products brings
spatial analytics to the mainstream.
An Interesting Dataset
• GPS data collected from Uber trips.
• Anonymized, maintains days/times but not dates.
• Obtained from InfoChimps
Data Sample
ID Date Time Latitude Longitude
1 1/7/07 10:54:50 37.782551 -122.445368
1 1/7/07 10:54:54 37.782745 -122.444586
1 1/7/07 10:54:58 37.782842 -122.443688
1 1/7/07 10:55:02 37.782919 -122.442815
1 1/7/07 10:55:06 37.782992 -122.442112
1 1/7/07 10:55:10 37.7831 -122.441461
1 1/7/07 10:55:14 37.783206 -122.440829
1 1/7/07 10:55:18 37.783273 -122.440324
Overall
1.1M distinct readings
25,000 distinct trips.
Meanwhile, At Uber Headquarters…
Questions Uber Might Ask:
• What do trips tend to look like?
• How can we reduce wait time and make more trips?
• Are there new products we should introduce?
Answering The Questions
• Why Use SQL?
–Well understood by analysts.
–Huge ecosystem, access Hive from any of 20+ BI tools.
• Why Hive?
–Supports advanced SQL analytics like windowing functions.
–Java based, makes it easy for 3rd parties to add extensions.
• Last Reason
–This is the Hive meetup. Were you expecting ABAP?
Getting a feel for the trips.
Duration
• To get the duration all we need to do is:
–Subtract the first timestamp from the last timestamp.
–Do it per trip ID (1-25000).
• OK, how do we do it with SQL?
Getting First Or Last Values In A Partition
-- Get the last observation from each trip ID.
-- Standard approach on any SQL system that supports windowing.
SELECT
  *
FROM (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY uber.id ORDER BY uber.dt DESC) AS rn
  FROM
    uber
) sub1
WHERE
  rn = 1;
And Hive Supports Windowing Now (0.11+)
Name         Purpose
CUME_DIST    Cumulative distribution: the fraction of rows in the partition with values less than or equal to the current row's value (greater than or equal to if ORDER BY DESC).
DENSE_RANK   The dense rank of the row within the partition. Rows that tie receive the same rank. Unlike RANK, DENSE_RANK leaves no gaps in the ranks.
FIRST_VALUE  The value in the first row within the partition.
LAST_VALUE   Surprisingly, not the opposite of FIRST_VALUE. With an ORDER BY and the default window frame, the frame ends at the current row, so LAST_VALUE returns the current row's value (or that of its last peer). To get the last value in the partition, reverse the sort order and use FIRST_VALUE.
LAG          Value from a prior row in the partition.
LEAD         Value from a subsequent row in the partition.
NTILE        Divides the rows in a partition into N groups.
ROW_NUMBER   The row number of the row within the partition.
RANK         The rank of the row within the partition. Differs from ROW_NUMBER in that ties receive the same value, leaving gaps after them.
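The difference between ROW_NUMBER, RANK and DENSE_RANK is easiest to see on a partition that contains a tie. The following is a minimal Python sketch of the three semantics for a single partition, not Hive code; the function name `rank_rows` and the sample values are made up for illustration.

```python
def rank_rows(values):
    """Return (value, row_number, rank, dense_rank) tuples for one partition.

    Emulates ROW_NUMBER(), RANK() and DENSE_RANK() OVER (ORDER BY value).
    """
    out = []
    rank = dense = 0
    prev = object()  # sentinel that never equals a real value
    for row_number, v in enumerate(sorted(values), start=1):
        if v != prev:
            rank = row_number  # RANK jumps to the row position, leaving gaps
            dense += 1         # DENSE_RANK advances by exactly one
            prev = v
        out.append((v, row_number, rank, dense))
    return out


for row in rank_rows([10, 20, 20, 30]):
    print(row)
# (10, 1, 1, 1)
# (20, 2, 2, 2)
# (20, 3, 2, 2)
# (30, 4, 4, 3)
```

The two rows with value 20 get distinct row numbers but the same rank; the next row gets rank 4 (a gap) and dense rank 3 (no gap).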
Compute Trip Durations
-- Subtract the first timestamp from the last timestamp.
-- Use FIRST_VALUE and ROW_NUMBER to help compare first and last timestamps.
SELECT
  id,
  (unix_timestamp(dt) - unix_timestamp(fv)) AS trip_duration
FROM (
  SELECT
    id, dt, fv
  FROM (
    SELECT
      id, dt,
      FIRST_VALUE(dt) OVER (PARTITION BY id ORDER BY dt) AS fv,
      ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS lastrk
    FROM
      uber
  ) sub1
  WHERE
    lastrk = 1
) sub2;
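For readers who want to check the logic outside Hive, here is a small Python sketch of the same computation: find the first and last timestamp per trip ID and subtract. The function name `trip_durations` is hypothetical, and the timestamp format is assumed to match the "Date Time" columns shown in the data sample.

```python
from datetime import datetime


def trip_durations(rows):
    """Map trip id -> seconds between its first and last reading.

    rows: iterable of (trip_id, "m/d/yy HH:MM:SS") pairs in any order.
    """
    first, last = {}, {}
    for trip_id, ts in rows:
        t = datetime.strptime(ts, "%m/%d/%y %H:%M:%S")
        if trip_id not in first or t < first[trip_id]:
            first[trip_id] = t
        if trip_id not in last or t > last[trip_id]:
            last[trip_id] = t
    return {i: int((last[i] - first[i]).total_seconds()) for i in first}


# Three of the readings from the data sample, deliberately out of order.
sample = [
    (1, "1/7/07 10:54:50"),
    (1, "1/7/07 10:55:18"),
    (1, "1/7/07 10:55:02"),
]
print(trip_durations(sample))  # {1: 28}
```

The result covers only the readings passed in, so it reflects the eight-row sample rather than the full trip.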
Trip Duration SQL Output
id trip_duration
1 128
2 148
3 150
4 336
5 400
6 168
7 142
8 558
9 312
10 208
...
(25,000 total trips)
Duration Was Easy, What About Distance?
• All we have is GPS readings.
• If we draw a line through the GPS readings, its length estimates the trip distance.
• GPS readings are 4s apart, estimates should be close.
[Figure: Actual Route, GPS Signal, Estimated Route]
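The estimate amounts to summing the lengths of the segments between consecutive GPS readings. The Python sketch below illustrates that idea with the haversine formula on a sphere. This is a simpler stand-in chosen for illustration, not the WGS84 ellipsoid computation, so its results will differ slightly from a true geodesic length. The function names and the Earth radius constant are assumptions of this sketch.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(points):
    """Sum the segment lengths along a list of (lat, lon) readings."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


# First three readings from the data sample, as (latitude, longitude).
readings = [
    (37.782551, -122.445368),
    (37.782745, -122.444586),
    (37.782842, -122.443688),
]
print(round(path_length_m(readings), 1), "meters")
```

With readings only a few seconds apart, each segment is short, so the straight-line pieces stay close to the road actually driven.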
Enter GIS Tools for Hadoop
esri.github.io/gis-tools-for-hadoop
Works with Hive and Map-Reduce
Syntax similar to other spatial systems like PostGIS
Open Source
Spatial Framework for Hadoop Functions
Name                    Purpose
ST_LineString           Create a line from coordinates supplied in a string.
ST_Polygon              Create a polygon.
ST_SetSRID              Set Spatial Reference ID. SRID 4326 corresponds to WGS84.
ST_GeodesicLengthWGS84  Compute length of a line in meters assuming points use the World Geodetic System 1984. GPS uses the WGS84 coordinate system.
ST_Length               Compute Cartesian length.
ST_Contains             Determine if one spatial object contains another spatial object.
ST_Intersects           Determine if two spatial objects intersect.
ST_AsText               Return a text representation of a spatial object, suitable for storing in a Hive string column. Objects can also be saved in binary columns with no conversion.
82 total spatial functions provided by Spatial Framework for Hadoop.
ST_LineString: Make a line.
• 2 Constructors
–ST_LineString(1, 1, 2, 2, 3, 3);
– Simple constructor.
–ST_LineString('linestring(1 1, 2 2, 3 3)');
– WKT or Well-Known-Text constructor.
• Neither approach is very convenient for this dataset.
• Since SF4H is open source, I added a new constructor:
–ST_LineString([Array of ST_Point Objects]);
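To see why the WKT constructor is awkward here, note that the per-row coordinates would first have to be concatenated into a single string. A sketch of that string-building step in Python, outside Hive; `linestring_wkt` is a hypothetical helper and not part of SF4H.

```python
def linestring_wkt(points):
    """Build a WKT linestring from (x, y) pairs, e.g. (longitude, latitude)."""
    if len(points) < 2:
        raise ValueError("a linestring needs at least two points")
    coords = ", ".join(f"{x} {y}" for x, y in points)
    return f"linestring({coords})"


print(linestring_wkt([(1, 1), (2, 2), (3, 3)]))
# linestring(1 1, 2 2, 3 3)
```

Accepting an array of points directly avoids this round trip through text.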
collect_array: Custom UDAF turns columns to arrays
ID Date Time Latitude Longitude
1 1/7/07 10:54:50 37.782551 -122.445368
1 1/7/07 10:54:54 37.782745 -122.444586
1 1/7/07 10:54:58 37.782842 -122.443688
1 1/7/07 10:55:02 37.782919 -122.442815
1 1/7/07 10:55:06 37.782992 -122.442112
1 1/7/07 10:55:10 37.7831 -122.441461
1 1/7/07 10:55:14 37.783206 -122.440829
1 1/7/07 10:55:18 37.783273 -122.440324
> SELECT id, collect_array(latitude) FROM uber GROUP BY id;
(1, [ 37.782551, 37.782745, 37.782842, 37.782919, 37.782992 ... ])
...
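Conceptually, the aggregate gathers every value that shares a grouping key into one array. A minimal Python sketch of that behavior follows; it is an illustration only, not the UDAF's source, and the row for trip ID 2 is invented to show a second group.

```python
from collections import defaultdict


def collect_array(rows):
    """Group (key, value) rows into {key: [values...]}, keeping arrival order."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)


rows = [
    (1, 37.782551),
    (1, 37.782745),
    (2, 37.79),       # hypothetical reading for a second trip
    (1, 37.782842),
]
print(collect_array(rows))
# {1: [37.782551, 37.782745, 37.782842], 2: [37.79]}
```

In this sketch the values in each array follow the order in which rows arrive. In a distributed engine that order is generally not guaranteed unless the rows are sorted first, which matters when the array is later turned into a line.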
Computing Trip Lengths Now Trivial
-- Compute the trip lengths.
-- Our coordinates conform to WGS84, use that to compute distances.
-- ST_SetSRID(_, 4326) marks the object as conforming to WGS84.
-- Group by trip ID.
-- Inner query: generate an ST_Point for each row.
-- Outer query: group the points, turn them into arrays and make a line out of them.
SELECT
  id,
  ST_GeodesicLengthWGS84(
    ST_SetSRID(
      ST_LineString(collect_array(point)), 4326)) AS length
FROM (
  SELECT
    id,
    ST_Point(longitude, latitude) AS point
  FROM
    uber
) sub
GROUP BY
  id;
Demo
Computing Trip Distances in Hortonworks Sandbox
Visualizing Trip Times and Durations
Time For a New Product?
• How Likely is Demand for an SFO Rideshare?
• How many trips even go to SFO?
ST_Intersects
• Determines if two shapes intersect.
[Figure: example shapes labeled "Yes" and "Not So Much"]
What Trips Go To SFO?
• Approach:
–Draw a polygon around SFO drop-off area.
–Using the ST_LineStrings, see which trips intersect with this polygon.
SFO Drop-Off Area
• Inserted into table locations (name string, location string) for
easy joining against other shapes.
• Data estimated using Google Maps.
Name  Location
SFO   ST_Polygon(
        37.616543, -122.392291,
        37.613297, -122.392119,
        37.616458, -122.389115,
        37.613552, -122.389051)
Computing the Intersection
-- Count the trips whose line intersects the SFO drop-off polygon.
-- The polygon is read from the location column of the locations table.
SELECT
  count(id)
FROM (
  SELECT
    id,
    ST_LineString(collect_array(point)) AS trip
  FROM (
    SELECT
      id,
      ST_Point(longitude, latitude) AS point
    FROM
      uber
  ) points
  GROUP BY
    id
) trips JOIN (
  SELECT ST_Polygon(location) AS sfo_coordinates
  FROM locations
  WHERE locations.name = "SFO"
) sfosub
WHERE
  ST_Intersects(sfosub.sfo_coordinates, trips.trip);
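The geometric test behind this query can be sketched in plain Python: a trip line intersects a polygon if any of its points lies inside the polygon or any of its segments crosses a polygon edge. This is an illustration of the idea only, not the Spatial Framework for Hadoop implementation. All function names are hypothetical, and the rectangle is an assumed stand-in for the drop-off area: the slide's corner values rounded and ordered as a simple ring of (longitude, latitude) pairs.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, p):
    """True if p, already known to be collinear with a-b, lies between a and b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 shares at least one point with segment q1-q2."""
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other segment.
    return ((o1 == 0 and _on_segment(p1, p2, q1))
            or (o2 == 0 and _on_segment(p1, p2, q2))
            or (o3 == 0 and _on_segment(q1, q2, p1))
            or (o4 == 0 and _on_segment(q1, q2, p2)))


def point_in_polygon(pt, ring):
    """Ray casting: count crossings of a ray from pt toward +x with the ring's edges."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def line_intersects_polygon(line, ring):
    """True if the polyline touches or enters the polygon described by ring."""
    if any(point_in_polygon(p, ring) for p in line):
        return True
    edges = list(zip(ring, ring[1:] + ring[:1]))
    return any(segments_intersect(a, b, c, d)
               for a, b in zip(line, line[1:])
               for c, d in edges)


# Assumed rectangle around the drop-off area, as (longitude, latitude).
sfo = [(-122.3923, 37.6133), (-122.3890, 37.6133),
       (-122.3890, 37.6166), (-122.3923, 37.6166)]

ends_at_sfo = [(-122.40, 37.62), (-122.391, 37.615)]            # last point inside
city_trip = [(-122.445368, 37.782551), (-122.440324, 37.783273)]  # far from SFO
drives_through = [(-122.40, 37.615), (-122.38, 37.615)]          # crosses, no point inside

print(line_intersects_polygon(ends_at_sfo, sfo))     # True
print(line_intersects_polygon(city_trip, sfo))       # False
print(line_intersects_polygon(drives_through, sfo))  # True
```

The third case shows why checking points alone is not enough: with sparse readings, a trip can cross the area without any reading falling inside it.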
Demo
Counting Number of Trips to SFO in Sandbox
Counting It Up
• 80 / 25000 Uber trips went to SFO (0.32%)
• SFO Rideshare Product, maybe not a great idea.
Conclusion
• Spatial Framework for Hadoop makes geo analytics simple
with Hadoop and Hive.
• Hive 0.11 makes it simple to slice and dice datasets with
powerful analytics like windowing.
• Open source, extend and change to fit your needs.
Try It For Yourself
• Spatial Framework for Hadoop
–esri.github.io/gis-tools-for-hadoop
• UDFs, extra data and Hive queries
–github.com/cartershanklin/hive-spatial-uber
– (For the collect_array UDAF, queries and extra data)
–github.com/cartershanklin/spatial-framework-for-hadoop
– (For the extra ST_LineString constructor)
• Main Dataset
–infochimps.com/datasets/uber-anonymized-gps-logs
• Hortonworks Sandbox
–The easiest way to learn Hadoop.
–hortonworks.com/sandbox

 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

How To Analyze Geolocation Data with Hive and Hadoop

  • 1. Deep Dive content by Hortonworks, Inc. is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Spatial Analytics with Hive Hive Meetup – July 24, 2013 @cshanklin Page 1
  • 2. Why Spatial Analytics?
      • Amount of spatial data has exploded due to mobile device ubiquity and more reliance on sensors.
      • Proliferation of consumer-oriented mapping products brings spatial analytics to the mainstream.
  • 3. An Interesting Dataset
      • GPS data collected from Uber trips.
      • Anonymized, maintains days/times but not dates.
      • Obtained from InfoChimps.
  • 4. Data Sample
      ID  Date    Time      Latitude   Longitude
      1   1/7/07  10:54:50  37.782551  -122.445368
      1   1/7/07  10:54:54  37.782745  -122.444586
      1   1/7/07  10:54:58  37.782842  -122.443688
      1   1/7/07  10:55:02  37.782919  -122.442815
      1   1/7/07  10:55:06  37.782992  -122.442112
      1   1/7/07  10:55:10  37.7831    -122.441461
      1   1/7/07  10:55:14  37.783206  -122.440829
      1   1/7/07  10:55:18  37.783273  -122.440324
      Overall: 1.1M distinct readings, 25,000 distinct trips.
  • 5. Meanwhile, At Uber Headquarters…
  • 6. Questions Uber Might Ask:
      • What do trips tend to look like?
      • How can we reduce wait time and make more trips?
      • Are there new products we should introduce?
  • 7. Answering The Questions
      • Why Use SQL?
          – Well understood by analysts.
          – Huge ecosystem, access Hive from any of 20+ BI tools.
      • Why Hive?
          – Supports advanced SQL analytics like windowing functions.
          – Java based, makes it easy for 3rd parties to add extensions.
      • Last Reason
          – This is the Hive meetup. Were you expecting ABAP?
  • 8. Getting a feel for the trips.
  • 9. Duration
      • To get the duration all we need to do is:
          – Subtract the first timestamp from the last timestamp.
          – Do it per trip ID (1-25000).
      • OK, how do we do it with SQL?
  • 10. Getting First Or Last Values In A Partition
      -- Get the last observation from each trip ID.
      -- Standard approach on any SQL system that supports windowing.
      SELECT * FROM (
        SELECT *,
          ROW_NUMBER() OVER (PARTITION BY uber.id ORDER BY uber.dt DESC) as rn
        FROM uber
      ) sub1
      WHERE rn = 1;
  • 11. And Hive Supports Windowing Now (0.11+)
      Name / Purpose
      CUME_DIST: Fraction of rows in the partition with values lower than or equal to (or greater than or equal to if ORDER BY DESC) the current row.
      DENSE_RANK: The dense rank of the row within the partition. If any rows “tie” or have the same value, they receive the same rank. DENSE_RANK does not have gaps in the ranks, in contrast to RANK.
      FIRST_VALUE: The value in the first row within the partition.
      LAST_VALUE: Surprisingly, not the opposite of FIRST_VALUE (if you want that just change your sort order.) LAST_VALUE is tricky, look it up.
      LAG: Value from a prior row in the partition.
      LEAD: Value from a subsequent row in the partition.
      NTILE: Divides rows in a partition into N groups.
      ROW_NUMBER: The row number of the row within the partition.
      RANK: The rank of the row within the partition. This differs from ROW_NUMBER in that ties receive the same value.
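A likely reason LAST_VALUE is called tricky: when a window has an ORDER BY but no explicit frame, the standard SQL default frame ends at the current row, so LAST_VALUE returns the current row's value rather than the last row of the partition. The sketch below illustrates this with Python's built-in sqlite3 module standing in for Hive (an assumption made so the example runs anywhere; it needs SQLite 3.25 or later for window functions). The table contents are a three-row excerpt of the data sample.

```python
import sqlite3

# SQLite stands in for Hive here; the table is a tiny in-memory sample.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uber (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO uber VALUES (?, ?)",
    [(1, "10:54:50"), (1, "10:54:54"), (1, "10:54:58")],
)

rows = conn.execute(
    """
    SELECT dt,
           -- Default frame: partition start up to the current row.
           LAST_VALUE(dt) OVER (PARTITION BY id ORDER BY dt) AS default_frame,
           -- Explicit frame covering the whole partition.
           LAST_VALUE(dt) OVER (
               PARTITION BY id ORDER BY dt
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame
    FROM uber
    ORDER BY dt
    """
).fetchall()

for dt, default_frame, full_frame in rows:
    print(dt, default_frame, full_frame)
```

With the default frame each row simply sees itself as the last value; only the explicit full-partition frame yields the final timestamp of the trip. Reversing the sort order and using FIRST_VALUE, as the slide suggests, avoids the issue entirely.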
  • 12. Compute Trip Durations
      -- Subtract the first timestamp from the last timestamp.
      -- Use FIRST_VALUE and ROW_NUMBER to help compare first and last timestamps.
      SELECT id, (unix_timestamp(dt) - unix_timestamp(fv)) as trip_duration
      FROM (
        SELECT id, dt, fv
        FROM (
          SELECT id, dt,
            FIRST_VALUE(dt) OVER (PARTITION BY id ORDER BY dt) as fv,
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) as lastrk
          FROM uber
        ) sub1
        WHERE lastrk = 1
      ) sub2;
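The duration query can be tried without a Hadoop cluster. The sketch below runs the same FIRST_VALUE / ROW_NUMBER pattern through Python's built-in sqlite3 module (SQLite 3.25 or later), which is a stand-in for Hive, not the setup used in the talk. Assumptions: timestamps are stored as ISO-formatted strings, SQLite's strftime('%s', ...) replaces Hive's unix_timestamp, and the rows for trip 2 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE uber (id INTEGER, dt TEXT)")
conn.executemany(
    "INSERT INTO uber VALUES (?, ?)",
    [
        # Trip 1: first, second and last readings from the data sample,
        # with the date rewritten in ISO form (an assumption).
        (1, "2007-01-07 10:54:50"),
        (1, "2007-01-07 10:54:54"),
        (1, "2007-01-07 10:55:18"),
        # Trip 2: hypothetical rows, not from the dataset.
        (2, "2007-01-07 11:00:00"),
        (2, "2007-01-07 11:02:28"),
    ],
)

durations = conn.execute(
    """
    SELECT id,
           CAST(strftime('%s', dt) AS INTEGER)
             - CAST(strftime('%s', fv) AS INTEGER) AS trip_duration
    FROM (
        SELECT id, dt,
               FIRST_VALUE(dt) OVER (PARTITION BY id ORDER BY dt) AS fv,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY dt DESC) AS lastrk
        FROM uber
    ) sub1
    WHERE lastrk = 1   -- keep only the last reading of each trip
    ORDER BY id
    """
).fetchall()

print(durations)
```

The row with lastrk = 1 is the last reading of each trip, and fv carries the first reading's timestamp along with it, so one subtraction per trip gives the duration in seconds.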
  • 13. Trip Duration SQL Output
      id  trip_duration
      1   128
      2   148
      3   150
      4   336
      5   400
      6   168
      7   142
      8   558
      9   312
      10  208
      ... (25,000 total trips)
  • 14. Duration Was Easy, What About Distance?
      • All we have is GPS readings.
      • If we draw a line from GPS readings, it estimates trip distance.
      • GPS readings are 4s apart, estimates should be close.
      (Figure labels: Actual Route, GPS Signal, Estimated Route)
  • 15. Enter GIS Tools for Hadoop
      • esri.github.io/gis-tools-for-hadoop
      • Works with Hive and Map-Reduce
      • Syntax similar to other spatial systems like PostGIS
      • Open Source
  • 16. Spatial Framework for Hadoop Functions
      Name / Purpose
      ST_LineString: Create a line from coordinates supplied in a string.
      ST_Polygon: Create a polygon.
      ST_SetSRID: Set Spatial Reference ID. SRID 4326 corresponds to WGS84.
      ST_GeodesicLengthWGS84: Compute length of a line in meters assuming points use the World Geodetic System 1984. GPS uses the WGS84 coordinate system.
      ST_Length: Compute Cartesian length.
      ST_Contains: Determine if one spatial object contains another spatial object.
      ST_Intersects: Determine if two spatial objects intersect.
      ST_AsText: Return a text representation of a spatial object, suitable for storing in a Hive string column. Objects can also be saved in binary columns with no conversion.
      82 total spatial functions provided by Spatial Framework for Hadoop.
  • 17. ST_LineString: Make a line.
      • 2 Constructors
          – ST_LineString(1, 1, 2, 2, 3, 3); – Simple constructor.
          – ST_LineString('linestring(1 1, 2 2, 3 3)'); – WKT or Well-Known-Text constructor.
      • Neither approach very convenient for this dataset.
      • Since SF4H is open-source I added a new constructor:
          – ST_LineString([Array of ST_Point Objects]);
  • 18. collect_array: Custom UDAF turns columns to arrays
      ID  Date    Time      Latitude   Longitude
      1   1/7/07  10:54:50  37.782551  -122.445368
      1   1/7/07  10:54:54  37.782745  -122.444586
      1   1/7/07  10:54:58  37.782842  -122.443688
      1   1/7/07  10:55:02  37.782919  -122.442815
      1   1/7/07  10:55:06  37.782992  -122.442112
      1   1/7/07  10:55:10  37.7831    -122.441461
      1   1/7/07  10:55:14  37.783206  -122.440829
      1   1/7/07  10:55:18  37.783273  -122.440324
      > SELECT id, collect_array(latitude) FROM table GROUP BY id;
      (1, [ 37.782551, 37.782745, 37.782842, 37.782919, 37.782992 ... ])
      ...
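To make the grouping step concrete, here is a small Python sketch of what a collect-style aggregate does, followed by formatting the grouped points as a WKT linestring of the kind the second ST_LineString constructor accepts. The function names collect_points and to_wkt_linestring are hypothetical helpers, not part of Hive or the Spatial Framework for Hadoop, and the sketch assumes the rows arrive already sorted by time.

```python
from collections import defaultdict

# (id, latitude, longitude) rows from the data sample, in time order.
readings = [
    (1, 37.782551, -122.445368),
    (1, 37.782745, -122.444586),
    (1, 37.782842, -122.443688),
]


def collect_points(rows):
    """Group (longitude, latitude) pairs by trip ID, keeping input order."""
    grouped = defaultdict(list)
    for trip_id, lat, lon in rows:
        grouped[trip_id].append((lon, lat))
    return dict(grouped)


def to_wkt_linestring(points):
    """Format points as WKT; WKT lists x (longitude) before y (latitude)."""
    return "linestring(" + ", ".join(f"{x} {y}" for x, y in points) + ")"


trips = collect_points(readings)
wkt = to_wkt_linestring(trips[1])
print(wkt)
```

Note the coordinate order: the points are stored as (longitude, latitude), matching the ST_Point(longitude, latitude) calls used in the queries.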
  • 19. Computing Trip Lengths Now Trivial
      -- Compute the trip lengths.
      -- Our coordinates conform to WGS84, use that to compute distances.
      -- ST_SetSRID(_, 4326) marks the object as conforming to WGS84.
      -- Group by trip ID.
      SELECT id,
        ST_GeodesicLengthWGS84(
          ST_SetSRID(
            ST_LineString(collect_array(point)), 4326)) as length
      FROM (
        SELECT id, ST_Point(longitude, latitude) as point
        FROM uber
      ) sub
      GROUP BY id;
      • Inner query: generate an ST_Point for each row.
      • Outer query: group the points, turn them into arrays and make a line out of it.
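As a rough cross-check of what the length computation does, the sketch below sums great-circle distances between consecutive GPS readings using the haversine formula. This is a swapped-in approximation: it treats the Earth as a sphere, whereas ST_GeodesicLengthWGS84 works on the WGS84 ellipsoid, so results can differ slightly. The helper names and the choice of mean Earth radius are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a sphere, not the WGS84 ellipsoid


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(points):
    """Sum the distances between consecutive (lon, lat) points."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


# The eight readings of trip 1 from the data sample, as (longitude, latitude).
trip = [
    (-122.445368, 37.782551),
    (-122.444586, 37.782745),
    (-122.443688, 37.782842),
    (-122.442815, 37.782919),
    (-122.442112, 37.782992),
    (-122.441461, 37.7831),
    (-122.440829, 37.783206),
    (-122.440324, 37.783273),
]

length = path_length_m(trip)
print(f"{length:.1f} m over {len(trip)} readings")
```

These eight readings cover only the first 28 seconds of trip 1, so the result is a partial distance of a few hundred meters, not the full trip length.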
  • 20. Demo: Computing Trip Distances in Hortonworks Sandbox
  • 21. Visualizing Trip Times and Durations
  • 22. Time For a New Product?
      • How Likely is Demand for an SFO Rideshare?
      • How many trips even go to SFO?
  • 23. ST_Intersects
      • Determines if two shapes intersect.
      (Figure labels: Yes, Not So Much)
  • 24. What Trips Go To SFO?
      • Approach:
          – Draw a polygon around SFO drop-off area.
          – Using the ST_LineStrings, see which trips intersect with this polygon.
  • 25. SFO Drop-Off Area
      • Inserted into table locations (name string, location string) for easy joining against other shapes.
      • Data estimated using Google Maps.
      Name  Location
      SFO   ST_Polygon( 37.616543, -122.392291, 37.613297, -122.392119, 37.616458, -122.389115, 37.613552, -122.389051)
  • 26. Computing the Intersection
      SELECT count(id)
      FROM (
        SELECT id, ST_LineString(collect_array(point)) as trip
        FROM (
          SELECT id, ST_Point(longitude, latitude) AS point
          FROM uber
        ) points
        GROUP BY id
      ) trips
      JOIN (
        SELECT ST_Polygon(definition) as sfo_coordinates
        FROM locations
        WHERE locations.name = "SFO"
      ) sfosub
      WHERE ST_Intersects(sfosub.sfo_coordinates, trips.trip);
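For intuition about what the intersection test decides, here is a minimal planar sketch in Python: a trip line intersects a polygon if any of its points lies inside the polygon or any of its segments crosses a polygon edge. This is not Esri's implementation of ST_Intersects; the function names are hypothetical, the geometry is treated as flat (x = longitude, y = latitude), and the square and trips are made-up coordinates, not the SFO polygon.

```python
def _orient(a, b, c):
    """Cross product sign: >0 if c is left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _within_box(a, b, c):
    """True if c lies inside the bounding box of segment ab."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 and segment q1-q2 share at least one point."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # proper crossing
    return ((d1 == 0 and _within_box(q1, q2, p1))
            or (d2 == 0 and _within_box(q1, q2, p2))
            or (d3 == 0 and _within_box(p1, p2, q1))
            or (d4 == 0 and _within_box(p1, p2, q2)))


def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of vertices in boundary order."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def line_intersects_polygon(line, poly):
    """True if any line point is inside poly or any segment crosses an edge."""
    if any(point_in_polygon(p, poly) for p in line):
        return True
    edges = list(zip(poly, poly[1:] + poly[:1]))
    return any(segments_intersect(p, q, a, b)
               for p, q in zip(line, line[1:]) for a, b in edges)


# Hypothetical drop-off area and trips, in arbitrary planar units.
area = [(0, 0), (2, 0), (2, 2), (0, 2)]
trips = {
    1: [(-1, 1), (3, 1)],      # passes straight through the area
    2: [(3, 3), (4, 4)],       # stays outside
    3: [(1, 1), (1.5, 1.5)],   # lies entirely inside
}
hits = {tid: line_intersects_polygon(line, area) for tid, line in trips.items()}
trip_count = sum(hits.values())
print(hits, trip_count)
```

Trip 1 shows why a line test is used rather than a point test: none of its readings falls inside the area, yet the path clearly passes through it.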
  • 27. Demo: Counting Number of Trips to SFO in Sandbox
  • 28. Counting It Up
      • 80 / 25000 Uber trips went to SFO (0.32%)
      • SFO Rideshare Product, maybe not a great idea.
  • 29. Conclusion
      • Spatial Framework for Hadoop makes geo analytics simple with Hadoop and Hive.
      • Hive 0.11 makes it simple to slice and dice datasets with powerful analytics like windowing.
      • Open source, extend and change to fit your needs.
  • 30. Try It For Yourself
      • Spatial Framework for Hadoop
          – esri.github.io/gis-tools-for-hadoop
      • UDFs, extra data and Hive queries
          – github.com/cartershanklin/hive-spatial-uber (for the collect_array UDAF, queries and extra data)
          – github.com/cartershanklin/spatial-framework-for-hadoop (for the extra ST_LineString constructor)
      • Main Dataset
          – infochimps.com/datasets/uber-anonymized-gps-logs
      • Hortonworks Sandbox
          – The easiest way to learn Hadoop.
          – hortonworks.com/sandbox

Editor's Notes

  1. If you spotted the error in this slide… we’re hiring.