Contenu connexe
Similaire à Oracle big data spatial and graph (20)
Oracle big data spatial and graph
- 1. Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
Big Data Spatial and Graph
An Overview
ilOUG Tech Days - June, 2015
Michel Benoliel
Master Principal Consultant
Oracle Israel
- 2. Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information
purposes only, and may not be incorporated into any contract. It is not a commitment to deliver
any material, code, or functionality, and should not be relied upon in making purchasing decisions.
The development, release, and timing of any features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
2
- 3. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Agenda
• Spatial and Graph Strategy
•Introduction to Big Data Spatial and Graph
• Spatial Features
•Graph Features
3
- 4. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle’s Spatial and Graph Strategy
Enable Spatial and Graph use cases on every Big Data platform
NoSQL
Oracle Big Data Spatial and Graph
Oracle Database
Spatial and Graph
4
- 5. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Conventional database or Big Data technologies
Typical technical decision criteria
0
1
2
3
4
5
Tooling maturity
Stringent Non-Functionals
ACID transactional
requirement
Security
Variety of data formats
Data sparsity
ETL simplicity
Cost effectively store low
value data
Ingestion rate
Straight Through Processing
(STP)
Hadoop
Relational
Hadoop and/or NoSQL
Relational Database
5
- 6. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
The Big Picture – Oracle Big Data Management System
SOURCES
DATA RESERVOIR DATA WAREHOUSE
Oracle Database
Oracle Industry
Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Big Data Appliance
Apache
Flume
Oracle
GoldenGate
Oracle Event
Processing
Cloudera Hadoop
Oracle Big Data SQL
Oracle NoSQL
Oracle R Distribution
Oracle Big Data
Spatial and Graph
Oracle Database
In-Memory, Multi-tenant
Oracle Industry Models
Oracle Advanced
Analytics
Oracle Spatial and Graph
Exadata
Oracle
GoldenGate
Oracle Event
Processing
Oracle Data
Integrator
Oracle Big Data
Connectors
Oracle Data
Integrator
B
6
- 7. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle Big Data Spatial and Graph
Property Graph
for Analysis of:
• Social Media
relationships
• Internet of
Things
interactions
• Cyber-Security
Spatial Analysis
Features for:
• Location Data
Enrichment
• Proximity and
containment
analysis
• Preparation of
digital map
and imagery
data sets
7
- 8. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Introduction: Spatial for Big Data
8
- 9. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
What is Spatial Data
Integral part of almost every database
• Business data that contains or describes location
– Geographic features (roads, rivers, parks, etc.)
– Assets (pipe lines, cables, transformers,
– Sales data (sales territory, customer registration, etc.)
– Street and postal address (customers, stores, factories, etc.)
• Anything associated with a physical location
• Described by coordinates or implicitly as text (place name), ...
• Location is a “universal key” relating otherwise unrelated entities
- 10. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Linking information by location
Are these data points related?
• Tweet: sailing by #goldengate
• Instagram image subtitle: 골든게이트 교*
• Text message: Driving on 101 North , just reached border
between Marin County and San Francisco County
• GPS Sensor: N 37°49′11″ W 122°28′44″
• Now find all data points around Golden Gate Bridge ...
* Golden Gate Bridge (in Korean)
- 11. Copyright © 2015 Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Highly Restricted
Oracle Big Data – Spatial Features
• Geo-enrichment for Data Harmonization
– Resolution of location-related information
– Determination of location hierarchies
• Categorization and filtering
– Tracking, proximity analysis, geo-fencing and categorization based on location
• Data preparation
– Large scale geoprocessing for cleansing, preparation of imagery, sensor data, and raw
data input
• Data visualization
11
- 12. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use Cases: Geocoding, Geo-Enrichment
12
(i) Geocode Call Data Records, sensor data, other sources
to aggregate and display wireless network
performance (dropped calls, utilization)
(ii) Transportation origin-destination analysis. Combine
transit card/payment info and other sources to
determine where (and how many) people travel to,
starting from any station on a transit network
(iii) Geotagged Twitter: where are the tourists and locals
tweeting
i ii
iii
- 13. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use case: Categorization, filtering, aggregation
13
- 14. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use case: Data preparation
14
Mosaic images
Terrains and contours
Shaded reliefs
Pyramiding: layers at different resolution
- 15. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Spatial Features – Technical overview
• Support for spatial data in 2D or 3D in various formats, geodetic or projected
• Support for geo-referenced imagery such as satellite images in many formats
• MapReduce framework for resolution of placenames and determination location
hierarchies, including reference dataset
• Spatial indexing techniques for fast retrieval of spatial data
• Library of spatial operators for geometric analysis (inside, within distance, nearest
neighbor, ...)
• Library of image processing functions (mosaic, reprojection, format conversion, analysis,
...)
• Console for visual analysis, indexing, processing
– Sample JEE application to be deployed in Jetty
- 16. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Console
Create Index on
spatial data in HDFS
- 17. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Console
Run Map Reduce
job to perform
categorization
based on spatial
hierarchy
- 18. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Console
Results in Console
“Tweets in May by
State”
- 19. Copyright © 2015 Oracle and/or its affiliates. All rights reserved. Oracle Confidential – Internal/Restricted/Highly Restricted 19
- 20. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph Data Model
What is a graph?
– A set of edges (links) and vertexes (nodes) (and optionally properties)
– A graph is simply linked data
Why do we care?
– Graphs are everywhere
• Social networks/Social Web (Facebook, Linkedin, Twitter, Baidu, Google+,…)
• Cyber networks, power grids, protein interaction graphs
• Knowledge graphs (IBM Watson, Apple SIRI, Google Knowledge Graph)
– Graphs are intuitive and flexible
• Easy to navigate, easy to form a path, natural to visualize
• Do not require a predefined schema
E
A D
C B
F
- 21. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Why Graph Databases Now?
Rise of social networking
Google, Yahoo, Twitter, Facebook, Linked In
Enterprise applications increasingly need to model data relationships
Telecoms: Network & Data center management, identity management
Financial Services: Fraud detection; cross-selling
Media & Publishing: Social apps, recommendation, sentiment
Health Care: CRM, fraud detection
Modeling complex relationships as graphs is efficient
Improves performance
Simplifies queries, traversal, search and analytics
21
- 22. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graphs Are Big . . . and Getting Bigger
• Social Scale*
– 1 billion vertices, 100 billion edges
• Web Scale*
• 50 billion vertices, 1 trillion edges
• Brain Scale*
• 100 billion vertices, 100 trillion edges
* An NSA Big Graph Experiment
http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
- 23. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph for Social and Unstructured Data Analysis
Graph is a powerful tool for Data Analysis
you capture fine-grained, arbitrary
By representing your data as a graph with
relationships between data entities
Individual relationships are
represented as links
When analyzing such a graph,
you are using explicit relationships
to find implicit information
about your data
Without computing
multiple joins
24
- 24. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle Supports Different Kinds of Graphs
RDF Data Model
• Data federation
• Knowledge representation
• Graph pattern analysis
Social Network
Analysis
National Intelligence
Public Safety
Social Media search
Marketing - Sentiment
Linked Data /
Enterprise Metadata
Property Graph Model
• Graph Search & Analysis
• Big Data analytics
• Entity analytics
Life Sciences
Health Care
Publishing
Finance
Spatial Network
Analysis
Logistics
Transportation
Utilities
Telcoms
Network Data Model
• Network path analysis
• Multi-model modeling
Use Case Graph Model Industry Domain
- 25. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
The Property Graph Data Model
• A set of vertices (or nodes)
– each vertex has a unique identifier.
– each vertex has a set of in/out edges.
– each vertex has a collection of key-value
properties.
• A set of edges (or links)
– each edge has a unique identifier.
– each edge has a head/tail vertex.
– each edge has a label denoting type of
relationship between two vertices.
– each edge has a collection of key-value
properties.
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
- 26. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Common Graph Analysis Use Cases
Purchase Record
customer items
Product Recommendation Influencer Identification
Communication
Stream (e.g. tweets)
Graph Pattern MatchingCommunity Detection
Recommend the most
similar item purchased by
similar people
Find out people that are
central in the given
network – e.g. influencer
marketing
Identify group of people
that are close to each other
– e.g. target group
marketing
Find out all the sets of
entities that match to the
given pattern – e.g. fraud
detection
27
- 27. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph Analysis Examples:
• Attribute searching (Get people with a given name)
• Vertex/edge adjacency (Get people that like a given Web page)
• Fixed-length paths (Get the friends of the friends of a given person)
• Reach-ability (Is there a “friend” connection between two people?)
• Pattern matching (Get the common friends between two people)
• Aggregates (Get the number of friends of a given person)
28
- 28. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Use cases
Social Media Management & Analysis Identifying common friends and interests; identify
primary influencers in social network; graph data
management of large social media services
Online Product Recommendations Recommender systems; sentiment analysis;
customer churn analysis; customer behavior
analytics; customer trend prediction
Internet of Things Manage data properties and relationships for
complex webs of inter-operating devices and
systems; predictive modeling of system behavior
Cyber-Security Fraud detection; identity management; reveal
clusters of similar behaviors and properties;
discover relationships based on pattern matching
29
- 29. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Product Details: Graph Features
30
- 30. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Graph Architecture
Oracle Big Data Spatial and Graph
Scalable and Persistent Storage
Graph Data Access Layer API
Graph Analytics
In-memory Analytic Engine
RESTWebService
Blueprints & SolrCloud / Lucene
Property Graph Support on
Apache HBase and Oracle NoSQL
Python,Perl,PHP,Ruby,
Javascript,…
Java APIs
Java APIs
31
- 31. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Big Data Spatial and Graph
Property Graph Features
• Highly scalable graph database and analytics engine
• Implemented on Apache HBase and Oracle NoSQL Database
• Rich developer APIs
– Blueprints, REST, Java graph plus support for Groovy, Python, PHP, Perl, Ruby, and JavaScript
• Fast, scalable suite of social network analysis functions
– Ranking, centrality, recommender, community detection, path finding…
– Targeted to address main industry requirements
• Manageability
– Bulk load
– Console to execute Java and Gremlin APIs
32
- 32. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Support for Open Source TinkerPop Graph Tool Stack
Oracle Big Data Spatial and
Graph Blueprints API
implementation provides
support for the de-facto
graph database standard
TinkerPop component
stack.
These include query
language, dataflow, REST
APIs, and others.
33
- 33. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Property Graph Data Access Layer
• Tinkerpop Tools: Blueprints, Gremlin, Rexter supported
• Graph schema optimized on Apache HBase
• Graph schema optimized on Oracle NoSQL Database
• GraphML, GML, GraphSON, and Oracle-defined flat files (.ope & .opv)
• Bulk load of property graph data
34
- 34. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Property Graph Data Access Layer
• Parallel scan of property graph data
• Apache Lucene Text search of graph data
• Groovy shell for accessing property graph data
• iPython-based interface example
• SolrCloud integration
35
- 35. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Developer APIs for Analytics
• Blueprint APIs
– Modify/insert/delete graph edges, vertices, key-values
– Blueprints stack (Gremlin, Pipes, etc.) provide additional functionality
• Java: High level graph analysis APIs expose core functions of graph engine
– Customers perform graph analysis using these Java APIs
– Graph and subgraph identification (using key-value constraints)
– R access through Java APIs
• REST APIs
– Graph analysis
– Graph and sub-graph identification (using key-value constraints)
• Web scripting languages supported through REST APIs above
– PHP, Python, Ruby, Groovy, etc.
- 36. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Property Graph in-Memory Graph Analytics
• Parallel data reading from data access layer into memory
• Choice of deployment
– Standalone application server
– On Hadoop node
• Graph formats (in addition to those supported by the data access layer)
– EBin, Adjacency list, Edge List
• J2EE container support (WLS, Tomcat, Jetty)
- 37. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
35 Graph Functions
Detecting Components and Communities
Tarjan’s, Kosaraju’s,
Weakly Connected Components, Label
Propagation (w/ variants), Soman and
Narang’s
Ranking and Walking
Pagerank, Personalized Pagerank,
Betweenness Centrality (w/ variants),
Closeness Centrality, Degree Centrality,
Eigenvector Centrality, HITS,
Random walking and sampling (w/ variants)
Evaluating Community Structures
∑ ∑
Conductance, Modularity
Clustering Coefficient (Triangle
Counting)
Adamic-Adar
Path-Finding
Hop-Distance (BFS)
Dijkstra’s,
Bi-directional Dijkstra’s
Bellman-Ford’s
Link Prediction SALSA
(Twitter’s Who-to-follow)
Other Classics Vertex Cover
Minimum Spanning-Tree(Prim’s)
- 39. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
In-Memory Graph Analysis Framework
• Large graph analysis is time-consuming because …
– The computation typically involves touching most nodes and edges in the graph
– The data-access pattern is random
• In-memory, parallel framework for fast graph analytics
• Exploits the architecture of modern servers
– The computation is parallelized using multiple CPU cores
– The non-sequential data-access is mitigated with large DRAMs
40
- 40. Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle Property Graph Engine
• Reads graph from Apache HBase or
Oracle NoSQL
• Data Access Layer filtering used to create
subgraph for Property Graph Engine
analytics
Oracle Property Graph or RDF
(HBase or NoSQL)
Property
Graph Engine
Analytic
Request
Analytic
Request
Analytic
Request
Analytic
Request
Analytic
Request
Analytic
Request
Trans-
actional
Request
41