IBM Analytics Platform Group
Enterprise Graph Analytics
Enterprise large-scale graph analytics and computing based on a distributed
graph database (Titan DB with HBase/Solr), distributed in-memory graph computing
(TinkerPop Hadoop-Gremlin SparkGraphComputer), and Hadoop 2
• Jun(Terry) Yang • yangjuncn@cn.ibm.com
• Jing Chen(Jerry) He • jinghe@us.ibm.com
• Hadoop Summit 2017 SAN JOSE, USA JUNE 13-15
© IBM 2017 Hadoop Summit 2017
Agenda
• Challenges in hybrid data analytics
• Enterprise data quality analytics system based on graphed metadata
• Graph in enterprise data quality analytics solution
Hybrid data analytics and challenges
How was “total quantity” calculated? Show me the lineage.
What are the source-to-target mappings for the DW?
Who read the “sales” data in non-working time?
How do we ensure data quality?
Data Warehouse Architect
Auditor
Business Person
Data Architect
How to handle the challenges?
Data Governance
Data Lifecycle Management
Data Quality Management
• Correctness
• Consistency
• Completeness
• Timeliness
Metadata
…
Master Data Management
…
What is Metadata?
• The data used to describe other data
− Simple Metadata
− Rich Metadata
File system metadata
• inode attributes for file management
• Filesystem object attributes include metadata such as modify time, access, owner, and permission
DBMS/DW/NoSQL metadata
• Schema for data management
• Ownership information of data
• Server/database information of data
How can metadata be managed across platforms, systems, and servers?
Agenda
• Challenges in hybrid data analytics
• Enterprise data quality analytics system based on graphed metadata
• Graph in enterprise data quality analytics solution
Advantages of Graph in Metadata Management
Traditional solution
• Limited to one server/system
• Metadata managed within a server/system
Property Graph based solution
• Integrates metadata
• Handles storage pressure
• Efficient processing and querying
• Lineage
• Wide range managed
Property Graph
G = (V, E): a graph G consists of vertices V and edges E
[Figure: property graph. Each vertex and each edge carries a label (e.g. label1) and key:value properties (Key1:value1, Key2:value2).]
• Born for relationships
• Intuitive modeling
• Expressive querying
• Native analysis
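As a rough illustration of the G = (V, E) model, here is a minimal plain-Python sketch of a property graph. The names `Vertex`, `Edge`, and `PropertyGraph` are hypothetical and this is not the TinkerPop or Titan API; it only shows that both vertices and edges carry a label and key/value properties.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: int  # id of the source vertex
    in_v: int   # id of the target vertex
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: dict = field(default_factory=dict)  # vertex id -> Vertex
    edges: list = field(default_factory=list)

    def add_vertex(self, label, **props):
        # Ids are assigned sequentially; a real database would manage ids itself.
        vertex = Vertex(len(self.vertices) + 1, label, props)
        self.vertices[vertex.id] = vertex
        return vertex

    def add_edge(self, label, out_v, in_v, **props):
        edge = Edge(label, out_v.id, in_v.id, props)
        self.edges.append(edge)
        return edge


# Made-up sample data: one user running one program at 22:00.
g = PropertyGraph()
alice = g.add_vertex("user", name="alice")
job = g.add_vertex("program", name="etl_job")
run = g.add_edge("run", alice, job, ts_hour=22)
```

Because the edge is a first-class record with its own properties, attributes of the relationship itself (here the hour of the run) do not need a separate join table.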
Using Graph Analytics to Find Complex Patterns
1st degree relationship
2nd degree relationship
3rd degree relationship
• Graph queries are a natural way to analyze relationship patterns
− Less complex than SQL
− Can handle high degrees of relationship with ease
• Graph schema facilitates visualization and exploration of relationships
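The 1st, 2nd, and 3rd degree relationships can be pictured as a breadth-first expansion from a starting vertex. The following is a small plain-Python sketch under that assumption; the function name `neighbors_by_degree` and the sample adjacency map are made up for illustration, and edges are treated as directed.

```python
from collections import deque


def neighbors_by_degree(adjacency, start, max_degree):
    """Return {degree: set of vertices} first reached from start at that hop count."""
    seen = {start}
    result = {}
    queue = deque([(start, 0)])
    while queue:
        vertex, degree = queue.popleft()
        if degree == max_degree:
            continue  # do not expand beyond the requested degree
        for nxt in adjacency.get(vertex, ()):
            if nxt not in seen:
                seen.add(nxt)
                result.setdefault(degree + 1, set()).add(nxt)
                queue.append((nxt, degree + 1))
    return result


# Made-up directed adjacency map.
adjacency = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
}
degrees = neighbors_by_degree(adjacency, "a", 3)
```

In a relational schema, each additional degree would typically require another self-join on the relationship table, whereas here it is one more round of the same loop.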
Case study - Audit data access
• Data theft risks in a hybrid enterprise data environment
– Most data is stolen by insiders.
– Most data theft happens in non-working time.
– Over-granting of privileges may lead to data theft.
Enterprise data quality analytics system based on graphed metadata
Data ingest
• Finance data
• Consumption data
• Credit data
• Behavioral data
• Graphed metadata
• …
Feature Selection
• Statistical learning
• Data analysis
• (Graphed) Metadata analysis
• …
Advanced Feature Selection
• Gradient Boosting Decision Tree
• Support Vector Machine
• Random Forests
• PageRank (Graph)
• …
Modeling
• Customer risk rating
• Consumption Capacity
• Graph model
• …
Recommendation
• Consumer behavior
• Fraud detection
• Risk analytics (Audit)
• …
Data ingest
Identify entities and relationships
[Figure: user -Run-> program -Read-> data, with the properties below]
• user: id, name, group, permission, …
• program: name, job id, params, config, inputs, outputs, start_ts, finish_ts, …
• data: name, size, location, department, permission, parent, children, …
• Run / Read edges: ts_hour, ts_min, ts_sec, status, …
Metadata Integration
Graph-based Traversal
Metadata to Graph
Entities → Vertices
• User
• Program
• Data
• …
Relationships → Edges
• User run program
• Program read data
• …
Attributes → Properties
• Name
• …
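The mapping from metadata to a graph (entities to vertices, relationships to edges, attributes to properties) can be sketched in plain Python. The flat record layout, the function name `metadata_to_graph`, and the sample values are assumptions made for illustration; the deck does not show the actual ingest format.

```python
def metadata_to_graph(records):
    """Turn flat access records into vertices and edges: user -run-> program -read-> data."""
    vertices = {}  # (label, name) -> properties
    edges = []     # (edge label, source key, target key, properties)
    for rec in records:
        user = ("user", rec["user"])
        program = ("program", rec["program"])
        data = ("data", rec["data"])
        # Entities become vertices; setdefault keeps one vertex per entity.
        vertices.setdefault(user, {"name": rec["user"]})
        vertices.setdefault(program, {"name": rec["program"]})
        vertices.setdefault(data, {"name": rec["data"], "department": rec["department"]})
        # Relationships become edges; the timestamp is stored as an edge property.
        edges.append(("run", user, program, {"ts_hour": rec["ts_hour"]}))
        edges.append(("read", program, data, {"ts_hour": rec["ts_hour"]}))
    return vertices, edges


# Made-up sample records.
records = [
    {"user": "alice", "program": "report_job", "data": "sales_2017",
     "department": "sales", "ts_hour": 10},
    {"user": "bob", "program": "export_job", "data": "sales_2017",
     "department": "sales", "ts_hour": 23},
]
vertices, edges = metadata_to_graph(records)
```

Both records refer to the same data set, so the result has a single `sales_2017` vertex with two incoming `read` edges, which is what makes lineage and access questions a matter of following edges.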
Feature Selection
Who read the sensitive sales data in non-working time?
Query: userFeaSele = graph.traversal().
V().has("department", "sales").inE("read").outV().hasLabel("program").
inE("run").has("ts_hour", P.not(P.between(9, 17))).outV()
// P.between(9, 17) matches 9 <= ts_hour < 17, so its negation keeps non-working hours
Advanced Feature Selection
Which users have access to a large amount of data?
Query: … withComputer(SparkGraphComputer) …
userAdvFeaSele =
userFeaSele.pageRank().by('pageRank').order().by('pageRank', decr).limit(30)
// decr (named desc in newer TinkerPop releases) sorts highest rank first, so limit(30) keeps the top 30
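To make the logic of the first query easier to follow, here is a plain-Python sketch of the same three steps over an in-memory edge list. It does not use Gremlin or Titan; the working hours (9 <= hour < 17), the function name `off_hours_readers`, and the sample data are assumptions for illustration.

```python
WORK_START, WORK_END = 9, 17  # assumed working hours: 9 <= hour < 17

# (edge label, source, target, properties); sample data is made up.
edges = [
    ("run", "alice", "report_job", {"ts_hour": 10}),
    ("run", "bob", "export_job", {"ts_hour": 23}),
    ("read", "report_job", "sales_2017", {}),
    ("read", "export_job", "sales_2017", {}),
    ("read", "export_job", "hr_2017", {}),
]
departments = {"sales_2017": "sales", "hr_2017": "hr"}


def off_hours_readers(edges, departments, department):
    # Step 1: data vertices that belong to the department.
    targets = {d for d, dep in departments.items() if dep == department}
    # Step 2: programs with a "read" edge into those data vertices.
    programs = {src for label, src, dst, _ in edges
                if label == "read" and dst in targets}
    # Step 3: users whose "run" edge into those programs is outside working hours.
    return {
        src
        for label, src, dst, props in edges
        if label == "run"
        and dst in programs
        and not (WORK_START <= props["ts_hour"] < WORK_END)
    }


suspects = off_hours_readers(edges, departments, "sales")
```

Each step walks one edge type backwards, from data to program to user, which mirrors the chain of `inE(...).outV()` steps in the traversal.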
Modeling
• Model risk analysis with graphed metadata and information from ERP.
• Analyze each user against employee information from ERP, such as years of
service, age, and role, to identify suspects. A non-sales person in the resulting
user list, for example an application R&D person, would be a suspect.
• Audit recommendation.
[Figure: Risk analysis model. Labels: Advanced Feature Selection; Graph: User List (userAdvFeaSele); Other system; ERP: Employee information; ERP: Violation information; Audit Recommendation; Risk analysis report; Suspects who stole sensitive data.]
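The modeling step combines the user list from the graph with employee records from ERP. A very small sketch of that join, assuming a hypothetical ERP lookup keyed by user name and a rule that flags any role outside an allowed set, might look like this; the real risk model is not described in the deck and is certainly richer.

```python
def flag_suspects(graph_users, erp_employees, allowed_roles):
    """Return users from the graph result whose ERP role is outside allowed_roles."""
    suspects = []
    for user in graph_users:
        employee = erp_employees.get(user)
        if employee is None:
            continue  # no ERP record; a real system would likely escalate this case
        if employee["role"] not in allowed_roles:
            suspects.append(user)
    return suspects


# Made-up inputs: the user list would come from the graph query, the rest from ERP.
graph_users = ["alice", "bob"]
erp_employees = {
    "alice": {"role": "sales", "years": 5},
    "bob": {"role": "application R&D", "years": 1},
}
suspects = flag_suspects(graph_users, erp_employees, allowed_roles={"sales"})
```

Other ERP attributes named on the slide, such as years of service or prior violations, could be added as further conditions or as inputs to a scoring model.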
Agenda
• Challenges in hybrid data analytics
• Enterprise data quality analytics system based on graphed metadata
• Graph in enterprise data quality analytics solution
Graph in enterprise data quality analytics solution
[Figure: solution architecture, from data sources to business applications]
Data Source
• User data
• Machine data
• Log data
• Behavioral data
• Graphed metadata
• Third-party data
Ingest (load)
Enterprise data quality system
• Feature analysis
• Lineage
• Metadata management
• Cleansing
• Hadoop, HBase, Hive, HDFS, Spark, Titan, Solr, …
Business Application
• Risk management
• Data audit
• Cost analytics
• ……
How to choose Enterprise Graph Database?
Evaluate a graph database from the following perspectives:
• Data storing features
• Operation and manipulation features
• Graph data structures
• Query features
• Schema and instance representation
• Easy and centralized management
• Expose service
• Security features
• Fast computing
Titan
• What is Titan
− Distributed graph database
− Based on TinkerPop (Gremlin)
− Open source
• Titan features
− Distributed
− Scalable: billions of edges and vertices
− Real-time
− Transactional database (concurrent users/ACID/..)
− Global graph compute: graph data analytics, report, ETL
− Search: geo, numeric range, and full-text search
Titan solution architecture
[Figure: Titan solution architecture, top to bottom]
− Application
− Management API; TinkerPop API - Gremlin
− Internal API layer
− Database layer (Tx, Data, Mgmt, Optimizer)
− OLAP I/O Interface; Storage and Index Interface Layer
− HBase (Storage Backend); Solr (External Index Backend)
− Big Data Platform: Hadoop, Spark, Gremlin GraphComputer
− Access modes: OLTP, OLAP
• Optimized for storing and querying billions of vertices and edges over a cluster
• Supports thousands of concurrent users
• Can execute local queries (OLTP) or distributed queries across a cluster (OLAP)
Backend – HBase & Solr
• HBase
− Tight integration with the Hadoop ecosystem.
− Native support for strong consistency.
− Linear scalability with the addition of more machines.
− Strictly consistent reads and writes.
− Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
− Support for exporting metrics via JMX.
− Open source under the liberal Apache 2 license.
• Solr
− Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project.
− Solr is a standalone enterprise search server with a REST-like API.
− Solr is highly reliable, scalable and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration and more.
Perspectives covered so far (✓):
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
Easy and centralized management
Expose service
Security features
Fast computing
Integration and management
Titan in Ambari
• Titan deployment
− Installation
− Uninstallation
− Titan client deployment
− Titan server deployment
• Titan server operation
− Start server
− Stop server
− Service check
• Titan configuration
− HBase backend
− Solr backend
− SparkGraphComputer
− Titan server
− Titan environment
− Titan security
• Titan security support
− SSL
− SASL
− LDAP
− Kerberos
− Knox
− HBase access control
Perspectives covered so far (✓):
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
Expose service
Security features
Fast computing
Titan service
[Figure: remote and local access to the Titan service]
− Titan client (remote): Gremlin Console (gremlin>), {RESTful}, {Web Socket}
− Titan server (local): Gremlin Server, Titan Engine
− Titan Engine layers: Mgmt API; TP API - Gremlin; Internal API layer; Database layer; OLAP I/O; Storage and Index Interface Layer
− Backends: HBase, Solr; Spark with Gremlin GraphComputer
Perspectives covered so far (✓):
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
✓ Expose service
Security features
Fast computing
Titan security enhancement
[Figure: security measures across the Titan client (remote), the Titan server (local), and the cluster]
− Security labels: SSL; Knox; SASL; LDAP/OS/Kerberized Titan user; HBase access control; Kerberized cluster
− Titan server layers: Mgmt API; TP API - Gremlin; Internal API layer; Database layer; OLAP I/O Interface; Storage and Index Interface Layer
− Backends: HBase, Solr; Spark with Gremlin GraphComputer
Perspectives covered so far (✓):
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
✓ Expose service
✓ Security features
Fast computing
Integrate TinkerPop SparkGraphComputer with Titan DB
[Figure: TinkerPop OLAP stack on top of the Titan layers]
− Titan layers: Mgmt API; TP API - Gremlin; Internal API layer; Database layer; OLAP I/O Interface; Storage and Index Interface Layer; HBase; Solr
− Gremlin GraphComputer: Hadoop-Gremlin; Spark-Gremlin (SparkGraphComputer); Graph RDD; Spark
− Vertex programs: PageRankVertexProgram, PeerPressureVertexProgram, BulkDumperVertexProgram, BulkLoaderVertexProgram, TraversalVertexProgram
Perspectives covered (✓):
✓ Data storing features
✓ Operation and manipulation features
✓ Graph data structures
✓ Query features
✓ Schema and instance representation
✓ Easy and centralized management
✓ Expose service
✓ Security features
✓ Fast computing
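PageRankVertexProgram is one of the vertex programs that SparkGraphComputer can run across a cluster. As a rough picture of what such a program computes, here is a single-machine, plain-Python power-iteration sketch of PageRank. It is not the TinkerPop implementation; the damping factor of 0.85, the iteration count, and the sample graph are assumptions for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of node -> list of target nodes."""
    nodes = set(out_links) | {t for targets in out_links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Made-up graph: "c" is pointed to by both "a" and "b"; nothing points to "b".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
```

In a vertex-centric engine, the inner loop corresponds to each vertex sending its share along its outgoing edges as messages, which is what allows the computation to be partitioned across workers.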
Open source Graph Database
JanusGraph: a new Linux Foundation project formed to continue development of
the TitanDB graph database.
The last Titan release, 1.0.0, was released on Sep 20, 2015.
References & Contacts
• Graph
− Titan: http://titan.thinkaurelius.com
− JanusGraph: http://janusgraph.org
− TinkerPop: https://tinkerpop.apache.org
Jun(Terry) Yang
Team Leader
yangjuncn@cn.ibm.com
Linkedin.com/in/terryjunyang
Jing Chen(Jerry) He
Architect
jinghe@us.ibm.com
Linkedin.com/in/jing-chen-jerry-he-1553511
Thanks!
Questions?

Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Enterprise large-scale graph analytics and computing based on a distributed graph database (Titan DB with HBase/Solr), distributed in-memory graph computing (TinkerPop Hadoop-Gremlin SparkGraphComputer), and Hadoop 2

  • 1. IBM Analytics Platform Group: Enterprise Graph Analytics. Enterprise large-scale graph analytics and computing based on a distributed graph database (Titan DB with HBase/Solr), distributed in-memory graph computing (TinkerPop Hadoop-Gremlin SparkGraphComputer), and Hadoop 2. Jun (Terry) Yang, yangjuncn@cn.ibm.com; Jing Chen (Jerry) He, jinghe@us.ibm.com. Hadoop Summit 2017, San Jose, USA, June 13-15.
  • 2. © IBM 2017, Hadoop Summit 2017. Agenda: Challenges in hybrid data analytics; Enterprise data quality analytics system based on graphed metadata; Graph in enterprise data quality analytics solution.
  • 3. Hybrid data analytics and challenges. Questions raised: How was “total quantity” calculated? Show me the lineage. What are the source-to-target mappings for the DW? Who read the “sales” data in non-working time? How to ensure data quality? Roles shown: Data Warehouse Architect, Auditor, Business Person, Data Architect.
  • 4. How to handle the challenges? Data Governance: Data Lifecycle Management; Data Quality Management (Correctness, Consistency, Completeness, Timeliness); Metadata …; Master Data Management …
  • 5. What is Metadata? The data used to describe other data: simple metadata and rich metadata. File system metadata: inode attributes for file management; filesystem object attributes such as modify time, access, owner, permission, etc. DBMS/DW/NoSQL metadata: schema for data management; ownership information of data; server/database information of data. How to manage metadata across platforms, systems, and servers?
  • 6. Agenda: Challenges in hybrid data analytics; Enterprise data quality analytics system based on graphed metadata; Graph in enterprise data quality analytics solution.
  • 7. Advantage of graph in metadata management. Traditional solution: limited to one server/system; metadata managed within a server/system. Property-graph-based solution: integrates metadata; handles storage pressure; efficient processing and querying; lineage; wide range managed.
  • 8. Property Graph. Diagram labels: Vertex, Edge, Label (label1), Properties (Key1:value1, Key2:value2); G = (V, E), where G is the graph, V the vertices, and E the edges. Born for relationships; intuitive modeling; expressive querying; native analysis.
  • 9. Using graph analytics to find complex patterns. Diagram labels: 1st-degree relationship, 2nd-degree relationship, 3rd-degree relationship. Graph queries are a natural way to analyze relationship patterns: they are less complex than SQL and can handle high degrees of relationship with ease. A graph schema facilitates visualization and exploration of relationships.
  • 10. Case study: audit data access. Data theft risk in a hybrid enterprise environment: most data is stolen by insiders; most data theft happens outside working hours; over-granting of privileges may lead to data theft.
  • 11. Enterprise data quality analytics system based on graphed metadata. Data ingest: finance data, consumption data, credit data, behavioral data, graphed metadata, … Feature selection: statistical learning, data analysis, (graphed) metadata analysis, … Advanced feature selection: Gradient Boosting Decision Tree, Support Vector Machine, Random Forests, PageRank (graph), … Modeling: customer risk rating, consumption capacity, graph model, … Recommendation: consumer behavior, fraud detection, risk analytics (audit), …
  • 12. Data ingest: metadata to graph. Diagram labels: user, program, data, with Run and Read relationships. Attribute lists shown: name, job id, params, config, inputs, outputs, start_ts, finish_ts, …; id, name, group, permission, …; name, size, location, department, permission, parent, children, …; ts_hour, ts_min, ts_sec, status, … Metadata integration and graph-based traversal: identify entities and relationships. Entities (user, program, data, …) → vertices. Relationships (user runs program, program reads data, …) → edges. Attributes (name, …) → properties.
  • 13. Feature selection. Who read the sensitive sales data in non-working time? Query: userFeaSele = graph.traversal().V().has("department","sales").inE("read").outV().hasLabel('program').inE("run").has("ts_hour",not(between(9,17))).outV() Advanced feature selection. Find the users who have access to a large amount of data. Query: … withComputer(SparkGraphComputer) … userAdvFeaSele = userFeaSele.pageRank().by('pageRank').order().by('pageRank').limit(30)
  • 14. Modeling. Model the risk analysis with graphed metadata and information from ERP. Analyze each user against employee information from ERP (years of service, age, role) to identify suspects; a non-sales person, for example an application R&D person, would be a suspect. Audit recommendation. Diagram labels: Risk analysis model; Graph: User List (userAdvFeaSele); ERP: Employee information; ERP: Violation information; Audit Recommendation; Risk analysis report; Suspects who stole sensitive data; Advanced Feature Selection; Other system.
  • 15. Agenda: Challenges in hybrid data analytics; Enterprise data quality analytics system based on graphed metadata; Graph in enterprise data quality analytics solution.
  • 16. Graph in enterprise data quality analytics solution. Diagram labels: User data; Machine data; Log data; Behavioral data; Graphed metadata; Enterprise data quality system; Feature analysis; Lineage; Metadata management; Cleansing; Hadoop, HBase, Hive, HDFS, Spark, Titan, Solr, …; Data Source; Third-party data; Ingest (load); Business Application; Risk management; Data audit; Cost analytics; …
  • 17. How to choose an enterprise graph database? Evaluate a graph database from the following perspectives: data storing features; operation and manipulation features; graph data structures; query features; schema and instance representation; easy and centralized management; exposing a service; security features; fast computing.
  • 18. Titan. What is Titan: a distributed graph database; based on TinkerPop (Gremlin); open source. Titan features: distributed; scalable to billions of edges and vertices; real-time; transactional database (concurrent users, ACID, ...); global graph compute (graph data analytics, reports, ETL); search (geo, numeric range, and full-text search).
  • 19. Titan solution architecture. Diagram labels: application; Management API; TinkerPop API (Gremlin); Internal API layer; Database layer (Tx, Data, Mgmt, Optimizer); OLAP I/O Interface; Storage and Index Interface Layer; HBase Storage Backend; Solr External Index Backend; Spark; Big Data Platform; Gremlin GraphComputer; OLAP; OLTP; Hadoop. Optimized for storing and querying billions of vertices and edges over a cluster. Supports thousands of concurrent users. Can execute local queries (OLTP) or distributed queries across a cluster (OLAP).
  • 20. Backend: HBase & Solr. HBase: tight integration with the Hadoop ecosystem; native support for strong consistency; linear scalability with the addition of more machines; strictly consistent reads and writes; convenient base classes for backing Hadoop MapReduce jobs with HBase tables; support for exporting metrics via JMX; open source under the liberal Apache 2 license. Solr: the popular, blazing-fast open source enterprise search platform from the Apache Lucene project; a standalone enterprise search server with a REST-like API; highly reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more. Evaluation criteria marked on the slide: data storing features; operation and manipulation features; graph data structures; query features; schema and instance representation. Not yet marked: easy and centralized management; expose service; security features; fast computing.
  • 21. Integration and management: Titan in Ambari. Titan deployment: installation; uninstallation; Titan client deployment; Titan server deployment. Titan server operation: start server; stop server; service check. Titan configuration: HBase backend; Solr backend; SparkGraphComputer; Titan server; Titan environment; Titan security. Titan security support: SSL; SASL; LDAP; Kerberos; Knox; HBase access control. Evaluation criteria marked on the slide: data storing features; operation and manipulation features; graph data structures; query features; schema and instance representation; easy and centralized management. Not yet marked: expose service; security features; fast computing.
  • 22. Remote Titan service. Diagram labels: Mgmt API; TP API (Gremlin); Internal API layer; Database layer; OLAP I/O; Storage and Index Interface Layer; HBase; Solr; Spark; Gremlin GraphComputer; Gremlin Server; Gremlin Console; Titan Engine; {RESTful}; {Web Socket}; gremlin> prompt; local; Titan server; Titan client. Evaluation criteria marked on the slide: data storing features; operation and manipulation features; graph data structures; query features; schema and instance representation; easy and centralized management; expose service. Not yet marked: security features; fast computing.
  • 23. Titan security enhancement. Diagram labels: Cluster; Remote; Titan client; Titan server; Spark; Gremlin GraphComputer; local; Mgmt API; TP API (Gremlin); Internal API layer; Database layer; OLAP I/O Interface; Storage and Index Interface Layer; HBase; Solr; SSL; Knox; SASL; LDAP/OS/Kerberized Titan user; HBase access control; Kerberized cluster; Security; Description. Evaluation criteria marked on the slide: data storing features; operation and manipulation features; graph data structures; query features; schema and instance representation; easy and centralized management; expose service; security features. Not yet marked: fast computing.
  • 24. Integrate TinkerPop SparkGraphComputer with Titan DB. Diagram labels: Mgmt API; TP API (Gremlin); Internal API layer; Database layer; OLAP I/O Interface; Storage and Index Interface Layer; HBase; Solr; Gremlin GraphComputer; Graph RDD; PageRankVertexProgram; PeerPressureVertexProgram; BulkDumperVertexProgram; BulkLoaderVertexProgram; TraversalVertexProgram; Spark-Gremlin; SparkGraphComputer; Hadoop-Gremlin; Spark. Evaluation criteria marked on the slide (all nine): data storing features; operation and manipulation features; graph data structures; query features; schema and instance representation; easy and centralized management; expose service; security features; fast computing.
  • 25. Open source graph database. A new Linux Foundation project was formed to continue development of the TitanDB graph database. The last Titan release, 1.0.0, was published on Sep 20, 2015.
  • 26. References & Contacts. Graph: Titan: http://titan.thinkaurelius.com; JanusGraph: http://janusgraph.org; TinkerPop: https://tinkerpop.apache.org. Contacts: Jun (Terry) Yang, Team Leader, yangjuncn@cn.ibm.com, Linkedin.com/in/terryjunyang; Jing Chen (Jerry) He, Architect, jinghe@us.ibm.com, Linkedin.com/in/jing-chen-jerry-he-1553511.
  • 27. Thanks! Questions?
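
The metadata-to-graph mapping on slide 12 and the two feature-selection queries on slide 13 can be sketched without a Titan cluster. The following is a minimal, self-contained Python illustration, not the Gremlin or Titan API: the `PropertyGraph` class, the function names, and the sample users, programs, and data sets are all hypothetical. It assumes working time means hours 9 up to (but excluding) 17, and it uses a textbook power-iteration PageRank rather than TinkerPop's `PageRankVertexProgram`, so the scores are only indicative.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: str  # id of the vertex the edge leaves
    in_v: str   # id of the vertex the edge enters
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Tiny in-memory property graph (hypothetical helper, not Titan)."""

    def __init__(self):
        self.vertices = {}  # vertex id -> Vertex
        self.edges = []

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = Vertex(label, props)

    def add_edge(self, label, out_v, in_v, **props):
        self.edges.append(Edge(label, out_v, in_v, props))

    def in_edges(self, vid, label):
        return [e for e in self.edges if e.in_v == vid and e.label == label]


def off_hours_readers(graph, department, work_start=9, work_end=17):
    """Users who ran, outside working hours, a program that read the department's data."""
    users = set()
    for vid, vertex in graph.vertices.items():
        if vertex.props.get("department") != department:
            continue
        for read in graph.in_edges(vid, "read"):
            if graph.vertices[read.out_v].label != "program":
                continue
            for run in graph.in_edges(read.out_v, "run"):
                if not (work_start <= run.props["ts_hour"] < work_end):
                    users.add(run.out_v)
    return users


def page_rank(graph, damping=0.85, iterations=20):
    """Textbook power-iteration PageRank; dangling mass is spread uniformly."""
    ids = list(graph.vertices)
    n = len(ids)
    rank = {v: 1.0 / n for v in ids}
    out_degree = {v: 0 for v in ids}
    for e in graph.edges:
        out_degree[e.out_v] += 1
    for _ in range(iterations):
        dangling = sum(rank[v] for v in ids if out_degree[v] == 0)
        nxt = {v: (1 - damping) / n + damping * dangling / n for v in ids}
        for e in graph.edges:
            nxt[e.in_v] += damping * rank[e.out_v] / out_degree[e.out_v]
        rank = nxt
    return rank


# Entities become vertices, relationships become edges, attributes become properties.
g = PropertyGraph()
g.add_vertex("alice", "user", group="sales")
g.add_vertex("bob", "user", group="rnd")
g.add_vertex("report", "program")
g.add_vertex("export", "program")
g.add_vertex("sales_db", "data", department="sales")
g.add_vertex("hr_db", "data", department="hr")
g.add_edge("run", "alice", "report", ts_hour=10)
g.add_edge("run", "bob", "export", ts_hour=23)
g.add_edge("run", "alice", "export", ts_hour=14)
g.add_edge("read", "report", "sales_db")
g.add_edge("read", "export", "sales_db")
g.add_edge("read", "export", "hr_db")

readers = off_hours_readers(g, "sales")
ranks = page_rank(g)
print(sorted(readers))
print(sorted(ranks, key=ranks.get, reverse=True)[:3])
```

The nested loops mirror the traversal's hops (data vertex, incoming `read` edge, program vertex, incoming `run` edge, user vertex). Returning a set removes duplicates that a raw traversal would emit when one user matches through several paths.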