© 2015 IBM Corporation
Information Virtualization: Query Federation on Data Lakes
Beate Porst
porst@us.ibm.com
Product Manager Information Server
Jo Ramos
joramos@us.ibm.com
Distinguished Engineer – Big Data and Analytics @IBM
Agenda
 Data Lakes and Data Reservoirs
 Information Virtualization and Federation
 Examples of Federation and Best Practices
 Information Integration on Hadoop
The true value of Big Data is in context
[Diagram: stages shown are Raw data, Feature extraction / metadata, Domain linkages, and Full contextual analytics. Example: patient records linked to location risk, occupational risk, dietary risk, family history, actuarial data, government statistics, epidemic data, chemical exposure, personal financial situation, social relationships, travel history, weather history, ...]
A growing data demand … and organizational tensions
Data Scientists seeking data for new analytics models.
Marketer seeking data for new campaigns.
Fraud investigator seeking data to understand the details of suspicious activity.
Lines of Business: agility, data access freedom, any kind of data, powerful analysis and visualization
IT Organization: security, data privacy, standards, ...
Roles shown: Application Developer, Knowledge Worker
Why a Data Reservoir and Not a Lake
Data Lake
 Data flows in “naturally” and just sits there
Data Reservoir
 Built to extract value from the data
The Data Reservoir subsystems
[Diagram: the Data Reservoir and its Information Management and Governance Fabric. Recoverable labels:]
• Data Reservoir Repositories: SandBox, Master Data Management, Cache Data, Data Marts, Operational Data Stores, Information Warehouse (EDW), Deep Data (aka Hadoop, aka Data Lake)
• Interfaces: Catalogue, Self-Service Access, Enterprise IT Data Exchange, Raw Data Interaction
• Users: Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Reservoir Operations
• Sources: Enterprise IT, New Sources, System of Record, Systems of Engagement
Data Reservoir Logical Architecture
[Diagram: logical architecture of the Data Reservoir. Recoverable labels, grouped by kind:]
• Repository groups: Harvested Data, Descriptive Data, Shared Operational Data, Deposited Data, Historical Data, Published
• Repositories: Information Warehouse, Information Views, Catalog, Asset Hub, Activity Hub, Code Hub, Content Hub, Deep Data, Audit Data, Operational History, Search Index, Offline Archive, Sand Boxes, Reporting Data Marts, Object Cache
• Information Integration & Governance: Information Broker, Operational Governance Hub, Code Hub, Workflow, Staging Areas, Guards, Monitor
• Interfaces: Catalog Interfaces, Raw Data Interaction, View-based Interaction, Access and Feedback, Enterprise IT Interaction, Service Interfaces, Data Ingestion, Publishing Feeds, Continuous Analytics, Streaming Analytics, Event Correlation
• Outside parties: Line of Business Applications; Data Reservoir Operations; Governance, Risk and Compliance Team; Information Curator; Decision Model Management; Analytics Tools; Consumers of Insight; Other Systems of Insight; Other Data Reservoirs
• Enterprise IT: System of Record Applications, Enterprise Service Bus, Systems of Engagement
• New Sources: Third Party Feeds, Third Party APIs, Internal Sources
• Flows: Information Service Calls, Search Requests, Report Requests, Deploy Decision Models, Deploy Real-time Decision Models, Data Access, Data Deposit, Data In, Data Out, Events to Evaluate, Notifications, Curation Interaction, Management, Understand Information Sources, Advertise Information Source, Understand Compliance, Report Compliance, Simple ad hoc Discovery and Analysis, Reporting, Analytical Insight Applications
INFORMATION VIRTUALIZATION & FEDERATION
Information Virtualization hides the complexity of the information landscape
[Diagram: an Information Virtualization layer between consumers (Data Scientist, Line of Business) and the underlying data. Consumer actions shown: report on values, view related values, search values, browse sources, analyze values, provision. Layers shown: Information Provisioning, Information Delivery, Data Access APIs, Semantic/Business Objects.]
Different Styles of Information Provisioning
Styles shown: Federation, Replication, Caching, Consolidation
[Diagram labels: Analytical & Reporting Tools; Web Applications; Product Performance; Real-time Inventory Level; Headquarters and Stores (consolidation); Primary Data Center and Backup Data Center (replication); Cache; Region 1 and Region 2 Product Performance; Database (federation).]
Example – Integrating the enterprise across independent silos
Two ways to build a Global View over Silo 1, Silo 2 and Silo 3:
• ETL: transform the data for consistency, then build the global view over the consistent data sources
• Federated queries: build the global view by querying the silos directly
The optimal approach depends on how consistent the data is across the silos, how much spare capacity each silo has to support additional queries, and whether all silos are available when a global query has to be answered.
Example – Creating a logical warehouse
[Diagram: a Requested View is answered from two repositories, Deep Data (Hadoop system) and System of Record. Annotations: “Detailed data maintained for exploratory analysis and investigations.” and “Structured information optimized for complex analytics and reporting.”]
Information virtualization hides the complexities of where the data is located. Here different repositories are being used to host different workloads, but this complexity is hidden by the information virtualization layer.
Virtual Information Collection: Information Federation Process
 Database Federation
• Relational data only
• SQL pushdown
• Challenges: query optimization, out-of-memory, complex SQL/joins
 Service Federation
• Data is combined in-memory
• Technology: SOA, Message Broker, Spark, BI & Reporting Tools
• Challenges: performance (network, memory, etc.)
 Semantic Federation
• Use a triple store and ontology to create the virtualized interfaces on-the-fly. New technology, e.g. Spark
• Challenges: query optimization, security
IBM FEDERATION SOLUTIONS
BigSQL Fluid Query (federation)
 Data never lives in isolation
• Whether Hadoop serves as a landing zone or as a queryable archive, it is desirable to query data across Hadoop and active data warehouses
 Big SQL provides the ability to query heterogeneous systems
• Join Hadoop to other relational databases
• Query optimizer understands the capabilities of the external system, including available statistics
• As much work as possible is pushed to each system to process
[Diagram: a Head Node running Big SQL and four Compute Nodes, each hosting a Task Tracker, a Data Node and Big SQL.]
BigSQL Fluid Query: Federation to RDBMS Engines
[Diagram: applications and user interaction send SQL to the BIGSQL MPP Engine on BigInsights (Hadoop). Table-3 is local to BigSQL, on HDFS & GPFS local data sources in file formats Parquet, CSV, Seq, RC, Avro, JSON, ORC and custom. Table-1 and Table-2 are local to relational database engines (Oracle, Teradata, Netezza, DB2).]
Application needs to join Table-1, Table-2 and Table-3.
BigSQL Fluid Query: Federation to RDBMS Engines
[Diagram: applications and user interaction send SQL to the BIGSQL MPP Engine on BigInsights (Hadoop), which now includes a Federation Engine holding Table-1 (alias) and Table-2 (alias). Table-3 is local to BigSQL, on HDFS & GPFS local data sources in file formats Parquet, CSV, Seq, RC, Avro, JSON, ORC and custom. Table-1 and Table-2 are local to relational database engines (Oracle, Teradata, Netezza, DB2).]
Application needs to join Table-1, Table-2 and Table-3.
1. Create aliases for Table-1 and Table-2 on the BigSQL Federation Engine.
BigSQL Fluid Query: Federation to RDBMS Engines
[Diagram: applications and user interaction send SQL to the BIGSQL MPP Engine on BigInsights (Hadoop). Its Federation Engine holds Table-1 (alias) and Table-2 (alias) and sends SQL through client drivers to the relational database engines (Oracle, Teradata, Netezza, DB2) where Table-1 and Table-2 are local. Table-3 is local to BigSQL, on HDFS & GPFS local data sources in file formats Parquet, CSV, Seq, RC, Avro, JSON, ORC and custom. Arrows distinguish data access from data flow.]
Application needs to join Table-1, Table-2 and Table-3.
1. Create aliases for Table-1 and Table-2 on the BigSQL Federation Engine.
2. The query optimizer pushes part of the SQL down to be executed on the remote RDBMS.
3. The final join/aggregation is executed on BigSQL.
• Joins, predicates and aggregation are pushed down to the backend RDBMS engine to reduce data transfers.
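The pushdown steps on this slide can be illustrated with a small Python sketch. This is not Big SQL: two in-memory SQLite databases stand in for a remote RDBMS and for the local Hadoop-side engine, and every table name, column and value is hypothetical. It only shows the idea that the predicate and aggregation run on the remote side, so a small intermediate result crosses the network before the final join runs locally.

```python
import sqlite3

# Stand-in for a remote RDBMS (hypothetical "orders" table plays the remote table).
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (customer_id INTEGER, region TEXT, amount REAL)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "EU", 10.0), (1, "EU", 15.0), (2, "EU", 7.5), (3, "US", 99.0), (2, "US", 1.0)],
)

# Stand-in for the local engine (hypothetical "customers" table plays the local table).
local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
local.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edsger")],
)


def fetch_without_pushdown():
    """Ship every remote row; filtering and aggregation would happen locally."""
    return remote.execute("SELECT customer_id, region, amount FROM orders").fetchall()


def fetch_with_pushdown(region):
    """Let the remote engine apply the predicate and the aggregation."""
    return remote.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "WHERE region = ? GROUP BY customer_id ORDER BY customer_id",
        (region,),
    ).fetchall()


def federated_totals(region):
    """Final join between the remote partial result and the local table."""
    partial = fetch_with_pushdown(region)
    local.execute("DROP TABLE IF EXISTS remote_partial")
    local.execute("CREATE TEMP TABLE remote_partial (customer_id INTEGER, total REAL)")
    local.executemany("INSERT INTO remote_partial VALUES (?, ?)", partial)
    return local.execute(
        "SELECT c.name, p.total FROM customers c "
        "JOIN remote_partial p ON p.customer_id = c.customer_id "
        "ORDER BY c.name"
    ).fetchall()


rows_shipped_naive = len(fetch_without_pushdown())
rows_shipped_pushdown = len(fetch_with_pushdown("EU"))
totals = federated_totals("EU")
```

Comparing `rows_shipped_naive` with `rows_shipped_pushdown` shows how much smaller the transferred result is once the remote engine does the filtering and grouping.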
IBM Fluid Query V1.0
 Connectors:
• Routes PDA (Netezza) queries to the top Hadoop providers
 Data movement:
• Allows rapid data movement between PDA and Hadoop
• PDA to Hadoop
• Hadoop to PDA
 Initial Supported Hadoop SQL Query Engines
• BigInsights – Hive2, BigSQL v1, BigSQL v3, BigSQL v4
• Hortonworks – Hive2
• Cloudera – Hive2, Impala
Unifying PureData System for Analytics (PDA) with Hadoop
Netezza Fluid Query to Hadoop Engines
[Diagram: applications and user interaction send SQL to the NPS MPP Engine on PureData for Analytics (Netezza). Fluid Query holds Table-1 (alias) and Table-2 (alias) next to Table-3 (local), and sends SQL through client drivers to Impala / Hive or BigSQL, where Table-1 and Table-2 are local. The Hadoop tables sit on HDFS local data sources in file formats Parquet, CSV, Seq, RC, Avro, JSON and ORC. Arrows show the data flow.]
Application needs to join Table-1, Table-2 and Table-3.
• Joins, predicates and aggregation are applied on Hadoop via views to minimize data transfers.
• Final joins, predicates and aggregation are applied on Netezza.
Query Federation Best Practices
 Avoid complex joins across multiple disparate repositories
• Example: joining tables from BigSQL, Oracle, Teradata and Netezza in the same SQL statement.
• Consider other techniques (copying data locally, caching, etc.)
 Keep statistics current on every table that is part of the federated system
• Statistics are critical for query optimization.
 Watch out for network bandwidth and traffic
• Large data transfers can overload the network (intermediate results need to be generated)
 Consider implementing workload management and a query governor
• Prevent a federated query from overloading a system.
 Avoid complex data transformations (in-flight transformation)
• They can impact any of the involved systems
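As an illustration of the query governor idea, the Python sketch below caps how many rows a federated query may pull from a remote source. It is a hypothetical example, not part of any IBM product: SQLite stands in for the remote system, and `governed_fetch`, `RowLimitExceeded` and the limit values are names and numbers invented here.

```python
import sqlite3


class RowLimitExceeded(RuntimeError):
    """Raised when a remote query would transfer more rows than allowed."""


def governed_fetch(conn, sql, params=(), max_rows=1000, batch_size=100):
    """Fetch rows in batches and abort once max_rows is exceeded.

    Fetching in batches means an oversized query is stopped early instead
    of materializing its whole intermediate result on the local side.
    """
    cursor = conn.execute(sql, params)
    rows = []
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return rows
        rows.extend(batch)
        if len(rows) > max_rows:
            raise RowLimitExceeded(
                f"query returned more than {max_rows} rows; "
                "add predicates or copy the data locally"
            )


# Hypothetical remote table with 500 rows, 100 of them purchases.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE events (id INTEGER, kind TEXT)")
remote.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, "click" if i % 5 else "purchase") for i in range(500)],
)

# A selective query stays under the cap.
purchases = governed_fetch(
    remote, "SELECT id FROM events WHERE kind = ?", ("purchase",), max_rows=200
)

# An unselective query is cut off by the governor.
try:
    governed_fetch(remote, "SELECT id FROM events", max_rows=200)
    blocked = False
except RowLimitExceeded:
    blocked = True
```

A production governor would more likely act on optimizer cost estimates or elapsed time; a row cap is used here only because it is easy to demonstrate.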
When to Apply Federation
 Building multi-temperature data systems
• Hot/warm/cold data on different repositories
 Data is changing dynamically, in particular through schema evolution
 Federated queries can perform reasonably without impacting any of the systems involved
 Real-time access to small sets of data on distributed systems
 Remote data cannot be moved to a local system
• Regulatory issues
 The number of federated queries is manageable
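For the multi-temperature case, one way to picture the federation layer's job is as a router that splits a requested date range between a hot store and a cold store. The Python sketch below is only an illustration under stated assumptions: the 90-day boundary, the store names `warehouse` and `hadoop`, and the function `split_by_temperature` are all hypothetical.

```python
from datetime import date, timedelta


def split_by_temperature(start, end, today, hot_days=90):
    """Split the inclusive range [start, end] into cold and hot parts.

    Data from the last `hot_days` days before `today` is assumed to live in
    the hot store (here called "warehouse"); older data is assumed to live
    in the cold store (here called "hadoop"). Returns a list of
    (store, range_start, range_end) tuples, oldest first.
    """
    if start > end:
        raise ValueError("start must not be after end")
    first_hot_day = today - timedelta(days=hot_days)
    parts = []
    if start < first_hot_day:
        cold_end = min(end, first_hot_day - timedelta(days=1))
        parts.append(("hadoop", start, cold_end))
    if end >= first_hot_day:
        hot_start = max(start, first_hot_day)
        parts.append(("warehouse", hot_start, end))
    return parts


today = date(2015, 6, 30)
plan = split_by_temperature(date(2015, 1, 1), date(2015, 6, 30), today)
recent_only = split_by_temperature(date(2015, 6, 1), date(2015, 6, 30), today)
```

A query that touches only recent data is routed to a single store, while a query spanning the boundary becomes two sub-queries whose results the federation layer has to combine.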
Some considerations to provide access to information
Access in place
 Up-to-date information
 Cost-effective
 Slower access path
• Remote Access
• Reformatting
Make a local copy
 Specially formatted for the use case
 Local data access
 Local control
 Local cost
 Potentially stale values
 Consider these questions and make the best choice
• How much information?
• How rapidly is it changing?
• How frequently is it accessed?
• How much transformation is required to consume the information?
• When is the information available?
• Who owns the information?
• How easily can it be changed?
IBM INFORMATION SERVER FOR HADOOP
The Data Reservoir subsystems
[Diagram: the Data Reservoir and its Information Management and Governance Fabric. Recoverable labels:]
• Data Reservoir Repositories: SandBox, Master Data Management, In-Memory Cache, Data Marts, Operational Data Stores, Information Warehouse (EDW), Deep Data (aka Hadoop, aka Data Lake)
• Interfaces: Catalogue, Self-Service Access, Enterprise IT Data Exchange, Raw Data Interaction
• Users: Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Reservoir Operations
• Sources: Enterprise IT, New Sources, System of Record, Systems of Engagement
IBM Confidential
IAP PMOM Std DCP Template – V1 May, 2015
Introducing IBM Information Server for Apache Hadoop: Information Empowerment for Your Hadoop Environment
Superfast data ingest and processing
• Integrate, prepare and enrich data with speed and confidence, running natively on Hadoop with speeds 10-15x faster than MapReduce
• Callout: no other vendor has this scale or speed
Complete confidence in your data
• Understand what data is available and where it came from; monitor and cleanse quality of data; catalog metadata assets and trace lineage
• Callout: extend existing leadership into the Hadoop domain
Higher level of productivity
• Develop integration processes much faster than with hand coding, based on existing enterprise skills: a graphical data flow development environment with 100s of prebuilt stages and 1000s of prebuilt functions
• Callout: proven development paradigm
Maximize your IT resources utilization through “anywhere” execution
• Optimize your integration and DQ workload based on data locality and resource availability
• Design your transformation or cleansing once and run it on your Hadoop cluster, on your traditional engine, or optimize it to run on your database
[Diagram: One Integration & Quality Design, Execute “Anywhere”. Targets labelled: Traditional ETL Engine, Databases. Callout: this release adds this pattern to run natively on the Hadoop cluster.]
Questions?
REFERENCE MATERIAL
New Information Architectures and Capabilities

Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Information Virtualization: Query Federation on Data Lakes

  • 1. © 2015 IBM Corporation Information Virtualization: Query Federation on Data Lakes Beate Porst porst@us.ibm.com Product Manager Information Server Jo Ramos joramos@us.ibm.com Distinguished Engineer – Big Data and Analytics @IBM
  • 2. Agenda
      • Data Lakes and Data Reservoirs
      • Information Virtualization and Federation
      • Examples of Federation and Best Practices
      • Information Integration on Hadoop
  • 3. The true value of Big Data is in context
      • Diagram layers: Raw data; Feature extraction metadata; Domain linkages; Full contextual analytics
      • Diagram data domains: Patient records; Location risk; Occupational risk; Dietary risk; Family history; Actuarial data; Government statistics; Epidemic data; Chemical exposure; Personal financial situation; Social relationships; Travel history; Weather history; ...
  • 4. A growing data demand … and organizational tensions
      • Data scientists seeking data for new analytics models.
      • Marketers seeking data for new campaigns.
      • Fraud investigators seeking data to understand the details of suspicious activity.
      • Diagram labels: Agility; Data Access Freedom; Any kinds of data; Powerful Analysis & Visualization; Security; Data Privacy; Standards; ..; Application Developer; Knowledge Worker; Lines of Business; IT Organization
  • 5. Why a Data Reservoir and Not a Lake
      • Data Lake: data flows in “naturally” and just sits there
      • Data Reservoir: built to extract value from the data
  • 6. The Data Reservoir subsystems
      • Data Reservoir: Information Management and Governance Fabric; Catalogue; Self-Service Access; Enterprise IT Data Exchange; Raw Data Interaction
      • Data Reservoir Repositories: SandBox; Master Data Management; Cache Data; Data Marts; Operational Data Stores; Information Warehouse (EDW); Deep Data (aka Hadoop, aka Data Lake)
      • Users and sources: Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Reservoir Operations; Enterprise IT; New Sources; System of Record; Systems of Engagement
  • 7. Data Reservoir Logical Architecture
      • Diagram labels: Data Reservoir Data Reservoir Repositories Harvested Data INFORMATION WAREHOUSE Descriptive Data INFORMATION VIEWS CATALOG Shared Operational Data ASSET HUB ACTIVITY HUB CODE HUB CONTENT HUB Deposited Data Historical Data DEEP DATA AUDIT DATA OPERATIONAL HISTORY SEARCH INDEX OFFLINE ARCHIVE Line of Business Applications Information Service Calls Search Requests Report Requests Deploy Decision Models Information Service Calls Data Access Deploy Real-time Decision Models Data Reservoir Operations Curation Interaction Management Data Access Data Deposit Data Deposit Decision Model Management Enterprise IT Events to Evaluate Information Service Calls Data Out Data In Other Systems Of Insight Notifications New Sources Third Party Feeds Third Party APIs Internal Sources Deploy Real-time Decision Models Understand Information Sources Understand Information Sources Understand Compliance Report Compliance Advertise Information Source Governance, Risk and Compliance Team Information Curator Catalog Interfaces Raw Data Interaction SAND BOXES Information Integration & Governance INFORMATION BROKER OPERATIONAL GOVERNANCE HUB CODE HUB WORKFLOW STAGING AREAS GUARDS MONITOR Enterprise IT Interaction Service Interfaces Data Ingestion Publishing Feeds Continuous Analytics STREAMING ANALYTICS Other Data Reservoirs Consumers of Insight Simple, ad hoc Discovery and Analysis Reporting Analytical Insight Applications Analytics Tools View-based Interaction Access and Feedback Published SAND BOXES REPORTING DATA MARTS OBJECT CACHE System of Record Applications Enterprise Service Bus Systems of Engagement EVENT CORRELATION
  • 8. INFORMATION VIRTUALIZATION & FEDERATION
  • 9. Information Virtualization hides the complexity of the information landscape
      • Diagram labels: Information Virtualization; Report on Values; View related Values; Search Values; Browse Sources; Analyze Values; Provision Information; Provisioning; Information Delivery; Data Access APIs; Semantic/Business Objects; Data Scientist; Line of Business
  • 10. Different Styles of Information Provisioning
      • Styles: Federation; Replication; Caching; Consolidation
      • Diagram labels: Analytical & Reporting Tools; Web Applications; Product Performance; Real-time Inventory Level; Headquarters; Stores; Primary Data Center; Backup Data Center; Cache; Region 1 Product Performance; Region 2 Product Performance; Database Federation
  • 11. Example: Integrating the enterprise across independent silos
      • Option A: ETL transforming data for consistency, feeding a global view from Silo 1, Silo 2 and Silo 3
      • Option B: Federated queries over consistent data sources, presenting a global view of Silo 1, Silo 2 and Silo 3
      • The optimal approach depends on how consistent the data is across the silos, how much spare capacity each silo has to support additional queries, and whether all silos are available to answer a global query.
  • 12. Example: Creating a logical warehouse
      • Information virtualization hides the complexities of where the data is located. Here different repositories are being used to host different workloads, but this complexity is hidden by the information virtualization layer.
      • Detailed data is maintained for exploratory analysis and investigations.
      • Structured information is optimized for complex analytics and reporting.
      • Diagram labels: Deep Data (Hadoop system); System of Record; Requested View
  • 13. Information Federation Process (Virtual Information Collection)
      • Database Federation
          - Relational data only
          - SQL pushdown
          - Challenges: query optimization; out-of-memory; complex SQL/joins
      • Service Federation
          - Data is combined in-memory
          - Technology: SOA, Message Broker, Spark, BI & Reporting Tools
          - Challenges: performance (network, memory, etc.)
      • Semantic Federation
          - Use triple store and ontology to create the virtualized interfaces on the fly. New technology, i.e. Spark
          - Challenges: query optimization; security
  • 14. IBM FEDERATION SOLUTIONS
  • 15. BigSQL Query Fluid (federation)
      • Data never lives in isolation
          - Either as a landing zone or a queryable archive, it is desirable to query data across Hadoop and active data warehouses
      • Big SQL provides the ability to query heterogeneous systems
          - Join Hadoop to other relational databases
          - Query optimizer understands capabilities of external system, including available statistics
          - As much work as possible is pushed to each system to process
      • Diagram labels: Head Node (Big SQL); four Compute Nodes, each with Task Tracker, Data Node and Big SQL
  • 16. BigSQL Fluid Query: Federation to RDBMS Engines
      • Scenario: the application needs to join Table-1, Table-2 and Table-3.
      • Diagram labels: Applications; User Interaction; SQL ??; BigInsights (Hadoop); BIGSQL MPP Engine; Local Data Sources; HDFS & GPFS; File Formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom; Relational Database Engines: Oracle, Teradata, Netezza, DB2; Table-1 (local); Table-2 (local); Table-3 (local)
  • 17. BigSQL Fluid Query: Federation to RDBMS Engines
      • Scenario: the application needs to join Table-1, Table-2 and Table-3.
      • Step 1: Create an alias for Table-1 and Table-2 on the BigSQL Federation Engine.
      • Diagram labels: Applications; User Interaction; SQL; BigInsights (Hadoop); BIGSQL MPP Engine; Federation Engine; Local Data Sources; HDFS & GPFS; File Formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom; Relational Database Engines: Oracle, Teradata, Netezza, DB2; Table-1 (local); Table-2 (local); Table-3 (local); Table-1 (alias); Table-2 (alias)
  • 18. BigSQL Fluid Query: Federation to RDBMS Engines
      • Scenario: the application needs to join Table-1, Table-2 and Table-3.
      • Step 1: Create an alias for Table-1 and Table-2 on the BigSQL Federation Engine.
      • Step 2: The query optimizer pushes part of the SQL down to be executed on the remote RDBMS.
      • Step 3: The final join/aggregation is executed on BigSQL.
      • Joins, predicates and aggregation are pushed down to the backend RDBMS engine to reduce data transfers.
      • Diagram labels: Applications; User Interaction; SQL; Client Driver; Data Access; Data flow; BigInsights (Hadoop); BIGSQL MPP Engine; Federation Engine; Local Data Sources; HDFS & GPFS; File Formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom; Relational Database Engines: Oracle, Teradata, Netezza, DB2; Table-1 (local); Table-2 (local); Table-3 (local); Table-1 (alias); Table-2 (alias)
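
The three steps on slide 18 can be sketched in a few lines of Python. This is only an illustration of the pushdown pattern, not Big SQL itself: two in-memory SQLite databases stand in for the remote RDBMS and the local Big SQL side, and the `orders` and `customers` tables, their columns, and the `federated_totals` helper are all hypothetical names introduced here.

```python
import sqlite3

# Assumption: SQLite stands in for both engines. "remote" plays the backend
# RDBMS, "local" plays the Big SQL side. All names below are hypothetical.
remote = sqlite3.connect(":memory:")
remote.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL, region TEXT)")
remote.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100.0, "EU"), (1, 50.0, "EU"), (2, 70.0, "US"), (3, 20.0, "EU")],
)

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
local.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (3, "Initech")],
)


def federated_totals(region):
    # Step 2: push the predicate and the aggregation down to the remote
    # engine, so only one row per customer leaves the remote system.
    pushed_down = remote.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "WHERE region = ? GROUP BY customer_id",
        (region,),
    ).fetchall()

    # Step 3: run the final join locally against the small intermediate result.
    local.execute("CREATE TEMP TABLE remote_totals (customer_id INTEGER, total REAL)")
    local.executemany("INSERT INTO remote_totals VALUES (?, ?)", pushed_down)
    rows = local.execute(
        "SELECT c.name, r.total FROM customers c "
        "JOIN remote_totals r ON r.customer_id = c.customer_id "
        "ORDER BY c.name"
    ).fetchall()
    local.execute("DROP TABLE remote_totals")
    return rows


print(federated_totals("EU"))
```

The point of the sketch is the shape of the data flow: filtering and grouping happen where the data lives, and only the reduced result is joined on the federating side. A real federation engine would make that split in its optimizer rather than in application code.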
  • 19. IBM Fluid Query V1.0: Unifying PureData System for Analytics (PDA) with Hadoop
      • Connectors
          - Routes PDA (Netezza) queries to the top Hadoop providers
      • Data movement
          - Allows rapid data movement between PDA and Hadoop: PDA to Hadoop, and Hadoop to PDA
      • Initial supported Hadoop SQL query engines
          - BigInsights: Hive2, BigSQL v1, BigSQL v3, BigSQL v4
          - Hortonworks: Hive2
          - Cloudera: Hive2, Impala
  • 20. Netezza Fluid Query to Hadoop Engines
      • Scenario: the application needs to join Table-1, Table-2 and Table-3.
      • Joins, predicates and aggregation are applied on Hadoop via views to minimize data transfers.
      • Final joins, predicates and aggregation are applied on Netezza.
      • Diagram labels: Applications; User Interaction; SQL; Client Driver; Data flow; PureData for Analytics (Netezza); NPS MPP Engine; Fluid Query; Table-1 (alias); Table-2 (alias); Table-3 (local); Impala / Hive; BigSQL; Table-1 (local); Table-2 (local); Local Data Sources; HDFS; File Formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC
  • 21. Query Federation Best Practices
      • Avoid complex joins across multiple disparate repositories
          - Example: joining tables from BigSQL, Oracle, Teradata and Netezza in the same SQL statement.
          - Consider other techniques (copying data locally, caching, etc.)
      • Keep statistics current on every table that is part of the federated system
          - Statistics are critical for query optimization.
      • Watch out for network bandwidth and traffic
          - Large data transfers can overload the network (intermediate results need to be generated)
      • Consider implementing workload management and a query governor
          - Prevent a federated query from overloading a system.
      • Avoid complex data transformations (in-flight transformation)
          - They can impact any of the involved systems
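
To make the network-bandwidth point on slide 21 concrete, a back-of-the-envelope estimate helps. The sketch below is only rough arithmetic under stated assumptions: the row counts, row widths and link speed are invented figures, and the `transfer_seconds` helper is hypothetical. It ignores protocol overhead, compression and competing traffic.

```python
def transfer_seconds(rows, bytes_per_row, link_megabits_per_s):
    """Rough time to ship an intermediate result over the network."""
    total_bits = rows * bytes_per_row * 8
    return total_bits / (link_megabits_per_s * 1_000_000)


# Hypothetical figures: a 500-million-row remote table with 200-byte rows,
# pulled in full over a 1 Gbit/s link.
without_pushdown = transfer_seconds(500_000_000, 200, 1000)

# Hypothetical figures: a pushed-down filter and aggregation leave
# 2 million rows of 50 bytes each.
with_pushdown = transfer_seconds(2_000_000, 50, 1000)

print(f"no pushdown:   {without_pushdown:.0f} s")
print(f"with pushdown: {with_pushdown:.1f} s")
```

Under these assumed numbers the full transfer takes on the order of minutes while the reduced result takes under a second, which is why current statistics and pushdown matter so much for federated queries.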
  • 22. When to Apply Federation
      • Building multi-temperature data systems
          - Hot/warm/cold data on different repositories
      • Data is changing dynamically, in particular through schema evolution.
      • Federated queries can perform reasonably without impacting any of the systems involved
      • Real-time access to a small set of data on distributed systems
      • When remote data cannot be moved to a local system
          - Regulatory issues
      • The number of federated queries is manageable
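
The multi-temperature case on slide 22 can be illustrated with a small routing sketch. Everything here is an assumption made for illustration: SQLite stands in for both repositories, the `sales` table and the `HOT_SINCE` boundary date are hypothetical, and a real federation layer would do this routing in its optimizer rather than in application code.

```python
import sqlite3
from datetime import date

# Assumption: rows dated on or after HOT_SINCE live in the "hot" store
# (for example a warehouse); older rows live in the "cold" store
# (for example Hadoop). SQLite stands in for both.
HOT_SINCE = date(2015, 1, 1)


def make_store(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


hot = make_store([("2015-02-01", 10.0), ("2015-03-01", 5.0)])
cold = make_store([("2013-06-01", 7.0), ("2014-12-31", 3.0)])


def total_sales(start, end):
    """Sum sales in [start, end], touching only the stores that can hold them."""
    stores = []
    if start < HOT_SINCE:
        stores.append(cold)
    if end >= HOT_SINCE:
        stores.append(hot)
    total = 0.0
    for conn in stores:
        # The date predicate and the SUM run inside each store; only one
        # number per store comes back.
        (part,) = conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE day BETWEEN ? AND ?",
            (start.isoformat(), end.isoformat()),
        ).fetchone()
        total += part
    return total


print(total_sales(date(2014, 12, 1), date(2015, 2, 28)))
```

A query confined to recent dates never touches the cold store, while a query spanning the boundary combines one partial sum from each repository.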
  • 23. Some considerations to provide access to information
      • Access in place
          - Up-to-date information
          - Cost-effective
          - Slower access path (remote access, reformatting)
      • Make a local copy
          - Specially formatted for the use case
          - Local data access
          - Local control
          - Local cost
          - Potentially stale values
      • Consider these questions and make the best choice
          - How much information?
          - How rapidly is it changing?
          - How frequently is it accessed?
          - How much transformation is required to consume the information?
          - When is the information available?
          - Who owns the information?
          - How easily can it be changed?
  • 24. IBM INFORMATION SERVER FOR HADOOP
  • 25. The Data Reservoir subsystems
      • Data Reservoir: Information Management and Governance Fabric; Catalogue; Self-Service Access; Enterprise IT Data Exchange; Raw Data Interaction
      • Data Reservoir Repositories: SandBox; Master Data Management; In-Memory Cache; Data Marts; Operational Data Stores; Information Warehouse (EDW); Deep Data (aka Hadoop, aka Data Lake)
      • Users and sources: Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Reservoir Operations; Enterprise IT; New Sources; System of Record; Systems of Engagement
  • 26. Introducing IBM Information Server for Apache Hadoop: Information Empowerment for Your Hadoop Environment
      • Superfast data ingest and processing
          - Integrate, prepare and enrich data with speed and confidence
          - Running natively on Hadoop with speeds 10-15x faster than MapReduce
      • Complete confidence in your data
          - Understand what data is available and where it came from
          - Monitor and cleanse quality of data; catalog metadata assets and trace lineage
      • Higher level of productivity
          - Develop integration processes much faster than with hand coding, based on existing enterprise skills
          - Graphical data flow development environment with 100s of prebuilt stages and 1000s of prebuilt functions
      • Notes: no other vendor has this scale or speed; extend existing leadership into the Hadoop domain; proven development paradigm
      • Slide footer: IBM Confidential IAP PMOM Std DCP Template – V1 May, 2015
  • 27. One Integration & Quality Design, Execute “Anywhere”
      • Optimize your integration and DQ workload based on data locality and resource availability
      • Design your transformation or cleansing once and run it on your Hadoop cluster, on your traditional engine, or optimized to run on your database
      • Maximize your IT resource utilization through “anywhere” execution
      • Notes: this release adds this pattern to run natively on the Hadoop cluster
      • Diagram labels: Traditional ETL Engine; Databases
  • 28. Questions?
  • 29. REFERENCE MATERIAL: New Information Architectures and Capabilities