Information Virtualization: Query Federation on Data Lakes
DataWorks Summit
Hadoop Summit 2015
1.
Information Virtualization: Query Federation on Data Lakes
Beate Porst (porst@us.ibm.com), Product Manager, Information Server
Jo Ramos (joramos@us.ibm.com), Distinguished Engineer, Big Data and Analytics, IBM
© 2015 IBM Corporation
2.
Agenda
• Data Lakes and Data Reservoirs
• Information Virtualization and Federation
• Examples of Federation and Best Practices
• Information Integration on Hadoop
3.
The true value of Big Data is in context
[Diagram labels: Raw data, Feature extraction metadata, Domain linkages, Full contextual analytics]
Context sources shown around Patient records:
• Location risk
• Occupational risk
• Dietary risk
• Family history
• Actuarial data
• Government statistics
• Epidemic data
• Chemical exposure
• Personal financial situation
• Social relationships
• Travel history
• Weather history
4.
A growing data demand … and organizational tensions
• Data scientists seeking data for new analytics models.
• Marketers seeking data for new campaigns.
• Fraud investigators seeking data to understand the details of suspicious activity.
On one side: agility, data access freedom, any kind of data, powerful analysis and visualization.
On the other side: security, data privacy, standards.
Roles shown: Application Developer, Knowledge Worker, Lines of Business, IT Organization.
5.
Why a Data Reservoir and Not a Lake
• Data Lake: data flows in "naturally" and just sits there
• Data Reservoir: built to extract value from the data
6.
The Data Reservoir subsystems
Data Reservoir:
• Information Management and Governance Fabric
• Data Reservoir Repositories: Sandbox, Master Data Management, Cache Data, Data Marts, Operational Data Stores, Information Warehouse (EDW), Deep Data (aka Hadoop, aka Data Lake)
• Catalogue
• Self-Service Access
• Enterprise IT Data Exchange
• Raw Data Interaction
Users: Analytics Teams; Governance, Risk and Compliance Team; Information Curator; Line of Business Teams; Data Reservoir Operations
Enterprise IT: New Sources, System of Record, Systems of Engagement
7.
Data Reservoir Logical Architecture
[Architecture diagram. Recoverable labels, grouped by area:]
• Data Reservoir Repositories: Harvested Data, Descriptive Data, Shared Operational Data, Deposited Data, Historical Data, Published. Stores shown: Information Warehouse, Information Views, Catalog, Asset Hub, Activity Hub, Code Hub, Content Hub, Deep Data, Audit Data, Operational History, Search Index, Offline Archive, Sand Boxes, Reporting Data Marts, Object Cache
• Information Integration & Governance: Information Broker, Operational Governance Hub, Code Hub, Workflow, Staging Areas, Guards, Monitor
• Interfaces: Catalog Interfaces, Raw Data Interaction (Sand Boxes), View-based Interaction (Access and Feedback), Enterprise IT Interaction (Service Interfaces, Data Ingestion, Publishing Feeds, Continuous Analytics, Streaming Analytics, Event Correlation)
• People and teams: Data Reservoir Operations; Governance, Risk and Compliance Team; Information Curator; Consumers of Insight
• Connected systems: Line of Business Applications; Enterprise IT (System of Record Applications, Enterprise Service Bus, Systems of Engagement); New Sources (Third Party Feeds, Third Party APIs, Internal Sources); Other Data Reservoirs; Other Systems of Insight
• Flows: Information Service Calls, Search Requests, Report Requests, Data Access, Data Deposit, Data In, Data Out, Events to Evaluate, Notifications, Deploy Decision Models, Deploy Real-time Decision Models, Curation Interaction, Management, Decision Model Management, Understand Information Sources, Understand Compliance, Report Compliance, Advertise Information Source
• Consumer activities: Simple, ad hoc Discovery and Analysis; Reporting; Analytical Insight Applications; Analytics Tools
8.
INFORMATION VIRTUALIZATION & FEDERATION
9.
Information Virtualization hides the complexity of the information landscape
[Diagram: an Information Virtualization layer between consumers (Data Scientist, Line of Business) and the underlying data]
• Consumer actions: Report on Values, View related Values, Search Values, Browse Sources, Analyze Values, Provision
• Layers: Information Provisioning, Information Delivery, Data Access APIs, Semantic/Business Objects
10.
Different Styles of Information Provisioning
• Federation
• Replication
• Caching
• Consolidation
[Diagram: Headquarters and Stores, a Primary Data Center and a Backup Data Center, Region 1 and Region 2 Product Performance databases, and Analytical & Reporting Tools and Web Applications consuming Product Performance and Real-time Inventory Level data. Arrows are labeled Consolidation, Replication, Cache and Database Federation.]
11.
Example: Integrating the enterprise across independent silos
• Option 1: ETL transforms the data from Silo 1, Silo 2 and Silo 3 for consistency; the Global View is built on consistent data sources.
• Option 2: Federated queries build the Global View directly over Silo 1, Silo 2 and Silo 3.
The optimal approach depends on how consistent the data is across the silos, how much spare capacity each silo has to support additional queries, and whether all silos are available when a global query needs to be answered.
12.
Example: Creating a logical warehouse
[Diagram labels: Deep Data (Hadoop system), System of Record, Requested View]
Information virtualization hides the complexities of where the data is located. Here different repositories are being used to host different workloads, but this complexity is hidden by the information virtualization layer.
• Detailed data maintained for exploratory analysis and investigations.
• Structured information optimized for complex analytics and reporting.
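The idea of a requested view that hides where data lives can be illustrated with a minimal sketch. It is not IBM's implementation: it uses two in-memory SQLite databases as stand-ins for the structured store and the deep data store, and every name in it (`warehouse`, `deep`, `customer`, `click_event`, `requested_view`) is a hypothetical chosen for illustration.

```python
import sqlite3


def build_logical_warehouse() -> sqlite3.Connection:
    """Attach two separate stores and expose one view over both."""
    conn = sqlite3.connect(":memory:")
    # Two independent databases stand in for two repositories (assumption).
    conn.execute("ATTACH DATABASE ':memory:' AS warehouse")
    conn.execute("ATTACH DATABASE ':memory:' AS deep")

    # Structured, curated data in one repository.
    conn.execute(
        "CREATE TABLE warehouse.customer (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Detailed event data in the other repository.
    conn.execute(
        "CREATE TABLE deep.click_event (customer_id INTEGER, page TEXT)"
    )
    conn.executemany(
        "INSERT INTO warehouse.customer VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO deep.click_event VALUES (?, ?)",
        [(1, "/home"), (1, "/pricing"), (2, "/home")],
    )

    # The view is the only object a consumer needs to know about.
    # SQLite allows a TEMP view to span attached databases.
    conn.execute(
        """
        CREATE TEMP VIEW requested_view AS
        SELECT c.name AS name, COUNT(e.page) AS clicks
        FROM warehouse.customer AS c
        LEFT JOIN deep.click_event AS e ON e.customer_id = c.id
        GROUP BY c.id, c.name
        """
    )
    return conn


def query_requested_view(conn: sqlite3.Connection) -> list:
    """Consumer-side query: no repository names appear here."""
    return conn.execute(
        "SELECT name, clicks FROM requested_view ORDER BY name"
    ).fetchall()
```

The consumer calls `query_requested_view` without knowing which repository holds which table; moving `click_event` to another store would only require redefining the view.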
13.
Information Federation Process
[Diagram label: Virtual Information Collection, fed by three federation styles]
Database Federation
• Relational data only
• SQL pushdown
• Challenges: query optimization, out-of-memory conditions, complex SQL/joins
Service Federation
• Data is combined in memory
• Technology: SOA, Message Broker, Spark, BI & Reporting Tools
• Challenges: performance (network, memory, etc.)
Semantic Federation
• Uses a triple store and an ontology to create the virtualized interfaces on the fly (new technology, e.g. Spark)
• Challenges: query optimization, security
14.
IBM FEDERATION SOLUTIONS
15.
BigSQL Fluid Query (federation)
Data never lives in isolation
• Whether used as a landing zone or as a queryable archive, it is desirable to query data across Hadoop and active data warehouses
Big SQL provides the ability to query heterogeneous systems
• Join Hadoop to other relational databases
• The query optimizer understands the capabilities of the external system, including available statistics
• As much work as possible is pushed to each system to process
[Diagram: a Head Node running Big SQL, and four Compute Nodes, each running a Task Tracker, a Data Node and Big SQL]
16.
BigSQL Fluid Query: Federation to RDBMS Engines
Application needs to join Table-1, Table-2 and Table-3
[Diagram: Applications and User Interaction send SQL to BigInsights (Hadoop)]
• BigInsights (Hadoop): BIGSQL MPP Engine
• Relational Database Engines: Oracle, Teradata, Netezza, DB2
• Tables: Table-1 (local), Table-2 (local), Table-3 (local)
• Local Data Sources on HDFS & GPFS, File Formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom
17.
BigSQL Fluid Query: Federation to RDBMS Engines
Application needs to join Table-1, Table-2 and Table-3
1. Create an alias for Table-1 and Table-2 on the BigSQL Federation Engine.
[Diagram: Applications and User Interaction send SQL to BigInsights (Hadoop)]
• BigInsights (Hadoop): BIGSQL MPP Engine and Federation Engine, holding Table-1 (alias), Table-2 (alias) and Table-3 (local)
• Relational Database Engines: Oracle, Teradata, Netezza, DB2, holding Table-1 (local) and Table-2 (local)
• Local Data Sources on HDFS & GPFS, File Formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom
18.
BigSQL Fluid Query: Federation to RDBMS Engines
Application needs to join Table-1, Table-2 and Table-3
1. Create an alias for Table-1 and Table-2 on the BigSQL Federation Engine.
2. The query optimizer pushes part of the SQL down to be executed on the remote RDBMS.
3. The final join/aggregation is executed on BigSQL.
• Joins, predicates, and aggregation are pushed down to the backend RDBMS engine to reduce data transfers.
[Diagram: Applications / User Interaction connect through a Client Driver and send SQL to BigInsights (Hadoop). BIGSQL MPP Engine with Federation Engine: Table-1 (alias), Table-2 (alias), Table-3 (local), stored in local data sources on HDFS & GPFS (file formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC, Custom). SQL is sent through a Client Driver to the relational database engines (Oracle, Teradata, Netezza, DB2): Table-1 (local), Table-2 (local). Legend: data access, data flow]
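The three steps above can be sketched in a few lines of Python. This is only an illustration of the pushdown idea, not Big SQL itself: two in-memory SQLite databases stand in for the remote RDBMS and the local engine, and the table names (`orders`, `customers`, `remote_totals`) and the function `federated_totals` are hypothetical.

```python
import sqlite3

# Hypothetical stand-in for a remote RDBMS (Oracle, DB2, ... on the slide).
remote = sqlite3.connect(":memory:")
remote.executescript("""
    CREATE TABLE orders (customer_id INTEGER, amount REAL, region TEXT);
    INSERT INTO orders VALUES (1, 100.0, 'EU'), (1, 50.0, 'EU'),
                              (2, 75.0, 'US'), (3, 20.0, 'EU');
""")

# Hypothetical stand-in for the local engine that holds the local table.
local = sqlite3.connect(":memory:")
local.executescript("""
    CREATE TABLE customers (customer_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
""")


def federated_totals(region):
    """Join local customers with per-customer order totals held remotely."""
    # Step 2: push the predicate and the aggregation to the remote system,
    # so only one reduced row per customer crosses the "network".
    pushed = remote.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "WHERE region = ? GROUP BY customer_id",
        (region,),
    ).fetchall()

    # Step 3: run the final join on the local engine.
    local.execute(
        "CREATE TEMP TABLE IF NOT EXISTS remote_totals "
        "(customer_id INTEGER, total REAL)"
    )
    local.execute("DELETE FROM remote_totals")
    local.executemany("INSERT INTO remote_totals VALUES (?, ?)", pushed)
    return local.execute(
        "SELECT c.name, t.total FROM customers c "
        "JOIN remote_totals t ON c.customer_id = t.customer_id "
        "ORDER BY c.name"
    ).fetchall()


print(federated_totals("EU"))
```

Without pushdown, every row of `orders` would have to be copied to the local engine before filtering and grouping; here only the aggregated rows are transferred.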
19.
IBM Fluid Query V1.0
Unifying PureData System for Analytics (PDA) with Hadoop
Connectors:
• Routes PDA (Netezza) queries to the top Hadoop providers
Data movement:
• Allows rapid data movement between PDA and Hadoop
• PDA to Hadoop
• Hadoop to PDA
Initial Supported Hadoop SQL Query Engines
• BigInsights – Hive2, BigSQL v1, BigSQL v3, BigSQL v4
• Hortonworks – Hive2
• Cloudera – Hive2, Impala
20.
Netezza Fluid Query to Hadoop Engines
Application needs to join Table-1, Table-2 and Table-3
• Joins, predicates, and aggregation are applied on Hadoop via views to minimize data transfers.
• Final joins, predicates, and aggregation are applied on Netezza.
[Diagram: Applications / User Interaction connect through a Client Driver to PureData for Analytics (Netezza). NPS MPP Engine with Fluid Query: Table-1 (alias), Table-2 (alias), Table-3 (local). SQL is sent through a Client Driver to Impala / Hive and BigSQL: Table-1 (local), Table-2 (local), stored in local data sources on HDFS (file formats: Parquet, CSV, Seq, RC, Avro, JSON, ORC). Legend: data flow]
21.
Query Federation Best Practices
Avoid complex joins across multiple disparate repositories
• Example: joining tables from BigSQL, Oracle, Teradata, and Netezza in the same SQL statement.
• Consider other techniques (copying data locally, caching, etc.)
Keep statistics current on every table that is part of the federated system
• Statistics are critical for query optimization.
Watch out for network bandwidth and traffic
• Large data transfers can overload the network (intermediate results need to be generated)
Consider implementing workload management and a query governor
• Prevent a federated query from overloading a system.
Avoid complex data transformations (in-flight transformation)
• They can impact any of the involved systems
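A query governor can be as simple as a cap on how many rows a federated query may pull from a source. The sketch below is a client-side illustration under that assumption, using SQLite as a stand-in source; `governed_fetch`, `GovernorLimitExceeded`, and `max_rows` are hypothetical names, and production systems would normally rely on the workload management features of the database itself.

```python
import sqlite3


class GovernorLimitExceeded(RuntimeError):
    """Raised when a query returns more rows than the governor allows."""


def governed_fetch(conn, sql, params=(), max_rows=1000):
    """Run a query, but refuse to return more than max_rows rows."""
    cur = conn.execute(sql, params)
    # Fetch one row beyond the limit so an overflow can be detected
    # without reading the whole result set.
    rows = cur.fetchmany(max_rows + 1)
    if len(rows) > max_rows:
        raise GovernorLimitExceeded(
            f"query returned more than {max_rows} rows; "
            "add predicates or aggregate at the source"
        )
    return rows


# Hypothetical source table with ten rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE events (id INTEGER)")
source.executemany("INSERT INTO events VALUES (?)", [(i,) for i in range(10)])

small = governed_fetch(source, "SELECT id FROM events WHERE id < ?", (3,), max_rows=5)
try:
    governed_fetch(source, "SELECT id FROM events", max_rows=5)
    blocked = False
except GovernorLimitExceeded:
    blocked = True
print(len(small), blocked)
```

The selective query passes, while the unfiltered scan is rejected before its full result is transferred.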
22.
When to Apply Federation
Building multi-temperature data systems
• Hot/warm/cold data on different repositories
Data is changing dynamically, in particular through schema evolution
Federated queries can perform reasonably without impacting any of the systems involved
Real-time access to a small set of data on distributed systems
When remote data cannot be moved to a local system
• Regulatory issues
The number of federated queries is manageable
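As an illustration of the multi-temperature case, the sketch below routes a date-range query to a "hot" store, a "cold" store, or both. It assumes a single cutoff date separates the two tiers; the cutoff, the `sales` tables, and `total_sales` are hypothetical, and an attached in-memory SQLite database stands in for the second repository.

```python
import sqlite3

# "main" plays the hot repository; "cold" plays the archive (for example Hadoop).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS cold")
conn.executescript("""
    CREATE TABLE main.sales (sale_date TEXT, amount REAL);
    CREATE TABLE cold.sales (sale_date TEXT, amount REAL);
    INSERT INTO main.sales VALUES ('2015-05-01', 10.0), ('2015-06-01', 20.0);
    INSERT INTO cold.sales VALUES ('2013-01-15', 5.0), ('2014-07-30', 7.5);
""")

# Assumption: rows dated on or after the cutoff live in the hot store only.
HOT_CUTOFF = "2015-01-01"


def total_sales(start, end):
    """Sum sales between two ISO dates, touching only the tiers needed."""
    sources = []
    if end >= HOT_CUTOFF:
        sources.append("main.sales")
    if start < HOT_CUTOFF:
        sources.append("cold.sales")
    if not sources:  # only possible for an empty range (start > end)
        return 0.0
    union = " UNION ALL ".join(
        f"SELECT amount FROM {s} WHERE sale_date BETWEEN ? AND ?"
        for s in sources
    )
    params = [start, end] * len(sources)
    (total,) = conn.execute(
        f"SELECT COALESCE(SUM(amount), 0.0) FROM ({union})", params
    ).fetchone()
    return total


print(total_sales("2014-01-01", "2015-12-31"))
```

Queries that fall entirely inside one tier never touch the other repository, which keeps federated traffic limited to ranges that genuinely span both.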
23.
Some considerations for providing access to information
Access in place
• Up-to-date information
• Cost-effective
• Slower access path (remote access, reformatting)
Make a local copy
• Specially formatted for the use case
• Local data access
• Local control
• Local cost
• Potentially stale values
Consider these questions and make the best choice
• How much information?
• How rapidly is it changing?
• How frequently is it accessed?
• How much transformation is required to consume the information?
• When is the information available?
• Who owns the information?
• How easily can it be changed?
24.
IBM INFORMATION SERVER FOR HADOOP
25.
The Data Reservoir subsystems
[Diagram labels]
Data Reservoir
• Information Management and Governance Fabric
• Data Reservoir Repositories: SandBox, Master Data Management, In-Memory Cache, Data Marts, Operational Data Stores, Information Warehouse (EDW), Deep Data (aka Hadoop, aka Data Lake)
• Catalogue
• Self-Service Access
• Enterprise IT Data Exchange
• Raw Data Interaction
Users
• Analytics Teams
• Governance, Risk and Compliance Team
• Information Curator
• Line of Business Teams
• Data Reservoir Operations
Enterprise IT
• New Sources
• System of Record
• Systems of Engagement
26.
IBM Confidential IAP PMOM Std DCP Template – V1 May, 2015
Introducing IBM Information Server for Apache Hadoop: Information Empowerment for Your Hadoop Environment
Superfast data ingest and processing
• Integrate, prepare, and enrich data with speed and confidence, running natively on Hadoop at speeds 10-15x faster than MapReduce
• No other vendor has this scale or speed
Complete confidence in your data
• Understand what data is available and where it came from; monitor and cleanse the quality of data; catalog metadata assets and trace lineage
• Extend existing leadership into the Hadoop domain
Higher level of productivity
• Develop integration processes much faster than with hand coding, based on existing enterprise skills
• Graphical data flow development environment with 100s of prebuilt stages and 1000s of prebuilt functions
• Proven development paradigm
27.
Maximize your IT resources utilization through “anywhere” execution
One Integration & Quality Design, Execute “Anywhere”
• Optimize your integration and DQ workload based on data locality and resource availability
• Design your transformation or cleansing once and run it on your Hadoop cluster, on your traditional engine, or optimize it to run on your database
• This release adds this pattern to run natively on the Hadoop cluster
[Diagram labels: Traditional ETL Engine, Databases]
28.
Questions?
29.
REFERENCE MATERIAL
New Information Architectures and Capabilities