Informix Warehouse & Informix Warehouse Accelerator Overview

Disclaimer

Agenda

Data Warehousing: Industry Trends
State of Data Warehousing in 2011

State of Data Warehousing, Cont’d

State of Data Warehousing, Cont’d

Data Warehouse Trends for the CIO, 2011-2012

Informix Warehouse History

Existing IDS Warehousing Features

Informix Warehousing Moving Forward
Informix Warehouse Roadmap: Informix Warehouse with Storage Optimization/Compression; External Tables; Star Join Optimization; Multi-Index Scan; New Fragmentation; Fragment-Level Stats; Storage Provisioning; Warehouse Accelerator
Informix Warehouse 11.70 Features
Typical Data Warehouse Architecture
Source: Forrester. Query tools, analytics, BPS apps, BI apps, and LOB apps sit above databases and other transactional data sources; the DBMS handles I/O & data loading, query processing, and storage management.

11.70 Warehousing Features
Data Loading: HPL, DB utilities, ON utilities, DataStage, External Tables, online attach/detach
Data & Storage Management: Deep Compression, interval and list fragmentation, online attach/detach, fragment-level stats, storage provisioning, table defragmenter
Query Processing: light scans, MERGE, hierarchical queries, Multi-Index Scan, skip scan, bitmap technology, star and snowflake join optimization, implicit PDQ, access performance
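Of the loading features above, external tables are the simplest to show in SQL. A minimal sketch of 11.70 external-table loading, assuming a target table named sales and a hypothetical flat-file path:

-- External table mirroring the target table's schema; reads a
-- delimited flat file directly from disk.
CREATE EXTERNAL TABLE raw_sales
  SAMEAS sales
  USING (
    DATAFILES ('DISK:/data/sales.unl'),
    FORMAT 'DELIMITED'
  );

-- Load by selecting from the external table into the warehouse table.
INSERT INTO sales SELECT * FROM raw_sales;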
Informix Warehouse Tooling - SQW. DESIGN: the Design Studio (Eclipse-based Design Center) builds data flows and control flows into a deployment package of code units, a build profile, and user scripts. DEPLOY: the package is prepared and deployed through the Admin Console. RUNTIME: an HTTP service (WAS) hosts the SQW runtime, which executes and debugs flows against the SQW control DB (IDS), the SQW execution DB (IDS), and the warehouse DB (IDS); source databases include DB2, Oracle, and SQL Server, and other servers (DataStage) can participate.
SQW: Design Studio

SQW: Data Modeling

SQW: Data Flows — for example, a file source and a table source feed a table join, then an aggregation, then a table target.

SQW: Data Flows — a simple flow

SQW: Control Flows

SQW Overview: the Design Studio (an Eclipse-based design environment) is used to create flows, which are deployed to the Admin Console (the production environment in WebSphere) to manage.

Admin Console
Informix 11.70 Feature: Warehouse Time-Cyclic Data Management — range fragments by month (Dec 2010 through May 2011 in the illustration) enable storing data over time: new months roll in and old months roll out without taking the table offline.
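A hedged sketch of the roll-out side of this rolling window, assuming a range-fragmented sales table with hypothetical partition and table names:

-- Detach the oldest month into its own standalone table; in 11.70
-- this runs online on range-fragmented tables.
ALTER FRAGMENT ON TABLE sales
  DETACH PARTITION p_dec2010 sales_dec2010;

-- Roll-in under interval fragmentation (next slide) needs no DDL:
-- inserting rows for a new month creates its fragment automatically.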
Interval Fragmentation
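A minimal sketch of the 11.70 interval-fragmentation DDL; the column, dbspace, and partition names are illustrative:

CREATE TABLE sales (
    order_id   INT,
    order_date DATE,
    amount     MONEY
)
FRAGMENT BY RANGE (order_date)
    -- one fragment per month, created automatically as rows arrive
    INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
    STORE IN (dbs1, dbs2, dbs3)
    PARTITION p_initial VALUES < DATE('01/01/2011') IN dbs1;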
Informix 11.70 Feature: Multi-Index Scan

Multi-Index Scan – An Example
Multi-Index Scan Example: the single-index method retrieves rows based on the most selective constraint, using only the index for that column, and then sequentially evaluates each of the other constraints in a post-retrieval manner.
Multi-Index Scan Example: the index results for Gender='m' AND Zipcode='95032' AND Income_Category='high' AND Education_level='masters' are ANDed together, and the resulting sorted RIDs drive a sequential skip scan over the records.
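The same example as SQL. This assumes a single-column index on each predicate column (the table and index names are hypothetical); the 11.70 MULTI_INDEX optimizer directive asks for the combined-index path rather than a single-index scan:

SELECT --+ MULTI_INDEX(customer ix_gender ix_zip)
       customer_id
FROM   customer
WHERE  gender = 'm'
  AND  zipcode = '95032'
  AND  income_category = 'high'
  AND  education_level = 'masters';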
Informix 11.70 Feature: Push-Down Hash Join — the build scan feeds the hash join's build side, and the probe scan feeds its probe side.
Typical Star Schema: An Example — fact table F: 1M rows, selectivity 1/1000; dimensions D1, D2, D3: 10K rows each, selectivity 1/10.
Prior to 11.70: Standard Left-Deep Tree Solution — scan F (1M rows) and hash-join it with D1 (1K rows after filtering), producing 100K rows; join that with D2 (1K) to get 10K rows, then with D3 (1K) to get 1K. The problem: the 100K-row intermediate result becomes the build side of the second hash join, which is too large.
11.70 Feature: Pushdown Hash-Join Solution — the join keys from the filtered dimensions D1, D2, D3 (1K rows each) are pushed down to a multi-index scan of the fact table using its single-column indexes, reducing the probe size; each of the three hash joins then handles only about 1K rows.
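For concreteness, the query shape behind these two plans, with hypothetical table and column names standing in for F, D1, D2, and D3; each dimension predicate is assumed to keep about 1/10 of that dimension's rows:

SELECT d1.name, SUM(f.revenue)
FROM   fact f
JOIN   dim1 d1 ON f.d1_id = d1.id
JOIN   dim2 d2 ON f.d2_id = d2.id
JOIN   dim3 d3 ON f.d3_id = d3.id
WHERE  d1.attr = 'x'
  AND  d2.attr = 'y'
  AND  d3.attr = 'z'
GROUP  BY d1.name;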
Informix Warehouse Accelerator (IWA)
Agenda

Third Generation of Database Technology
Example of 2nd Generation Database Disk I/O Issue
How Oracle/Exadata Solves That Problem: Add an I/O Layer
Sun Oracle Database Machine Full Rack: 14 Exadata Storage Cells (storage servers), each with 8 cores, 24 GB memory, and 12 disks (600 GB/2 TB), delivering up to 1.5 GB/sec I/O bandwidth per cell (21 GB/sec per DB machine); 8 Oracle RAC database servers, each with 8 cores and 72 GB memory; InfiniBand switches/network at 16 Gbit per channel.
Cost of Oracle/Exadata Solution

Agenda
Informix Warehouse Accelerator: 3rd-Generation Database Technology is Here. What is it? The Informix Warehouse Accelerator (IWA) is a workload-optimized, appliance-like add-on that enables the integration of business insights into operational processes to drive winning strategies. It accelerates select queries with unprecedented response times. Breakthrough technology enabling new opportunities.
Breakthrough technologies for performance:
1. Row & columnar database — row format within IDS for transactional workloads; columnar data access via the accelerator for OLAP queries.
2. Extreme compression — required because RAM is the limiting factor.
3. Massive parallelism — all cores are used for queries.
4. Predicate evaluation on compressed data — scans often run without decompression during evaluation.
5. Frequency partitioning — the enabler for effective parallel access to the compressed data for scanning; horizontal and vertical partition elimination.
6. In-memory database — 3rd-generation database technology avoids I/O; compression allows huge databases to be completely memory resident.
7. Multi-core and vector-optimized algorithms — avoiding locking and synchronization.
Informix Warehouse Accelerator Configuration: applications send SQL queries to the IDS data warehouse as usual; a query router forwards qualifying queries over TCP/IP (SQL via DRDA) to the accelerator, whose query processor executes them against the compressed, in-memory DB partition and returns the results; a bulk loader populates the accelerator from the warehouse.
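Routing is controlled per session. A hedged sketch; the exact USE_DWA values vary by release, so treat this as illustrative rather than definitive:

SET ENVIRONMENT USE_DWA '1';  -- allow eligible queries to be routed to IWA

-- Qualifying warehouse queries issued now are offloaded; anything the
-- accelerator cannot handle falls back to local IDS execution.

SET ENVIRONMENT USE_DWA '0';  -- route everything locally again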
Informix Warehouse Accelerator Overview — IDS handles query parsing and matching in the optimizer, and routes query blocks. The coordinator process orchestrates distributed tasks such as load and query execution. The worker processes hold all the data in main memory, spread across all cores, and perform the compression and query execution.
Target Market: Business Intelligence (BI) — star schemas with a SALES fact table and dimension hierarchies such as Store → City → Region, Product → Brand → Category, and Period → Month → Quarter.
What IWA is Designed For — star-join queries over the fact table, for example:

SELECT PRODUCT_DEPARTMENT, REGION, SUM(REVENUE)
FROM FACT_SALES F
INNER JOIN DIM_PRODUCT P ON F.FKP = P.PK
INNER JOIN DIM_REGION R ON F.FKR = R.PK
LEFT OUTER JOIN DIM_TIME T ON F.FKT = T.PK
WHERE T.YEAR = 2009
  AND R.GEOID = 17
  AND P.TYPEID = 3
GROUP BY PRODUCT_DEPARTMENT, REGION
Case Study #1: Major U.S. Shoe Retailer — "Our Retail users will be really happy to see such a huge improvement in query processing times. This IWA extension to IDS will really bring value to the Retail BI environment."

Query | IDS 11.5 | IDS 11.7 + IWA
1 | 22 mins | 4 secs
2 | 1 min 3 secs | 2 secs
3 | 3 mins 40 secs | 2 secs
4 | 30 mins & up | 4 secs
5 | 2 mins | 2 secs
6 | 30 mins | 2 secs
7 | 45 mins & up | 2 secs
Case Study #2: Datamart at a Government Agency
Case Study #3: U.S. Government Agency

Query | Description | Informix | Informix w/ IWA | Notes | Improvement
1 | Find Top 100 Entities | 1:28:22 | 0:01:28 | Fact Table Scan | 6023.23%
2 | Find Top 100 Members | 1:22:32 | 0:01:05 | Fact Table Scan | 7640.45%
3 | Summarize all transactions by State and County | 1:34:37 | 0:00:14 | Fact Table Scan | 41708.49%
4 | Detailed Report on Specific Programs in a Date Range | 0:00:06 | 0:00:06 | Index Read | 108.41%
5 | Summarize all transactions by State, County, City, Zip, Program, Program Year, Commodity and Fiscal Year | 1:48:58 | 0:00:41 | Fact Table Scan | 15800.89%
Agenda

Row-Oriented Data Store — each row is stored sequentially. Even if only a few columns are required, the complete row is still fetched and decompressed.

Columnar Data Store — data is stored sequentially by column. Attributes not required for a specific query execution are skipped completely.
Compression: Frequency Partitioning — histograms on each column of the trade info (volume, product, origin country) drive the encoding: the top 64 traded goods get a 6-bit code and the rest longer codes; origin is partitioned into China, USA, GER/FRA/…, and the rest. The column partitions divide the table into cells by value frequency, so common values land in dense, tightly coded cells and rare values in others.
Compression Process: Step 1 — an input tuple (Male, John, 08/10/06, Mango) is encoded column by column: correlated columns are co-coded (Male/John), type-specific transforms apply (08/10/06 → week 35, Sat), and each part is Huffman-encoded against its frequency dictionary, so frequent values get short codes (e.g., p = 1/8 → 3 bits, p = 1/512 → 9 bits; 'John' covers 3.5% of names, Saturday 42% of male trade days). Concatenating the column codes yields the tuplecode.
Compression Process: Step 2 — the tuplecodes are sorted, and each one is stored as the Huffman-encoded delta from the previous tuplecode, appended to the compression block with no delimiters.
Data is Processed in Compressed Format — the register store is an optimization of the column-store approach that makes the best use of existing hardware. Reshuffling small data elements into a register at runtime is time-consuming and can be avoided. The register store also delivers good vectorization capabilities. Predicate evaluation is done against compressed data!
Register Stores Facilitate SIMD Parallelism — compressed column values from a cell block are packed into banks (e.g., two 32-bit banks and a 16-bit bank) within a 128-bit register, so a single vector operation applies an operand to four packed values and produces four results at once.
Simultaneous Evaluation of Equality Predicates — the value predicate State=='CA' && Quarter=='Q4' is translated into a code predicate, State==01001 && Quarter==1110. A bitwise AND with the row mask (11111 for State, 1111 for Quarter) followed by a single compare evaluates both predicates against the packed, compressed row at once, yielding the selection result per row.
Agenda

Defining What Data to Accelerate — a data mart definition identifies the tables to accelerate in IDS; the coordinator process distributes the definition and data to the worker processes in IWA.
IWA Design Studio
Distributing data from IDS (fact tables) — IDS stored procedures UNLOAD each data fragment of the fact table, and a copy of the IDS data is transferred to the worker processes. Each worker process holds a subset of the data, compressed, in main memory and can execute queries on that subset. The data is evenly distributed across the CPUs (no value-based partitioning).
Distributing data from IDS (dimension tables) — IDS stored procedures UNLOAD the dimension tables, and all dimension tables are transferred to every worker process.
Mapping Data from IDS to IWA — inside IDS: the fact table's data fragments plus the dimension tables; inside IWA: the same fragments and dimension tables, held as compressed data.
Agenda

IWA Reference Hardware Configuration — 4 × 8-core Intel Xeon X7560 CPUs @ 2.27 GHz, 512 GB memory, 6 × 300 GB SAS hard disk drives. Options: 16 × 1.8" SAS SSDs with eXFlash or 8 × 2.5" SAS HDDs; optional MAX5 32-DIMM memory expansion; scalable from 4 sockets and 64 DIMMs to 8 sockets and 128 DIMMs; 8-core, 6-core, and 4-core processor options at up to 2.26 GHz (8-core), 2.66 GHz (6-core), and 1.86 GHz (4-core) with up to 16 MB L3 cache; a 4-processor, 4U rack-optimized enterprise server with Intel Xeon processors.

IWA Software Components

(Fred Ho – hof@us.ibm.com)
More Related Content

What's hot

Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologiesBrendan Gregg
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesWalaa Hamdy Assy
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseAltinity Ltd
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouserpolat
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Aaron Shilo
 
CLUB DB2 第122回 DB2管理本の著者が教える 簡単運用管理入門
CLUB DB2 第122回  DB2管理本の著者が教える 簡単運用管理入門CLUB DB2 第122回  DB2管理本の著者が教える 簡単運用管理入門
CLUB DB2 第122回 DB2管理本の著者が教える 簡単運用管理入門Akira Shimosako
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodLudovico Caldara
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...Altinity Ltd
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresqlbotsplash.com
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Ltd
 
ClickHouse Introduction, by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction, by Alexander Zaitsev, Altinity CTOClickHouse Introduction, by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction, by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Achieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAAAchieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAAMarkus Michalewicz
 
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...HostedbyConfluent
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEOAltinity Ltd
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Chris Fregly
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to sparkHome
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks
 

What's hot (20)

Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologies
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
Apache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & librariesApache spark - Architecture , Overview & libraries
Apache spark - Architecture , Overview & libraries
 
Better than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouseBetter than you think: Handling JSON data in ClickHouse
Better than you think: Handling JSON data in ClickHouse
 
10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse10 Good Reasons to Use ClickHouse
10 Good Reasons to Use ClickHouse
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
 
CLUB DB2 第122回 DB2管理本の著者が教える 簡単運用管理入門
CLUB DB2 第122回  DB2管理本の著者が教える 簡単運用管理入門CLUB DB2 第122回  DB2管理本の著者が教える 簡単運用管理入門
CLUB DB2 第122回 DB2管理本の著者が教える 簡単運用管理入門
 
2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
Oracle Client Failover - Under The Hood
Oracle Client Failover - Under The HoodOracle Client Failover - Under The Hood
Oracle Client Failover - Under The Hood
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Getting started with postgresql
Getting started with postgresqlGetting started with postgresql
Getting started with postgresql
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
ClickHouse Introduction, by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction, by Alexander Zaitsev, Altinity CTOClickHouse Introduction, by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction, by Alexander Zaitsev, Altinity CTO
 
Apache Spark & Hadoop
Apache Spark & HadoopApache Spark & Hadoop
Apache Spark & Hadoop
 
Achieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAAAchieving Continuous Availability for Your Applications with Oracle MAA
Achieving Continuous Availability for Your Applications with Oracle MAA
 
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
Real-time Data Ingestion from Kafka to ClickHouse with Deterministic Re-tries...
 
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEODangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
Dangerous on ClickHouse in 30 minutes, by Robert Hodges, Altinity CEO
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks Technical Workshop: Interactive Query with Apache Hive
 

Similar to Informix warehouse and accelerator overview

Ibm Cognos B Iund Pmfj
Ibm Cognos B Iund PmfjIbm Cognos B Iund Pmfj
Ibm Cognos B Iund PmfjFriedel Jonker
 
SQL Server 2008 R2 Parallel Data Warehouse
SQL Server 2008 R2 Parallel Data WarehouseSQL Server 2008 R2 Parallel Data Warehouse
SQL Server 2008 R2 Parallel Data Warehouserobinson_adams
 
Lecture about SAP HANA and Enterprise Comupting at University of Halle
Lecture about SAP HANA and Enterprise Comupting at University of HalleLecture about SAP HANA and Enterprise Comupting at University of Halle
Lecture about SAP HANA and Enterprise Comupting at University of HalleTobias Trapp
 
Open Source DWBI-A Primer
Open Source DWBI-A PrimerOpen Source DWBI-A Primer
Open Source DWBI-A Primerpartha69
 
OBIEE ARCHITECTURE.ppt
OBIEE ARCHITECTURE.pptOBIEE ARCHITECTURE.ppt
OBIEE ARCHITECTURE.pptCanara bank
 
Bi Dw Presentation
Bi Dw PresentationBi Dw Presentation
Bi Dw Presentationvickyc
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...DATAVERSITY
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cMaria Colgan
 
Big SQL NYC Event December by Virender
Big SQL NYC Event December by VirenderBig SQL NYC Event December by Virender
Big SQL NYC Event December by Virendervithakur
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDatawarehouse Trainings
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016StampedeCon
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseAltibase
 
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Denodo
 
Professional Portfolio
Professional PortfolioProfessional Portfolio
Professional PortfolioMoniqueO Opris
 
AX2012 Technical Track - Infrastructure, Davy Vliegen
AX2012 Technical Track - Infrastructure, Davy VliegenAX2012 Technical Track - Infrastructure, Davy Vliegen
AX2012 Technical Track - Infrastructure, Davy Vliegendynamicscom
 

Similar to Informix warehouse and accelerator overview (20)

DWBASIC.ppt
DWBASIC.pptDWBASIC.ppt
DWBASIC.ppt
 
Ibm Cognos B Iund Pmfj
Ibm Cognos B Iund PmfjIbm Cognos B Iund Pmfj
Ibm Cognos B Iund Pmfj
 
SQL Server 2008 R2 Parallel Data Warehouse
SQL Server 2008 R2 Parallel Data WarehouseSQL Server 2008 R2 Parallel Data Warehouse
SQL Server 2008 R2 Parallel Data Warehouse
 
Lecture about SAP HANA and Enterprise Comupting at University of Halle
Lecture about SAP HANA and Enterprise Comupting at University of HalleLecture about SAP HANA and Enterprise Comupting at University of Halle
Lecture about SAP HANA and Enterprise Comupting at University of Halle
 
Open Source DWBI-A Primer
Open Source DWBI-A PrimerOpen Source DWBI-A Primer
Open Source DWBI-A Primer
 
OBIEE ARCHITECTURE.ppt
OBIEE ARCHITECTURE.pptOBIEE ARCHITECTURE.ppt
OBIEE ARCHITECTURE.ppt
 
Bi Dw Presentation
Bi Dw PresentationBi Dw Presentation
Bi Dw Presentation
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12c
 
Big SQL NYC Event December by Virender
Big SQL NYC Event December by VirenderBig SQL NYC Event December by Virender
Big SQL NYC Event December by Virender
 
IBM FlashSystem in OLAP Database Environments
IBM FlashSystem in OLAP Database EnvironmentsIBM FlashSystem in OLAP Database Environments
IBM FlashSystem in OLAP Database Environments
 
Dwh basics datastage online training
Dwh basics datastage online trainingDwh basics datastage online training
Dwh basics datastage online training
 
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAININGDATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
DATASTAGE AND QUALITY STAGE 9.1 ONLINE TRAINING
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Data mining
Data miningData mining
Data mining
 
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
Best Practices For Building and Operating A Managed Data Lake - StampedeCon 2016
 
The Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- AltibaseThe Most Trusted In-Memory database in the world- Altibase
The Most Trusted In-Memory database in the world- Altibase
 
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
Data Virtualization Journey: How to Grow from Single Project and to Enterpris...
 
Professional Portfolio
Professional PortfolioProfessional Portfolio
Professional Portfolio
 
AX2012 Technical Track - Infrastructure, Davy Vliegen
AX2012 Technical Track - Infrastructure, Davy VliegenAX2012 Technical Track - Infrastructure, Davy Vliegen
AX2012 Technical Track - Infrastructure, Davy Vliegen
 

More from Keshav Murthy

N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0Keshav Murthy
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Keshav Murthy
 
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5Keshav Murthy
 
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...Keshav Murthy
 
Couchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing featuresCouchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing featuresKeshav Murthy
 
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram VemulapalliN1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram VemulapalliKeshav Murthy
 
Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Keshav Murthy
 
Couchbase Query Workbench Enhancements By Eben Haber
Couchbase Query Workbench Enhancements  By Eben Haber Couchbase Query Workbench Enhancements  By Eben Haber
Couchbase Query Workbench Enhancements By Eben Haber Keshav Murthy
 
Mindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developersMindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developersKeshav Murthy
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorCouchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorKeshav Murthy
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0Keshav Murthy
 
From SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONFrom SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONKeshav Murthy
 
Tuning for Performance: indexes & Queries
Tuning for Performance: indexes & QueriesTuning for Performance: indexes & Queries
Tuning for Performance: indexes & QueriesKeshav Murthy
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesUnderstanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesKeshav Murthy
 
Utilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and IndexingUtilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and IndexingKeshav Murthy
 
Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5Keshav Murthy
 
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQLBringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQLKeshav Murthy
 
Query in Couchbase. N1QL: SQL for JSON
Query in Couchbase.  N1QL: SQL for JSONQuery in Couchbase.  N1QL: SQL for JSON
Query in Couchbase. N1QL: SQL for JSONKeshav Murthy
 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications Keshav Murthy
 
Introducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONIntroducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONKeshav Murthy
 

More from Keshav Murthy (20)

N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0N1QL New Features in couchbase 7.0
N1QL New Features in couchbase 7.0
 
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018Couchbase Tutorial: Big data Open Source Systems: VLDB2018
Couchbase Tutorial: Big data Open Source Systems: VLDB2018
 
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5
 
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...
 
Couchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing featuresCouchbase 5.5: N1QL and Indexing features
Couchbase 5.5: N1QL and Indexing features
 
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram VemulapalliN1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
N1QL: Query Optimizer Improvements in Couchbase 5.0. By, Sitaram Vemulapalli
 
Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.Couchbase N1QL: Language & Architecture Overview.
Couchbase N1QL: Language & Architecture Overview.
 
Couchbase Query Workbench Enhancements By Eben Haber
Couchbase Query Workbench Enhancements  By Eben Haber Couchbase Query Workbench Enhancements  By Eben Haber
Couchbase Query Workbench Enhancements By Eben Haber
 
Mindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developersMindmap: Oracle to Couchbase for developers
Mindmap: Oracle to Couchbase for developers
 
Couchbase N1QL: Index Advisor
Couchbase N1QL: Index AdvisorCouchbase N1QL: Index Advisor
Couchbase N1QL: Index Advisor
 
N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0N1QL: What's new in Couchbase 5.0
N1QL: What's new in Couchbase 5.0
 
From SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSONFrom SQL to NoSQL: Structured Querying for JSON
From SQL to NoSQL: Structured Querying for JSON
 
Tuning for Performance: indexes & Queries
Tuning for Performance: indexes & QueriesTuning for Performance: indexes & Queries
Tuning for Performance: indexes & Queries
 
Understanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune QueriesUnderstanding N1QL Optimizer to Tune Queries
Understanding N1QL Optimizer to Tune Queries
 
Utilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and IndexingUtilizing Arrays: Modeling, Querying and Indexing
Utilizing Arrays: Modeling, Querying and Indexing
 
Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5Extended JOIN in Couchbase Server 4.5
Extended JOIN in Couchbase Server 4.5
 
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQLBringing SQL to NoSQL: Rich, Declarative Query for NoSQL
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL
 
Query in Couchbase. N1QL: SQL for JSON
Query in Couchbase.  N1QL: SQL for JSONQuery in Couchbase.  N1QL: SQL for JSON
Query in Couchbase. N1QL: SQL for JSON
 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications 
 
Introducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSONIntroducing N1QL: New SQL Based Query Language for JSON
Introducing N1QL: New SQL Based Query Language for JSON
 

Recently uploaded

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Recently uploaded (20)

Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Informix warehouse and accelerator overview

  • 1.
  • 2.
  • 3.
  • 4. Data Warehousing Industry Trends
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 14. Typical Data Warehouse Architecture
  • 15. Source: Forrester Query Tools Analytics BPS Apps BI Apps LOB apps Databases Other transactional data sources I/O & data loading Query processing DBMS & Storage mgmt 11.70 Warehousing Features Data Loading HPL DB utilities ON utilities DataStage External Tables Online attach/detach Data & Storage Management Deep Compression Interval and List Fragmentation Online attach/detach Fragment level stats Storage provisioning Table defragmenter Query Processing Light Scans Merge Hierarchical Queries Multi-Index Scan Skip Scan Bitmap Technology Star and Snowflake join optimization Implicit PDQ Access performance
  • 16. Informix Warehouse Tooling - SQW Execution DEPLOY Deployment preparation Deploy RUNTIME HTTP service ( WAS ) SQW Runtime Applications Other Servers (DataStage) DB2 Oracle SQL Server Design Studio Admin Console Deploy Data Source Databases Execution Execution Debug SQW Control DB IDS DESIGN Design Center (Eclipse) Data Flows + Control Flows Deployment package Code Units Build Profile User scripts Warehouse DB IDS SQW Execution DB IDS
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32. Prior to 11.70: Standard Left Deep Tree Solution Scan D1 1K 1M Problem Join Second Join Build Too Large Scan F Hash Join Hash Join Scan D3 Hash Join Scan D2 100K 1K 10K 1K
  • 33. 11.70 Feature: Pushdown Hash-Join Solution Scan F Scan D1 1K Scan D3 Scan D2 1K Join Keys Multi Index Scan of Fact Table using Join Keys and Single-Column Indexes Join Keys Pushed Down to Reduce Probe Size Hash Join Hash Join Hash Join 1K 1K 1K 1K
  • 35.
  • 36.
  • 37. Example of 2nd Generation Database Disk I/O Issue
  • 38. How Oracle/Exadata Solves That Problem: Add an I/O Layer
  • 39.
  • 40.
  • 41.
  • 42.
  • 43. Breakthrough technologies for performance Row & Columnar Database Row format within IDS for transactional workloads and columnar data access via accelerator for OLAP queries. Extreme Compression Required because RAM is the limiting factor. Massive Parallelism All cores are used within used for queries Predicate evaluation on compressed data Often scans w/o decompression during evaluation Frequency Partitioning Enabler for the effective parallel access of the compressed data for scanning. Horizontal and Vertical Partition Elimination. In Memory Database 3 rd generation database technology avoids I/O. Compression allows huge databases to be completely memory resident Multi-core and Vector Optimized Algorithms Avoiding locking or synchronization 1 2 3 4 5 6 7 1 2 3 4 5 6 7
  • 44.
  • 45. Informix Warehouse Accelerator Overview Coordinator Process Orchestrating the distributed tasks like Load or Query execution . Have all the data in main memory spread across all cores. Do the compression and query execution. IDS Query parsing and matching to the Optimizer. Routing query blocks. . . Worker Processes
  • 46.
  • 47.
  • 48.
  • 49.
  • 50. Case Study #3: U.S. Government Agency             15800.89% Fact Table Scan 0:00:41 1:48:58 Summarize all transactions by State, County, City, State, Zip, Program, Program Year, Commodity and Fiscal Year 5 108.41% Index Read 0:00:06 0:00:06 Detailed Report on Specific Programs in a Date Range 4 41708.49% Fact Table Scan 0:00:14 1:34:37 Summarize all transactions by State and County 3 7640.45% Fact Table Scan 0:01:05 1:22:32 Find Top 100 Members 2 6023.23% Fact Table Scan 0:01:28 1:28:22 Find Top 100 Entities 1 Improvement Notes Informix w/ IWA Informix Description Query
  • 51.
  • 52.
  • 53.
  • 54.
  • 55. Compression Process: Step 1 Male/John Input tuple Column 1 Column 2 Co-code transform Type specific transform Column 1 & 2 Column 3.A Column Code TupleCode Column Code Column 3 Column 3.B Column Code Male/John/Sat Sat 2006 Male, John, 08/10/06, Mango 101101011 001 01011101 101101011 001 01011101 p = 1/512 p = 1/8 p = 1/512 w35/Mango w35 Huffman Encode Dict Huffman Encode Dict Huffman Encode Dict Male John 08/10/06 Mango 1.5% Steven 1.9% Thomas 2.3% Richard 2.4% Mark 2.5% William 3.5% John 3.5% Robert 3.6% James 3.8% David 4.2% Michael 22% 28% 17% 15% 9% 5% 4% Female 12% 42% 23% 6% 10% 4% 3% Male Sun Sat Fri Thu Wed Tue Mon
  • 56. Compression Process: Step 2 First tuple code Tuplecode — Sorted Tuplecodes 1 Previous Tuplecode Delta Huffman Encode Delta Code Append Dict Compression Block 101101011100001100 10110101110001011111 1011010111000011101 10110101110001011101 10110101110001011101 0000000000000000001 000 000 00000000000000000001 010 010 0000000000000000101 1110 1110 Look Ma, no delimiters! 101101011100010111010000101110 — — —
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 63. Distributing data from IDS (Fact tables) Data Fragment Fact Table UNLOAD UNLOAD UNLOAD UNLOAD IDS Stored Procedures Copy A copy of the IDS data is now transferred over to the Worker process. The Worker process holds a subset of the data (compressed) in main memory and is able to execute queries on this subset. The data is evenly distributed (no value based partitioning) across the cpus. Coordinator Process Worker Process Compressed Data Compressed Data Compressed Data Compressed Data Compressed Data Compressed Data Worker Process Worker Process Data Fragment Data Fragment Data Fragment
  • 64. Distributing data from IDS (Dimension tables) IDS UNLOAD UNLOAD UNLOAD UNLOAD IDS Stored Procedure All dimension tables are transferred to the worker process. Coordinator Process Worker Process Worker Process Worker Process Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table Dimension Table
  • 65. Mapping Data from IDS to IWA. Diagram: inside IDS, the Fact table consists of data fragments surrounded by dimension tables; inside IWA, the same fact-table fragments and dimension tables are held in compressed form.
  • 66.
  • 67. IWA Referenced Hardware Configuration. A 4-processor, 4U rack-optimized enterprise server with Intel Xeon processors: 4 x 8-core Intel Xeon X7560 @ 2.27 GHz, 512 GB memory, and 6 x 300 GB SAS hard disk drives. Options: 16x 1.8" SAS SSDs with eXFlash or 8x 2.5" SAS HDDs; optional MAX5 32-DIMM memory expansion; scalable from 4 sockets and 64 DIMMs to 8 sockets and 128 DIMMs; 8-core, 6-core and 4-core processor options with up to 2.26 GHz (8-core), 2.66 GHz (6-core) and 1.86 GHz (4-core) speeds and up to 16 MB L3 cache.
  • 68.
  • 69. (Fred Ho – hof@us.ibm.com)
  • 70.  

Editor's Notes

  1. To understand the benefits of a new technology like the Informix Warehouse Accelerator, it is important to understand the industry trends for warehousing, what Informix has to offer in this space, and what Informix (in the server) provides for warehousing. The Informix Warehouse Accelerator provides clear performance advantages (an order-of-magnitude gain) over existing technologies in the marketplace, making real-time analytics possible and reducing reporting windows from many hours to minutes, all without any tuning on a per-query basis. This talk will cover the topics listed above.
  2. The latest Gartner report on the "State of Data Warehousing 2011" discusses several key findings, a key one being that cost is driving interest in alternative architectures, notably a strong interest in in-memory data marts. This is significant because over the years much attention has been paid to the EDW (Enterprise Data Warehouse), where all warehouse data is concentrated on a single platform, while data marts were considered too scattered and hard to maintain. Gartner is saying there is renewed interest in data marts built on new technology at reduced cost. The Informix Warehouse strategy and product offering is very much aligned with this industry trend.
  3. While organizations may have very large data warehouses, upward of multiple terabytes, Gartner is saying that the amount of data used to gain the necessary insights is, for most companies, far less, usually 5 TB or less. Our own findings for the Informix customer base doing warehousing/analytics show that the "sweet spot" is around 200 GB to 500 GB; this is where we're targeting our products. The traditional optimization methods listed work with 2nd-generation technology (as shown in a later slide) but are limited to managing data on disk to reduce I/O. We want to go beyond that technology.
  4. To reduce cost and simplify administration, organizations are looking to combine both OLTP and OLAP processing on a single platform, and Gartner is saying that in-memory DBMS solutions help make that possible. The Informix Flexible Grid technology allows for such an environment today: running Informix with the Accelerator allows warehouse queries to be offloaded in such a way that they do not interfere with shorter OLTP transactions running concurrently on the system.
  5. As a summary, the 4 key technologies that Gartner points out are listed above, all of which again align very well with the Informix offering.
  6. Informix has had a long history in data warehousing dating back over 15 years. XPS was one of the first shared-nothing MPP databases designed for data warehousing; in many ways, its ability to partition data and scan quickly within those partitions is now being copied by many competitors. Red Brick, originally founded by Ralph Kimball, was a pioneer in data warehousing: familiar terms such as Star Schema, Star Index, Star Join and dimensional modeling were all invented by the Red Brick product. Both products are still actively supported and still have many enterprise customers running on them today. IDS was designed for OLTP; with the last several releases, IDS is now fully functional and capable of handling data warehouse workloads.
  7. The IDS server today contains many features suitable for data warehousing, e.g. multi-threading for parallel query processing and an efficient Hash Join to handle joining a large Fact table to multiple dimension tables. In addition to query processing, it is important to manage the data in a warehouse environment, so something like time-cyclic data management is key.
  8. Our goal is to combine the best features of both XPS and Red Brick into the current Informix Warehouse product. Together with newer technologies such as Flexible Grid and the Warehouse Accelerator, IDS is certainly capable of handling Informix customer needs for warehousing. We also have a full BI software stack including Cognos, SPSS and others that can provide analytics on top of the database, all within the IBM product family.
  9. The features shown here have now all been implemented. The latest release of 11.70 includes Star Join Optimization, Multi-index Scan, etc.; the upcoming release of 11.70 will contain the Warehouse Accelerator.
  10. In a typical data warehousing environment, you have data coming in from different data sources such as flat files or other databases. The data goes through an ETL (Extract/Transform/Load) or ELT process and gets loaded into the Enterprise Warehouse, or into a staging database and/or an Operational Data Store, in which case further ETL is needed to load the data into the enterprise warehouse. Once the data is there, it can be further extracted into data marts for departmental processing. BI software (shown on the right side of the diagram) can access any of the data stores for reporting, OLAP analysis and data mining. As of the IDS 11.7 release, any of the data stores can be handled by IDS, and all ETL and BI software from IBM is integrated with Informix.
  11. There are 3 areas to consider when looking at data warehousing support in an RDBMS: 1) Data Loading, 2) DBMS & Storage Management, and 3) Query Processing. 11.70 addressed gaps in each of these areas to make IDS a very viable platform for data warehousing. The features listed in red were added in 11.70. While all the features are well documented in our manuals and papers, I will highlight just a few in this presentation.
  12. SQW (SQL Warehousing) provides the tooling necessary to do ETL (or ELT, in our case) to load data into the Informix Warehouse. It also provides for importing and transforming an OLTP schema into a Star schema to make it more amenable to data warehousing. SQW consists of Design Studio, where the schema is manipulated using a GUI tool; the resulting ETL "job" can then be deployed on a number of platforms where the data warehouse resides. There is also an Admin Console that controls the scheduling and execution of the job streams created by the SQW tool.
  13. Design Studio is an Eclipse-based tool (plugins on top of Eclipse; IDE means integrated development environment), so other Eclipse plugins (e.g. plugins for CVS or for BIRT reporting) can easily be used in Design Studio. We added a perspective called "Data Warehousing": to work with SQW, the user creates a Data Warehousing project, which is associated with that perspective. A data warehouse project includes data flows, control flows, (physical) data models, a deployment package (for Admin Console deployment), subflows and subprocesses (similar to data flows and control flows, and embeddable in them), and variable definitions (so values can be changed dynamically). Inside the Data Warehousing perspective there is a view called the Data Source Explorer; this is where database connections are created (to multiple vendors, in different remote or local databases). SQW also works with IBM DataStage: you can create DataStage servers inside the DataStage server view.
  14. An example of a physical data model in the visual modeling tool. Right-click on an object to show impact analysis. The model can be generated from scratch or reverse-engineered from a database; e.g. you can first reverse-engineer from the database, then refine the model (add a table, add FK constraints, etc.) and push the changes back to the database. You can also generate DDL from the model, and generate overview diagrams (as shown above). Since Design Studio shares its shell with RDA and other Data Studio products, you can install those products on top of Design Studio or vice versa (RDA has logical data modeling, ER diagrams, etc.).
  15. An example of a data flow. On the right-hand side is the operator palette: general operators cover data sources, targets and SQL transformations, while warehouse operators cover warehousing-specific usage (e.g. pivot and unpivot, fact key replacement). There are also Informix-specific operators such as Add Fragment.
  16. A simple flow that does a file import and a table join (joining my dimension tables with the sales records in a file to populate my fact table, with a GROUP BY that calculates the sum of sales). As you can see, the generated SQL is complicated; imagine writing these SQL statements by hand with a typo somewhere (not many tools help you debug SQL). You can either execute the flow directly, or take the generated SQL, refine/modify it, and execute it separately as SQL scripts; a sketch of the idea follows.
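  As a rough illustration of what such a flow generates, here is a hand-written sketch of the equivalent SQL, assuming hypothetical table, column and file names (the real generated statements are considerably longer):

      -- Expose the sales flat file as an external table (hypothetical names/path).
      CREATE EXTERNAL TABLE sales_ext
        ( cust_id INT, sale_date DATE, amount DECIMAL(12,2) )
        USING ( DATAFILES ("DISK:/data/sales.unl"), FORMAT "DELIMITED" );

      -- Join the file rows to the dimension tables and aggregate into the fact table.
      INSERT INTO sales_fact (cust_key, day_key, total_sales)
      SELECT c.cust_key, d.day_key, SUM(s.amount)
      FROM   sales_ext s
             JOIN customer_dim c ON c.cust_id  = s.cust_id
             JOIN date_dim     d ON d.cal_date = s.sale_date
      GROUP BY c.cust_key, d.day_key;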
  17. Control flows include utility operators (SSH, FTP, command execution, email, file write, file wait), easy-to-use database operators (e.g. stored procedure, update statistics), and programming-logic operators (e.g. iterator, parallel, variable comparison (if/else), break, fail, continue). The on-success (green tick), on-failure (red X) and unconditional (blue arrow) links provide the error-handling mechanism. The generated code of a control flow is not SQL any more; it is called an EPG (execution plan graph), which is our own (SQW) code, similar to try/catch/finally logic in Java.
  18. Design Studio is the design environment, while the Admin Console provides the production environment. The Admin Console is web based, so the browser does not have to be on the server machine. After designing the flows, one can create an application package in Design Studio, which is a zip file of the deployment profile and generated code. This zip file is then deployed to the Admin Console, where schedules can be defined and execution instances monitored.
  19. Our Admin Console uses Adobe Flex RIA (rich internet application), state-of-the-art technology: a nicer user interface with dynamically loaded tabs. Users can manage common resources such as database connections and machine resources (used for SSH or FTP, etc.), schedule when to execute a process (control flow), and monitor execution status (scheduled, running, success, failure, etc.).
  20. An important part of managing data in a data warehouse is the ability to add or drop a fragment in an automated fashion. Most retailers need to store a certain number of intervals of data, usually by month, though it could be by week or perhaps by year. When a new interval of data comes in, the system should automatically add a new segment and release the oldest one, maintaining the total number of intervals desired, such as 24 or 36. 11.70 now provides this capability; see the sketch below for the "release the oldest" half.
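  A minimal sketch of rolling out the oldest interval, with hypothetical table and partition names (11.70's online attach/detach is described in the manuals; verify the exact form there):

      -- Detach the oldest monthly fragment into its own table, online,
      -- then archive or drop it.
      ALTER FRAGMENT ONLINE ON TABLE orders
            DETACH PARTITION p_2009_01 orders_2009_01;
      DROP TABLE orders_2009_01;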
  21. To provide the time-cyclic data management discussed on the last slide, IDS 11.7 now contains a feature called Interval Fragmentation. As shown in the SQL CREATE TABLE command below, the FRAGMENT BY RANGE clause lets one specify a fragment key, in this example "order_date". The INTERVAL clause then takes a NUMTOYMINTERVAL (read as "number to Year-Month Interval") where an interval value expression of 1 means a new interval is automatically created each month; changing the value to 2 tells the system to create a new interval every 2 months, and if "MONTH" is changed to "YEAR", the interval is by year. Notice that dbspaces can be specified to spread the intervals across different dbspaces. The remaining partitions, e.g. p0, specify that values belonging to a certain date range are kept in a given partition; this is useful for "older" data that needs to be kept but for which smaller intervals, e.g. by month, are not necessary.
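  Putting those clauses together, a minimal sketch (table, column and dbspace names are hypothetical):

      CREATE TABLE orders (
          order_id   INT,
          order_date DATE
      )
      FRAGMENT BY RANGE (order_date)
         INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))   -- new fragment each month
         STORE IN (dbs1, dbs2, dbs3)              -- intervals rotate across dbspaces
         PARTITION p0 VALUES < DATE('01/01/2010') IN dbs0;  -- all older data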
  22. Now on the query processing side of 11.70, we made some significant improvements in handling Star Schema queries. The first one involves the use of Multi-Index scan.
  23. As an example, look at the following relatively "simple" query against a single table, the Customer table. This is significant because it is a very common warehousing type of query that most systems designed for OLTP will not handle well. In particular, we are talking about "low selectivity" queries: low selectivity, in optimizer terms, means that a particular predicate (or WHERE condition) is not very "selective", implying that many rows will qualify. A typical example is the condition gender = "male": for a Customer table this could, on average, return half the rows in the table, and if this is a phone company, for instance, that could be millions of rows. Even the rest of the predicates, e.g. income_category = "HIGH", are not selective. Let's see how a system would process such a query.
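  A sketch of such a query, with hypothetical column values; every predicate on its own qualifies a large fraction of the table:

      SELECT COUNT(*)
      FROM   customer
      WHERE  gender          = 'M'        -- roughly half the table
      AND    income_category = 'HIGH'
      AND    education_level = 'COLLEGE'
      AND    zip_code        = '95032';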
  24. Prior to 11.70, IDS would evaluate this query by retrieving the rows based on the most selective constraint, followed by a sequential evaluation of each of the other constraints. As one can imagine, this retrieves a large number of rows: e.g. if education level is deemed most selective, then all rows that match the education_level constraint are retrieved, followed by successive row-by-row evaluation of the remaining constraints, i.e. gender, income_category and zip_code. A 10-million-row Customer table with 4 unique education levels would still return 2.5 million rows. What's worse, even if there are indexes on the other columns, they will not be used.
  25. With 11.70, we can now take advantage of the multi-index scan. One sets up a different index for each column; the system then uses each index to retrieve the list of qualifying rows and represents them as a bitmap, then ANDs the bitmaps together to produce the set of bits representing the row-ids of the qualifying rows. We further optimize retrieval by sorting the bits before fetching the rows in a sequential manner. It is also very easy to answer the COUNT(*) often found in queries by just counting bits. A sketch of the setup follows.
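  A sketch with hypothetical index names; the MULTI_INDEX optimizer directive shown is the 11.70 mechanism for forcing this access path, but verify its exact form against the documentation:

      CREATE INDEX ix_gender ON customer (gender);
      CREATE INDEX ix_income ON customer (income_category);
      CREATE INDEX ix_educ   ON customer (education_level);
      CREATE INDEX ix_zip    ON customer (zip_code);

      SELECT {+ MULTI_INDEX(customer, ix_gender, ix_income, ix_educ, ix_zip) }
             COUNT(*)
      FROM   customer
      WHERE  gender = 'M' AND income_category = 'HIGH'
      AND    education_level = 'COLLEGE' AND zip_code = '95032';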
  26. The previous example shows how 11.70 can speed up queries using multiple indexes on a single table, but the main performance advantage for warehouse queries involves joining data between the Fact table and dimension tables. In this example, we see that the typical way to join a Fact table (usually large) with a dimension table (usually much smaller) is what's known as a Hash Join (other common join methods are the Nested-Loop Join and the Sort-Merge Join). A hash join builds a hash table from the values in the smaller table, then uses it to probe the values of the larger table, providing a predictable lookup time. Hash joins can overflow to disk if there is insufficient memory.
  27. This example shows how the new Push-Down Hash Join feature in 11.70 can significantly speed query response for typical warehouse queries involving joins to the Fact table. Assume we have a Fact table (F) with 1 million rows and, in this example, a selectivity of 1/1000, meaning that on average a query to this fact table selects about 1,000 rows. Further assume there are 3 dimension tables (D1-D3), each with 10,000 rows and a default selectivity of 1/10, so each returns about 1,000 rows as well; the query shape is sketched below.
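  The shape of that query, as a sketch with hypothetical column names (the row counts from the text are noted as comments):

      SELECT d1.name, d2.name, d3.name, SUM(f.amount)
      FROM   f                                  -- 1,000,000 rows
             JOIN d1 ON f.d1_key = d1.d1_key    -- 10,000 rows, ~1,000 qualify
             JOIN d2 ON f.d2_key = d2.d2_key    -- 10,000 rows, ~1,000 qualify
             JOIN d3 ON f.d3_key = d3.d3_key    -- 10,000 rows, ~1,000 qualify
      WHERE  d1.attr = 'A' AND d2.attr = 'B' AND d3.attr = 'C'
      GROUP BY d1.name, d2.name, d3.name;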
  28. Prior to 11.70, a query involving the join of the Fact table with all 3 dimension tables would proceed as follows: a hash join between D1 and F, where the whole Fact table is scanned and joined with D1, leaving an intermediate result of 100K rows; this result is then joined with D2, where 1K rows are scanned from D2, leaving 10K rows as the intermediate result; the final hash join is then performed with D3, returning the result. While this left-deep tree method works, a lot of data is scanned and the intermediate results are large, resulting in high memory consumption and long elapsed times.
  29. In 11.70, we take advantage of the multi-index scan (discussed previously) to reduce the number of rows scanned from each of the dimension tables. By doing so, we "push down" the join keys to the Fact table, reducing the rows scanned from the Fact table to 1K (instead of the 1M on the previous slide) and reducing each intermediate join result to 1K rows. This is called the "right-deep tree" method and provides significant savings in memory consumption and response time.
  30. Now we come to IWA. We will discuss the following topics to introduce this new offering, which will be available as the Informix Ultimate Warehouse Edition (IUWE).
  31. According to the IDC article (and subsequent articles from Gartner and Forrester), we are now entering the 3rd generation of database technology. The 1st-generation systems like IMS had very rigid procedural (or navigational) requirements for retrieving records from the database, making applications not portable to other systems (i.e. a program written for IMS cannot be used with IDMS). The 2nd-generation systems use SQL extensively in open-systems environments and are non-procedural, i.e. they make use of an optimizer to determine an optimal access path. However, these systems' performance is limited by the disk layout of the data and how efficiently I/O can be done. Over the years, we have added many techniques to "optimize" I/O using indexes, partitioning, summary tables, multi-dimensional cubes, query rewrites, etc., all requiring expertise developed over years of "tuning" the system. Yet there are still many instances of "runaway queries", and IT staff still worry about meeting "load windows" or "reporting windows". The paper argues that a 3rd-generation database technology is here that uses mostly in-memory, columnar technology to eliminate I/O, not just reduce it, and that with new advances in commodity hardware, scaling can be achieved with clustering. Furthermore, OLTP and data warehousing can co-exist on the same system in an economic way.
  32. As a further example of 2nd-generation database technology, consider Oracle's Exadata. It is important to understand why Oracle introduced Exadata in the first place: to address performance issues with Oracle data warehouse implementations. In his keynote at OracleWorld in 2008, Larry Ellison stated that the biggest problem with Oracle's data warehouse today is that the pipe between the storage and the database server is too small, something people have said about Oracle's data warehouse implementations for a long time. The design of Oracle RAC is that you can have many servers, but they are all connected to a single shared copy of the data; if you need more performance, your only option is to add more servers, which does not increase storage throughput because the additional servers connect to the same storage network, so you end up with an I/O bottleneck. So Oracle introduced Exadata to alleviate its I/O problems, at the cost of an additional hardware layer and additional software charges per disk drive ($10,000).
  33. Exadata adds another layer to the system architecture, which in turn adds complexity. Exadata acts as a storage layer that performs I/O on behalf of the Oracle database, which can be a single-instance database or a RAC database; Exadata interacts with the Oracle database over the InfiniBand network. One sees that the I/O problem has not been solved, simply shifted to another layer of hardware.
  34. Here is a diagram of the full rack Sun Oracle Database Machine configuration. As previously noted, there are 8 Oracle RAC database servers connected to 14 Exadata storage cells via an InfiniBand network. Each Exadata cell is capable of up to 1.5 GB per second peak I/O bandwidth (using 600GB SAS drives), for an aggregate I/O bandwidth of up to 21 GB per second for the full rack Sun Oracle Database Machine. The half rack system would look the same but there would be only 4 Oracle RAC database servers and 7 Exadata storage cells. Note that really all Oracle has done is introduce another layer of servers that have the disks directly attached, and these servers perform the I/O. By offloading the I/O to these servers, the Oracle RAC system should perform better.
  35. This slide breaks down the Oracle Database Machine pricing for the full-rack machine. The hardware has a list price of $1,150,000; the 600 GB drive "high performance" configuration and the 2 TB drive "high capacity" configuration are the same price. The Oracle 11g database software, including RAC and the partitioning option, is $2.624 million, and the software licenses for the Oracle Exadata Storage Server software are another $1.68 million. Also recommended are the Oracle options for compression and the diagnostic and tuning packs. Software support is an additional 22% of the software license costs, and Oracle software options for Data Mining, OLAP and ETL cost extra, as does hardware installation. While Oracle also sells its Exadata solution in ½-rack or ¼-rack configurations, we are still talking about a solution costing millions of dollars.
  36. IWA is an integration of IBM hardware, software, storage and advanced technologies focused on business analytics, which combine to give IBM clients the industry's highest-performance analytic capability, extracting business insight from information assets and providing the right answers to the right questions at the right time. It marries the best of row-store and columnar-store technologies with a highly compressed, in-memory, massively parallel architecture. It removes the costs of traditional performance tuning, such as indexes, materialized query tables (MQTs) and query-plan tuning, which required extensive time and expensive resources. It is optimized for full table scans (the most common BI query access pattern), enabling customers to get quick and predictable results to gain business insights. It even enables queries that had been removed from the system in the past due to their long run times and high resource requirements.
  37. Listed above are the key technologies that provide the major speedup in query response; they are discussed in more detail later in the presentation.
  38. The query flow between IDS and IWA is the following: IDS users continue to submit queries either directly or via a BI tool like MicroStrategy, Cognos, etc.; the IDS optimizer decides which query (or query block within a query) can be routed to the Accelerator; the Accelerator returns the answers to IDS, which then returns them to the user. Note that the IDS database (or warehouse) stays the same and is still kept on the IDS side; there is no database on the IWA side except what is loaded in memory, and the local disk on IWA only serves to hold a memory image in case of failure.
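  As a hedged sketch of the session-level switch (the use_dwa environment setting comes from the IWA documentation; verify it against your release):

      -- Make qualifying query blocks in this session candidates for
      -- routing to the accelerator; OLTP-style statements still run in IDS.
      SET ENVIRONMENT USE_DWA 'ACCELERATE ON';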
  39. Internally, IWA spawns a Coordinator process that manages the loading and the execution of queries; it also assembles the result sets from the various Worker processes, which do the bulk of the work. Parameters are available to specify the amount of memory used by the coordinator and worker processes. Parallelism is still achieved even with a minimal number of worker processes.
  40. The sweet spot for IWA is data warehousing queries against a Star or Snowflake schema. Via the Smart Analytics Studio, the user specifies in a GUI environment which tables compose the "mart", and IWA automatically offloads the data belonging to the mart.
  41. This is a typical query against the Star Schema you see on the previous page: it usually involves a Fact table joined with a number of dimension tables, followed by aggregation and sorting, along the lines of the sketch below.
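  A representative query of that shape, assuming a hypothetical retail star schema:

      SELECT p.product_line, t.month_name, SUM(f.revenue) AS revenue
      FROM   sales_fact f
             JOIN product_dim p ON p.prod_key  = f.prod_key
             JOIN time_dim    t ON t.time_key  = f.time_key
             JOIN store_dim   s ON s.store_key = f.store_key
      WHERE  s.region = 'WEST' AND t.year = 2010
      GROUP BY p.product_line, t.month_name
      ORDER BY revenue DESC;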
  42. One of our EVP customers, a major shoe retailer in the U.S., ran tests comparing IDS 11.5 with IWA. The database contains a billion-row Fact table and a number of smaller dimension tables; it took about 30 minutes to load the data. As shown, each of the most problematic queries returned in 2 to 4 seconds. (The machine was an x3850 with 24 cores (4x6) and 256 GB of RAM.)
  43. Another EVP customer is a large government agency in Europe. Queries were submitted via MicroStrategy; this is their most complex report, involving hundreds of SQL statements. Even on the Intel box, it went from 40 minutes to a little over 1 minute.
  44. The 3rd EVP customer is a U.S. Government agency, again with an impressive speedup.
  45. A traditional (2nd-generation) DBMS uses a row-store approach, where each row is stored completely and multiple rows are stored sequentially in I/O-optimized data structures. If only a few columns are required (in the project list), the complete row still needs to be fetched and uncompressed; most of the data is moved and decompressed without ever being used.
  46. Query engines optimized for analytical queries tend to use a column-store approach. In a column store, the data of a specific column is stored sequentially before the data of the next column begins, so attributes not required for a specific query execution can simply be skipped completely, causing no I/O or decompression effort. In a column store, the data is also compressed sequentially within a column. This is an optimized approach if you plan to perform sequential scans over your data; random access to specific attributes does not perform well. Many analysts tout columnar-store databases as the only logical approach for analytic processing, and vendors such as Vertica, Sybase IQ, ParAccel and SAP's BIA already use columnar stores. But this is not a new idea: data management products such as ADABAS and Model 204 used it in the past. Note that while IWA uses columnar technology, it does not store the columnar data on disk.
  47. IWA uses a technique called Frequency Partitioning. In the diagram above, one sees that the Trade Info table contains the columns Volume, Product and Origin Country. Histograms are built for each column to determine the frequency of data-value occurrences, as shown for Origin and Product. The system then looks for the most frequently occurring values in each of the columns, in the example the top 64 traded goods, and encodes those values with the smallest number of bits that can adequately represent the data (approximate Huffman encoding), the idea being that the most-accessed values require the fewest bits to manipulate. These values are then intersected with values in other columns (top traded goods from China/USA), and the encoded values are placed in memory cells across all available memory in the system, used for subsequent scan operations. The next slide shows an example of this and of the further encoding used in IWA.
  48. This example shows the compression steps used in IWA. Starting with a record containing Name, Sex, Date of Purchase and Product, histograms are built for each column. Relationships between columns are then evaluated, causing columns of values with "male" and "John" to be combined. The purchase date is broken up into smaller columns, which are then further combined with other columns, as shown with week 35 ("w35") and "Mango". Finally, depending on the probability of each value occurring, such as p = 1/512 for "male/john/sat", the values are encoded into bit strings such as "101101011", and so on. Notice that the final encoded value is a single series of bits representing the entire row.
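  As a back-of-the-envelope check on those bit counts (a sketch; the encoding is only approximately Huffman): a value with probability p gets a code of about ceil(-log2 p) bits, so p = 1/512 yields 9 bits (matching the 9-bit code 101101011 on the slide) and p = 1/8 yields 3 bits (matching 001).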
  49. Following on from the previous slide, there is further compression using delta encoding, which compares the values of the current row with those of the previous row. In this case, one sees that a 2nd row can be fully represented by adding just a few bits to the previous row. There is then a 2nd-level compression in which the encoded value is placed back into a dictionary, so that other rows with similar values can be represented by just a few bits. The end result is that an entire set of rows can be represented by the one string of bits you see at the bottom of the page. This is an animated slide.
  50. Single Instruction Multiple Data (SIMD) parallelism is used in IWA to further exploit parallelism together with the columnar technology. In this example, notice that the query only touches columns A, D and G of table T. Given that IWA groups columns into "banks", it can use "vertical partitioning" and load multiple rows of columns A, D and G into a 128-bit register, thus processing multiple rows in a single instruction. This is an animated slide.
  51. This diagram shows how IWA can do simultaneous evaluation of equality predicates. With the encoded bits shown on previous slides, one sees how multiple predicates can be evaluated all at once. Note that we are evaluating compressed/encoded values without any need to decompress and recompress the data.
  52. This slide describes the process a customer goes through to define which data needs to be accelerated. Using a GUI tool called the Smart Analytics Studio, the user defines a data mart by clicking on the Fact table and the surrounding dimension tables that compose the mart. IDS then uses that information to transfer the user data into a highly compressed, scan-optimized format used for all subsequent queries that qualify. Internally, IWA uses a number of nodes (i.e. processes) to build the mart; the number of Worker processes defines the number of threads used to build the optimized format. Refer to the IWA Administration Guide for more information on the configuration parameters.
  53. This is a picture of the IWA Design Studio showing the Rich Client Interface and how tables and dimensions can be manipulated.
  54. Animated slide showing the movement of data from IDS to IWA via the Coordinator and Worker processes. Note that there is no requirement on how the table needs to be partitioned on IDS. The Fact table is split into multiple parts and distributed evenly across the Worker nodes within the cluster; bigger Fact tables "just" require enough Worker nodes to hold the compressed data in memory.
  55. The join strategy between the dimension tables and the Fact table data is always a co-located join. This means that all dimension tables are fully replicated to each of the worker nodes; the space requirement for dimension tables therefore needs to be multiplied by the cluster size (the number of Worker nodes).
  56. Animated slide showing the movement of data from IDS to IWA. For a single SMP environment, as in this case, it simply means moving all the data over to IWA. In future releases that support multiple nodes, as in a Mach11 environment, dimension tables are replicated to each node.
  57. This is the reference hardware configuration for IWA. The eX5 system can be an x3850 with X7560 processors, as shown, with a recommended 512 GB of RAM; eX5 is capable of 1.5 TB of RAM, the maximum goes up to 3 TB in mid-2011, and soon much more. The internal disk is used only to back up the memory image of IWA, not to store the data warehouse itself. The amount of RAM needed depends entirely on the amount of raw data to fit into memory, using a 3:1 compression ratio to determine what's needed; for example, roughly 1.5 TB of raw data compresses to about 500 GB, fitting the 512 GB configuration.