Data Warehouse Systems in the
Cloud: new requirements and new
challenges
Rim Moussa

LaTICE Lab. -University of Tunis
ESTI -University of Carthage
rim.moussa@esti.rnu.tn
10th Intl. Conference on Computer Systems and Applications
(AICCSA), Fez, Kingdom of Morocco
Keynote @ Intl. Conference on Computing, Networking and
Communications, Hammamet, Tunisia
30th May 2013
DWS in the Cloud, AICCSA'13, Fez
Context
[Figure: Cloud Rationale / Benchmarking Data Warehouse Systems / NO]

Outline
1. Cloud Computing
2. Data Warehouse Systems
3. Overview of DWS Benchmarks
4. New Requirements for DWS in the Cloud
5. Related Work
6. Conclusion
7. Research Perspectives

Cloud Computing
● NIST Definition
  – Cloud computing is a pay-per-use model for enabling available,
    convenient, on-demand network access to a shared pool of
    configurable computing resources (e.g. networks, servers, storage,
    applications, services) that can be rapidly provisioned and released
    with minimal management effort or service provider interaction.
● Opportunities
  – Performance
    ● Faster data analysis through usage of up-to-date hardware
      infrastructure made available by Cloud Service Providers,
  – More Economical
    ● Organizations no longer need to expend capital upfront for
      hardware and software purchases, with services provided on a
      pay-per-use basis.

Cloud Computing
--Market share
● Market Share
  – Forrester Research expects the global cloud computing
    market to reach $241 billion in 2020,
  – Gartner group: the public cloud services market is
    forecast to grow 18.5% in 2013 to total $131 billion
    worldwide, up from $111 billion in 2012,
  – Gartner: the public cloud services market in the Middle
    East and North Africa (MENA) is expected to increase
    by 24.5% in 2013,
  – Gartner group: the public cloud services market in India
    is forecast to grow 36% in 2013 to total $443 million, up
    from $326 million in 2012.

Data Warehouse Systems
--Typical System Architecture

Data Warehouse Systems
--Technologies
● Traditional Relational DBMSs & OLAP Servers
  – Mature
  – Do not scale linearly
● NoSQL solutions
  – Adopted by Google, Facebook, Amazon, ...
  – Dynamic horizontal scaling (scale-out)
    ● Nodes are added without bringing the cluster down
  – Shared-nothing architecture
    ● Independent computing and storage nodes
      interconnected via a high speed network
  – MapReduce distributed programming framework

Data Warehouse Systems
--challenges with big data management

Data Warehouse Systems
--Common Optimizations: Hardware Storage Tech.
● DRAM: in-memory data processing (very expensive)
● SSD (Solid State Drives): a non-volatile type of memory.
  ● An SSD does not have a mechanical arm to read and
    write data

                        SSD                HDD
  Cost/GB               $1/GB              $0.075/GB
  Typical size          512GB              Up to 2TB
  Failure rate (MTBF)   2 million hours    1.5 million hours
  Read/Write speed      200-500 MBps       120 MBps

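To give a feel for the read/write speed row of the table above, the sketch below estimates how long a full sequential scan would take on each medium. The 100 GB data volume and the helper name `scan_seconds` are assumptions made for illustration; only the throughput figures come from the table.

```python
def scan_seconds(data_gb: float, throughput_mbps: float) -> float:
    """Seconds to read data_gb gigabytes sequentially at throughput_mbps MB/s.

    Uses 1 GB = 1000 MB; seek times and caching are ignored.
    """
    return data_gb * 1000 / throughput_mbps


DATA_GB = 100  # assumed warehouse size, not from the slides

hdd = scan_seconds(DATA_GB, 120)       # HDD at 120 MBps
ssd_slow = scan_seconds(DATA_GB, 200)  # SSD, lower bound of the range
ssd_fast = scan_seconds(DATA_GB, 500)  # SSD, upper bound of the range

print(f"HDD: {hdd:.0f} s, SSD: {ssd_fast:.0f}-{ssd_slow:.0f} s")
```

Under these assumptions the SSD scan is roughly 1.7 to 4 times faster, at about 13 times the cost per GB quoted in the table.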
Data Warehouse Systems
--Common Optimizations: Columnar Storage Principle
● Row-oriented storage
  – Read pages containing all columns
    [Figure: table with columns Date, Customer, Product, Price, Quantity]
● Column-oriented storage
  – Read only columns needed for query processing
    [Figure: Date, Customer, Product, Price, Quantity shown as separate columns]

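The difference between the two layouts can be sketched as a count of bytes read for one query. The column names come from the slide; the fixed column widths, the row count and the choice of a query touching only Price and Quantity are assumptions for illustration.

```python
# Assumed fixed width in bytes for each column (hypothetical values).
COLUMN_WIDTHS = {"Date": 4, "Customer": 8, "Product": 8, "Price": 8, "Quantity": 4}


def bytes_read_row_store(n_rows: int, widths: dict) -> int:
    """A row store reads whole rows, so every column is fetched."""
    return n_rows * sum(widths.values())


def bytes_read_column_store(n_rows: int, widths: dict, needed: list) -> int:
    """A column store reads only the columns the query refers to."""
    return n_rows * sum(widths[name] for name in needed)


N_ROWS = 1_000_000
row_bytes = bytes_read_row_store(N_ROWS, COLUMN_WIDTHS)
col_bytes = bytes_read_column_store(N_ROWS, COLUMN_WIDTHS, ["Price", "Quantity"])

print(row_bytes, col_bytes, row_bytes / col_bytes)
```

With these widths the column store reads 12 of the 32 bytes per row. Compression and page layout, which this sketch ignores, would change the exact ratio.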
Data Warehouse Systems
--Common Optimizations: Columnar Storage Benefits
● Allows the best data compression rate, since data values are
  redundant within a single column,
● Eliminates unnecessary I/O through the retrieval of only
  relevant data,
● Vectorwise is in the TPC-H - Top Ten Performance Results
  (14-Jun-2013)

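One way to see why redundancy within a column helps compression is run-length encoding, a simple scheme chosen here for illustration; the slides do not say which compression scheme any given system uses, and the sample column values are made up.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in runs for _ in range(count)]


# Hypothetical sorted column: many repeated values, as is typical of a
# low-cardinality attribute stored on its own.
column = ["FR", "FR", "FR", "TN", "TN", "MA", "MA", "MA", "MA"]

runs = rle_encode(column)
print(runs)
print(len(column), "values stored as", len(runs), "runs")
```

The same values interleaved with other attributes in a row layout would not form long runs, which is the intuition behind the first bullet above.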
Data Warehouse Systems
--Common Optimizations: Derived Data
● Derived Data:
  – Derived Attributes,
  – Indexes,
  – Aggregate tables
● Pros:
  – High Performance
● Cons:
  – Maintenance: refresh is expensive
  – Storage cost

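The trade-off listed above can be sketched with a tiny aggregate table: queries become a lookup, but every new fact row triggers maintenance work. The fact rows and the helper names `build_aggregate` and `refresh` are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (date, product, price, quantity).
sales = [
    ("2013-05-01", "p1", 10.0, 2),
    ("2013-05-01", "p2", 5.0, 4),
    ("2013-05-02", "p1", 10.0, 1),
]


def build_aggregate(rows):
    """Precompute revenue per product (the aggregate table)."""
    totals = defaultdict(float)
    for _date, product, price, quantity in rows:
        totals[product] += price * quantity
    return dict(totals)


def refresh(aggregate, new_row):
    """Incrementally maintain the aggregate when a fact row is inserted."""
    _date, product, price, quantity = new_row
    aggregate[product] = aggregate.get(product, 0.0) + price * quantity


revenue_by_product = build_aggregate(sales)   # extra storage
refresh(revenue_by_product, ("2013-05-03", "p2", 5.0, 2))  # maintenance cost
print(revenue_by_product["p2"])  # answered without scanning the fact rows
```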
Data Warehouse Systems
--DWS Benchmarks
● APB-1 OLAP Benchmark --obsolete
  – Released by the OLAP Council (www.olapcouncil.org) in 1998
  – A simple star schema data model
● TPC DSS Benchmarks
  – Released by the Transaction Processing Performance Council (www.tpc.org)
  – Examine large volumes of data (from 10GB to 100TB)
  – Complex relational data model
  – TPC-H
    ● Workload composed of 22 ad-hoc complex SQL statements
    ● The most prominent DSS benchmark
  – TPC-DS -successor of TPC-H
    ● Workload composed of 99 SQL business questions
    ● Same metrics as TPC-H

Data Warehouse Systems
--TPC-H Benchmark Metrics (same for TPC-DS)
● Query-per-hour Performance Metric
  – For a given scale factor (warehouse data volume)
  – Concurrent users
● Price-Performance Metric
  – Ratio of the priced system (cost of ownership: hardware,
    software, maintenance, and cost of everything needed to run
    the TPC-H workload) to the Query-per-hour Performance Metric

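The two metrics above can be written down as a short calculation. The composite formula (geometric mean of the power and throughput results) follows the TPC-H specification rather than the slides, and all numeric inputs below are made-up values, not published results.

```python
import math


def qphh(power_at_size: float, throughput_at_size: float) -> float:
    """Composite query-per-hour metric: geometric mean of power and throughput."""
    return math.sqrt(power_at_size * throughput_at_size)


def price_performance(total_system_price: float, qphh_at_size: float) -> float:
    """Price-performance metric in $ per QphH@Size."""
    return total_system_price / qphh_at_size


# Hypothetical inputs for one scale factor.
composite = qphh(power_at_size=40_000, throughput_at_size=10_000)
ratio = price_performance(total_system_price=500_000, qphh_at_size=composite)

print(f"QphH@Size = {composite:.0f}, price-performance = {ratio:.2f} $/QphH")
```

The price term is a total cost of ownership, which is the part of the metric that maps poorly onto pay-per-use billing.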
Data Warehouse Systems
--TPC-H mismatches Cloud Rationale
● TPC-H does not represent BI suites
  – Integration services
  – Analytics services (Multi-dimensional eXpressions
    Language, Mining Structures)
  – Reporting services
● TPC-H Workload Processing Metric
  – Qph@Size defines the number of queries processed per hour
  – The workload is assumed static, which is not realistic!
  – The benchmark should assess the scalability of the SUT
    (System Under Test) under variable and evolving workload
    and data volumes

Data Warehouse Systems
--TPC-H mismatches Cloud Rationale (ctnd.1)
● TPC-H Cost-Performance Metric
  – $/Qph@Size, where the cost relates to all of hardware,
    software and HR required for running the workload (3yrs)
  – The cost model in the cloud is different, and does not
    relate to the cost of ownership
● TPC-H does not report a Cost-Effectiveness Metric
● TPC-H implementation vs. CAP theorem
  – CAP theorem: A distributed system cannot simultaneously fulfill
    all three of Consistency (same view of data), Availability (query
    response) and Partition Tolerance (cope with hardware crash).
  – Since DWS deployments are onto shared-nothing architectures,
    benchmarks should be either CA-, CP- or AP-compliant.

New Requirements & New Metrics

High Performance Requirement
--Data Transfer IN/OUT CSP
● Data Transfer Characteristics
  – Huge data volumes are transferred IN and OUT of the
    Cloud Service Provider (CSP)
  – Resulting in network-bound DWS
  – Usually, the cost model adopted by CSPs is:
    ● Data upload IN the CSP is free of charge
    ● Data download OUT the CSP is priced
● Data Transfer Metrics in the Cloud
  – Time and cost for data upload
  – Time and cost for data download

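The data transfer metrics above amount to simple arithmetic on volume, bandwidth and price. In the sketch below the data volume, the link bandwidth and the download price per GB are assumed values; only the "upload free, download priced" pattern comes from the slide.

```python
def transfer_hours(volume_gb: float, bandwidth_mbit_s: float) -> float:
    """Hours needed to move volume_gb over a link of bandwidth_mbit_s.

    Uses 1 GB = 8000 Mbit and ignores protocol overhead.
    """
    return volume_gb * 8000 / bandwidth_mbit_s / 3600


def transfer_cost(volume_gb: float, price_per_gb: float) -> float:
    """Monetary cost of moving volume_gb at a flat price per GB."""
    return volume_gb * price_per_gb


VOLUME_GB = 1000          # assumed warehouse size
BANDWIDTH_MBIT_S = 100    # assumed link bandwidth
DOWNLOAD_PRICE = 0.10     # assumed $/GB for data leaving the CSP

upload_hours = transfer_hours(VOLUME_GB, BANDWIDTH_MBIT_S)
upload_cost = transfer_cost(VOLUME_GB, 0.0)             # upload is free
download_cost = transfer_cost(VOLUME_GB, DOWNLOAD_PRICE)

print(f"{upload_hours:.1f} h, upload ${upload_cost:.2f}, download ${download_cost:.2f}")
```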
High Performance Requirement (ctnd. 1)
--Workload Processing
● Workload Processing Characteristics
  – Both I/O-bound and CPU-bound business questions
  – Intra-query processing combined with virtual
    partitioning or physical processing
● Performance across Cluster Size
  – For each business question, there is an optimum response
    time for a particular cluster size, and performance degrades
    when the cluster is either smaller or larger than this optimum
  – Proved for both SQL and NoSQL technologies

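Finding the optimum described above is a matter of measuring one business question at several cluster sizes and picking the minimum. The elapsed times below are invented to show the shape of such a curve; they are not the measurements reported in the talk.

```python
# Hypothetical elapsed times (seconds) of one business question,
# keyed by the number of nodes in the cluster.
elapsed_by_nodes = {2: 300.0, 4: 180.0, 8: 120.0, 16: 150.0, 32: 210.0}


def optimal_cluster_size(measurements: dict) -> int:
    """Cluster size with the lowest elapsed time."""
    return min(measurements, key=measurements.get)


def node_seconds(measurements: dict) -> dict:
    """Resource consumption proxy: nodes multiplied by elapsed time."""
    return {nodes: nodes * elapsed for nodes, elapsed in measurements.items()}


best = optimal_cluster_size(elapsed_by_nodes)
print("fastest at", best, "nodes")
print("node-seconds per size:", node_seconds(elapsed_by_nodes))
```

In this made-up series the fastest configuration is not the cheapest in node-seconds, which is why elapsed time alone is a weak metric under pay-per-use pricing.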
High Performance Requirement (ctnd.2)
--Workload Processing
● TPC-H benchmarking of Apache Hadoop/Pig Latin on GRID5000
  -Bordeaux Site [Moussa,ICCIT'12] (SF=10)

High Performance Requirement (ctnd.3)
--Workload Processing
● Workload Processing Metrics
  – Elapsed times for running business questions,
  – Slope: performance - cost

Scalability Requirement
● Definition
  – Scalability is the ability of a system to increase total
    throughput under an increased load when hardware resources
    are added.
● Scalability Metric
  – Query Performance Metric under
    ● Ever increasing workload
    ● Different query frequencies

Elasticity Requirement
● Definition
  – Elasticity adjusts the system capacity at runtime by
    adding and removing resources without service
    interruption in order to handle the workload variation.
● Elasticity Metric
  – Capacity to add/remove resources: (0|1)
  – Scaling Latency: elapsed time to scale-down and
    scale-up
  – Impact on SUT performances during scale-up and
    scale-down
  – Scale-up cost (+$)
  – Scale-down gain (-$)

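Two of the elasticity metrics above, scaling latency and the cost change of a scaling operation, can be derived from a log of scaling events. The `ScalingEvent` record, its field names and the sample values are hypothetical; a real measurement would take timestamps from the provider's monitoring data.

```python
from dataclasses import dataclass


@dataclass
class ScalingEvent:
    requested_at: float       # seconds, when the resize was requested
    ready_at: float           # seconds, when the new capacity served queries
    nodes_before: int
    nodes_after: int
    hourly_node_price: float  # assumed flat $ per node per hour

    @property
    def latency(self) -> float:
        """Scaling latency in seconds."""
        return self.ready_at - self.requested_at

    @property
    def hourly_cost_delta(self) -> float:
        """Positive for scale-up cost (+$), negative for scale-down gain (-$)."""
        return (self.nodes_after - self.nodes_before) * self.hourly_node_price


scale_up = ScalingEvent(0.0, 90.0, nodes_before=4, nodes_after=8, hourly_node_price=0.5)
scale_down = ScalingEvent(1000.0, 1030.0, nodes_before=8, nodes_after=4, hourly_node_price=0.5)

for event in (scale_up, scale_down):
    print(event.latency, event.hourly_cost_delta)
```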
High Availability Requirement
--Redundancy Strategies
● Redundancy Strategies
  – Replication (a.k.a. mirroring)
  – Erasure-Resilient Codes
● Redundancy Strategies vs. Workload Type
  – Replication suits OLTP workload
  – Erasure-resilient codes suit OLAP workload
● Comparison [Litwin et al.,ACM TODS'05]
  – Data storage cost
  – Computation cost
  – Communication cost

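The data storage cost criterion can be illustrated with the usual textbook overheads of the two strategies. These formulas are general properties of replication and of maximum-distance-separable codes such as Reed-Solomon, stated here as an assumption; they are not taken from the cited comparison, and the group size of 4 is arbitrary.

```python
def replication_overhead(k: int) -> float:
    """Extra storage, as a fraction of the data size, to survive k failures.

    Tolerating k failures by replication needs k additional full copies.
    """
    return float(k)


def erasure_code_overhead(m: int, k: int) -> float:
    """Extra storage for groups of m data blocks protected by k parity blocks.

    Assumes an MDS code, where any k lost blocks of a group can be rebuilt.
    """
    return k / m


for k in (1, 2, 3):
    print(k, replication_overhead(k), erasure_code_overhead(4, k))
```

The storage saving of erasure codes is paid for elsewhere: rebuilding one lost block requires reading m surviving blocks and decoding them, which is where the computation and communication criteria come in.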
High Availability Requirement
--Strategies Comparison (ctnd.1)

High Availability Requirement
--Metrics for the Cloud (ctnd.2)
● High Availability Metrics
  – $@k: Cost of different targeted levels of availability
    (1-available, ..., k-available, where k is the number of
    failures the system can tolerate).
  – Cost of recovery, expressed as
    ● Time to get the system back
    ● Decreased system productivity caused by the hardware
      failure ($), from the customer perspective

Cost Management Requirement
● CSP pricing model
  – Different cloud service price models (IaaS, PaaS, SaaS)
  – e.g.
    ● CPU cost for IaaS: instance based (Amazon, MS Azure)
      or CPU-cycles based (Cloud Sites, Google App Engine)
    ● Query processing by Google BigQuery is priced on
      retrieved bytes (columnar storage)
● Cost-Performance Ratio
● Cost-Effectiveness ratio

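The two pricing styles mentioned above lead to different cost formulas for the same workload. The sketch below compares them on made-up inputs; every price and quantity is an assumption, and no real provider tariff is implied.

```python
def instance_based_cost(instances: int, hours: float, price_per_instance_hour: float) -> float:
    """Cost when billing is per running instance and per hour."""
    return instances * hours * price_per_instance_hour


def bytes_scanned_cost(terabytes_scanned: float, price_per_tb: float) -> float:
    """Cost when billing is per volume of data read by the queries."""
    return terabytes_scanned * price_per_tb


def cost_per_query(total_cost: float, queries: int) -> float:
    """A simple cost-performance ratio: $ spent per processed query."""
    return total_cost / queries


QUERIES = 22  # size of one TPC-H query stream

iaas = instance_based_cost(instances=4, hours=2.0, price_per_instance_hour=0.5)
scan = bytes_scanned_cost(terabytes_scanned=2.0, price_per_tb=5.0)

print(cost_per_query(iaas, QUERIES), cost_per_query(scan, QUERIES))
```

Under byte-based pricing the bill depends on how much data each query touches, so storage layout directly affects cost as well as speed.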
Related Work
● Benchmarking in the cloud
  – [Gray,MS'08]: TeraSort Benchmark for data sort evaluations,
  – [Cooper et al., SoCC'10]: Yahoo Cloud Serving Benchmark
    (YCSB) for evaluating the performance of "key-value" and
    "cloud" serving stores,
  – [Sobel et al., ICCSA'08]: CloudStone Benchmark for Web2.0
    applications,
  – [Bennet et al., KDD'10]: MalStone Benchmark for data
    mining in the cloud,
  – [Ang et al., USENIX'10]: CloudCMP project for CSP
    comparison,
  – [Binnig et al., DBTest'09], [Kossmann et al., SIGMOD'10]:
    Benchmarking OLTP systems in the cloud

Related Work (ctnd.1)
● NoSQL and SQL Technologies Assessment in the cloud
  – [Pavlo et al. SIGMOD'09],
  – [Floratou et al., TPC-TC'11]
● More Specific Issues
  – [Forrester, 2011]: Storage on-premises vs. in the cloud
  – [Nguyen et al., EDBT Workshops'12]: Materialized Views
    Selection
  – [Moussa, IJWA'12]: OLAP Scenarios in the Cloud and OLAP
    Workload Taxonomy

Conclusion & Future Work
● Keynote scope
  – Overview of DWS
  – Insight into new requirements and new metrics to be
    considered for benchmarking DWS in the cloud [Moussa,
    AICCSA'13]
● Research Perspectives
  – Assessment of OLAP systems in the cloud, e.g.
    ● Amazon RDS
    ● Google BigQuery
    ● MS Azure
    ● ...

Research Perspectives
--New OLTP Systems
● Classical Workload Taxonomy
  – OLTP: Transactions, ACID properties
  – OLAP: complex queries, star-joins, grouping,
    aggregations...
● New OLTP Workload features:
  – Big Data
  – OLTP
  – Real-time analytics
● Examples of systems: Google Spanner,
  Clustrix, NuoDB and TransLattice

Thank you for Your Attention
Q&A
Rim Moussa
Data Warehouse Systems in the Cloud
N2C'2013, Hammamet
15th June 2013

Data Warehouse Systems
--TPC-H Benchmark Relational DB Schema

Data Warehouse Systems
--TPC-H Benchmark Metrics


Contenu connexe

Tendances

Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence GeneratorRim Moussa
 
ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 TutorialRim Moussa
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsChristophe Debruyne
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Time series database by Harshil Ambagade
Time series database by Harshil AmbagadeTime series database by Harshil Ambagade
Time series database by Harshil AmbagadeSigmoid
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia Bharat Kalia
 
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ..."Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...Dataconomy Media
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)Ryan Blue
 
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J..."Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...Dataconomy Media
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Jelena Zanko
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow CacheAlluxio, Inc.
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceHBaseCon
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...DataStax
 
Payment Gateway Live hadoop project
Payment Gateway Live hadoop projectPayment Gateway Live hadoop project
Payment Gateway Live hadoop projectKamal A
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020GEO Analytics Canada
 

Tendances (20)

Parallel Sequence Generator
Parallel Sequence GeneratorParallel Sequence Generator
Parallel Sequence Generator
 
ER 2016 Tutorial
ER 2016 TutorialER 2016 Tutorial
ER 2016 Tutorial
 
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure DefinitionsGenerating Executable Mappings from RDF Data Cube Data Structure Definitions
Generating Executable Mappings from RDF Data Cube Data Structure Definitions
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Time series database by Harshil Ambagade
Time series database by Harshil AmbagadeTime series database by Harshil Ambagade
Time series database by Harshil Ambagade
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia OLAP Basics and Fundamentals by Bharat Kalia
OLAP Basics and Fundamentals by Bharat Kalia
 
ArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & RoadmapArcGIS and Multi-D: Tools & Roadmap
ArcGIS and Multi-D: Tools & Roadmap
 
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ..."Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at ...
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
 
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J..."Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
 
Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20Imply at Apache Druid Meetup in London 1-15-20
Imply at Apache Druid Meetup in London 1-15-20
 
Google Cloud Spanner Preview
Google Cloud Spanner PreviewGoogle Cloud Spanner Preview
Google Cloud Spanner Preview
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
 
Argus Production Monitoring at Salesforce
Argus Production Monitoring at SalesforceArgus Production Monitoring at Salesforce
Argus Production Monitoring at Salesforce
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
SASI: Cassandra on the Full Text Search Ride (DuyHai DOAN, DataStax) | C* Sum...
 
Payment Gateway Live hadoop project
Payment Gateway Live hadoop projectPayment Gateway Live hadoop project
Payment Gateway Live hadoop project
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQL
 
Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020Geo Analytics Canada Overview - May 2020
Geo Analytics Canada Overview - May 2020
 

En vedette

Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseSnowflake Computing
 
Build a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesBuild a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesCaserta
 
ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話Preferred Networks
 
Using SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesUsing SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesCode Mastery
 
Data Warehouse Best Practices
Data Warehouse Best PracticesData Warehouse Best Practices
Data Warehouse Best PracticesEduardo Castro
 
Cloud Computing and your Data Warehouse
Cloud Computing and your Data WarehouseCloud Computing and your Data Warehouse
Cloud Computing and your Data Warehousedrluckyspin
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?RTTS
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 

En vedette (10)

Introducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data WarehouseIntroducing the Snowflake Computing Cloud Data Warehouse
Introducing the Snowflake Computing Cloud Data Warehouse
 
Build a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 MinutesBuild a Big Data Warehouse on the Cloud in 30 Minutes
Build a Big Data Warehouse on the Cloud in 30 Minutes
 
Open Source Datawarehouse
Open Source DatawarehouseOpen Source Datawarehouse
Open Source Datawarehouse
 
ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話ツイートID生成とツイッターリアルタイム検索システムの話
ツイートID生成とツイッターリアルタイム検索システムの話
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
Using SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS CubesUsing SSRS Reports with SSAS Cubes
Using SSRS Reports with SSAS Cubes
 
Data Warehouse Best Practices
Data Warehouse Best PracticesData Warehouse Best Practices
Data Warehouse Best Practices
 
Cloud Computing and your Data Warehouse
Cloud Computing and your Data WarehouseCloud Computing and your Data Warehouse
Cloud Computing and your Data Warehouse
 
What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 

Similaire à Benchmarking data warehouse systems in the cloud: new requirements & new metrics

CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.pptCENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.pptdhanasekarscse
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalScyllaDB
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkHentsū
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and VirtualizationPeter Tröger
 
Steps to Modernize Your Data Ecosystem | Mindtree
Steps to Modernize Your Data Ecosystem | Mindtree									Steps to Modernize Your Data Ecosystem | Mindtree
Steps to Modernize Your Data Ecosystem | Mindtree AnikeyRoy
 
Six Steps to Modernize Your Data Ecosystem - Mindtree
Six Steps to Modernize Your Data Ecosystem  - MindtreeSix Steps to Modernize Your Data Ecosystem  - Mindtree
Six Steps to Modernize Your Data Ecosystem - Mindtreesamirandev1
 
6 Steps to Modernize Data Ecosystem with Mindtree
6 Steps to Modernize Data Ecosystem with Mindtree6 Steps to Modernize Data Ecosystem with Mindtree
6 Steps to Modernize Data Ecosystem with Mindtreedevraajsingh
 
Steps to Modernize Your Data Ecosystem with Mindtree Blog
Steps to Modernize Your Data Ecosystem with Mindtree Blog Steps to Modernize Your Data Ecosystem with Mindtree Blog
Steps to Modernize Your Data Ecosystem with Mindtree Blog sameerroshan
 
Sybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase Türkiye
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationDATAVERSITY
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Sumeet Singh
 
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successDataWorks Summit
 
Cloud-based Energy Efficient Software
Cloud-based Energy Efficient SoftwareCloud-based Energy Efficient Software
Cloud-based Energy Efficient SoftwareFotis Stamatelopoulos
 
Network Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingNetwork Engineering for High Speed Data Sharing
Network Engineering for High Speed Data SharingGlobus
 
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...A Novel Approach for Workload Optimization and Improving Security in Cloud Co...
A Novel Approach for Workload Optimization and Improving Security in Cloud Co...IOSR Journals
 
IEEE Paper - A Study Of Cloud Computing Environments For High Performance App...
IEEE Paper - A Study Of Cloud Computing Environments For High Performance App...IEEE Paper - A Study Of Cloud Computing Environments For High Performance App...
IEEE Paper - A Study Of Cloud Computing Environments For High Performance App...Angela Williams
 

Similaire à Benchmarking data warehouse systems in the cloud: new requirements & new metrics (20)

CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.pptCENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
CENTRE FOR DATA CENTER WITH DIAGRAMS.ppt
 
CDP_2(1).pptx
CDP_2(1).pptxCDP_2(1).pptx
CDP_2(1).pptx
 
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data ArchivalGPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
GPS Insight on Using Presto with Scylla for Data Analytics and Data Archival
 
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New YorkInfinitely Scalable Clusters - Grid Computing on Public Cloud - New York
Infinitely Scalable Clusters - Grid Computing on Public Cloud - New York
 
Cloud Standards and Virtualization
Cloud Standards and VirtualizationCloud Standards and Virtualization
Cloud Standards and Virtualization
 
Steps to Modernize Your Data Ecosystem | Mindtree
Steps to Modernize Your Data Ecosystem | Mindtree									Steps to Modernize Your Data Ecosystem | Mindtree
Steps to Modernize Your Data Ecosystem | Mindtree
 
Six Steps to Modernize Your Data Ecosystem - Mindtree
Six Steps to Modernize Your Data Ecosystem  - MindtreeSix Steps to Modernize Your Data Ecosystem  - Mindtree
Six Steps to Modernize Your Data Ecosystem - Mindtree
 
6 Steps to Modernize Data Ecosystem with Mindtree
6 Steps to Modernize Data Ecosystem with Mindtree6 Steps to Modernize Data Ecosystem with Mindtree
6 Steps to Modernize Data Ecosystem with Mindtree
 
Steps to Modernize Your Data Ecosystem with Mindtree Blog
Steps to Modernize Your Data Ecosystem with Mindtree Blog Steps to Modernize Your Data Ecosystem with Mindtree Blog
Steps to Modernize Your Data Ecosystem with Mindtree Blog
 
Sybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem PerformansSybase IQ ile Muhteşem Performans
Sybase IQ ile Muhteşem Performans
 
Slides: Relational to NoSQL Migration
Slides: Relational to NoSQL MigrationSlides: Relational to NoSQL Migration



Benchmarking data warehouse systems in the cloud: new requirements & new metrics

  • 1. Data Warehouse Systems in the Cloud: new requirements and new challenges
      Rim Moussa, LaTICE Lab., University of Tunis; ESTI, University of Carthage; rim.moussa@esti.rnu.tn
      10th Intl. Conference on Computer Systems and Applications (AICCSA), Fez, Kingdom of Morocco, 30th May 2013
      Keynote @ Intl. Conference on Computing, Networking and Communications, Hammamet, Tunisia
      Footer: DWS in the Cloud, AICCSA'13, Fez, 30th May 2013
  • 2. Context (diagram labels: Cloud Rationale; Benchmarking Data Warehouse Systems; NO)
  • 3. (diagram labels: Cloud Rationale; Benchmarking Data Warehouse Systems; NO)
  • 4. Outline
      1. Cloud Computing
      2. Data Warehouse Systems
      3. Overview of DWS Benchmarks
      4. New Requirements for DWS in the Cloud
      5. Related Work
      6. Conclusion
      7. Research Perspectives
  • 5. Cloud Computing
      ● NIST Definition
          – Cloud computing is a pay-per-use model for enabling available, convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.
      ● Opportunities
          – Performance
              ● Faster data analysis through the use of up-to-date hardware infrastructure made available by Cloud Service Providers
          – More Economical
              ● Organizations no longer need to expend capital upfront for hardware and software purchases; services are provided on a pay-per-use basis
  • 6. Cloud Computing --Market Share
      ● Market Share
          – Forrester Research expects the global cloud computing market to reach $241 billion in 2020
          – Gartner: the public cloud services market is forecast to grow 18.5% in 2013 to total $131 billion worldwide, up from $111 billion in 2012
          – Gartner: the public cloud services market in the Middle East and North Africa (MENA) is expected to increase by 24.5% in 2013
          – Gartner: the public cloud services market in India is forecast to grow 36% in 2013 to total $443 million, up from $326 million in 2012
  • 7. Data Warehouse Systems --Typical System Architecture
  • 8. Data Warehouse Systems --Technologies
      ● Traditional Relational DBMSs & OLAP Servers
          – Mature
          – Do not scale linearly
      ● NoSQL solutions
          – Adopted by Google, Facebook, Amazon, ...
          – Dynamic horizontal scaling (scale-out)
          – Nodes are added without bringing the cluster down
      ● Shared-nothing architecture
          – Independent computing and storage nodes interconnected via a high-speed network
      ● MapReduce
          – Distributed programming framework
  • 9. Data Warehouse Systems --Challenges with big data management
  • 10. Data Warehouse Systems --Common Optimizations: Hardware Storage Tech.
      ● DRAM: in-memory data processing (very expensive)
      ● SSD (Solid State Drive): a non-volatile type of memory
      ● An SSD does not have a mechanical arm to read and write data

                              SSD                HDD
      Cost/GB                 $1/GB              $0.075/GB
      Typical size            512GB              Up to 2TB
      Failure rate (MTBF)     2 million hours    1.5 million hours
      Read/Write speed        200-500 MBps       120 MBps
  • 11. Data Warehouse Systems --Common Optimizations: Columnar Storage Principle
      ● Row-oriented storage
          – Read pages containing all columns (Date, Customer, Product, Price, Quantity)
      ● Column-oriented storage
          – Read only the columns needed for query processing
  • 12. Data Warehouse Systems --Common Optimizations: Columnar Storage Benefits
      ● Allows a better data compression rate, since data values are often redundant within a single column
      ● Eliminates unnecessary I/O by retrieving only the relevant data
      ● Vectorwise is in the TPC-H Top Ten Performance Results (14-Jun-2013)
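The two benefits listed on slides 11 and 12 (less I/O, better compression) can be illustrated with a minimal sketch. The column widths below are assumptions chosen for illustration, not figures from the talk, and run-length encoding stands in for whatever compression scheme a given columnar engine actually uses.

```python
# Hypothetical sales table with the five columns shown on slide 11.
COLUMNS = ["date", "customer", "product", "price", "quantity"]
# Assumed fixed width of each value in bytes (illustrative only).
WIDTH = {"date": 8, "customer": 16, "product": 16, "price": 8, "quantity": 4}


def bytes_read_row_store(n_rows, needed):
    """A row store reads whole rows, whichever columns the query needs."""
    return n_rows * sum(WIDTH[c] for c in COLUMNS)


def bytes_read_column_store(n_rows, needed):
    """A column store reads only the columns referenced by the query."""
    return n_rows * sum(WIDTH[c] for c in needed)


def run_length_encode(values):
    """Run-length encode one column as a list of (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs
```

For a query that only aggregates `price` and `quantity`, the column store in this toy model reads 12 bytes per row instead of 52, and a sorted `date` column collapses into a handful of runs.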
  • 13. Data Warehouse Systems --Common Optimizations: Derived Data
      ● Derived Data:
          – Indexes
          – Derived attributes
          – Aggregate tables
      ● Pros:
          – High performance
      ● Cons:
          – Maintenance: refresh is expensive
          – Storage cost
  • 14. Data Warehouse Systems --DWS Benchmarks
      ● APB-1 OLAP Benchmark (obsolete)
          – Released by the OLAP Council (www.olapcouncil.org) in 1998
          – A simple star schema data model
      ● TPC DSS Benchmarks
          – Released by the Transaction Processing Performance Council (www.tpc.org)
          – Examine large volumes of data (from 10GB to 100TB)
          – Complex relational data model
          – TPC-H
              ● Workload composed of 22 ad-hoc complex SQL statements
              ● The most prominent DSS benchmark
          – TPC-DS, successor of TPC-H
              ● Workload composed of 99 SQL business questions
              ● Same metrics as TPC-H
  • 15. Data Warehouse Systems --TPC-H Benchmark Metrics (same for TPC-DS)
      ● Query-per-hour Performance Metric
          – For a given scale factor (warehouse data volume)
          – Concurrent users
      ● Price-Performance Metric
          – Ratio of the priced system (cost of ownership: hardware, software, maintenance, and the cost of everything needed to run the TPC-H workload) to the query performance metric
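As a rough sketch of how the two metrics on slide 15 combine, the functions below follow the structure of the TPC-H composite metric as I understand it from the TPC-H specification: a power figure from a single-stream run, a throughput figure from a multi-stream run, and their geometric mean. The function and parameter names are illustrative, and the official specification remains the reference for the exact rules.

```python
import math


def power_at_size(query_secs, refresh_secs, scale_factor):
    """Single-stream power: 3600 * SF over the geometric mean of all timings."""
    timings = list(query_secs) + list(refresh_secs)
    geo_mean = math.exp(sum(math.log(t) for t in timings) / len(timings))
    return 3600.0 * scale_factor / geo_mean


def throughput_at_size(n_streams, n_queries, elapsed_secs, scale_factor):
    """Multi-stream throughput: queries completed per hour, scaled by SF."""
    return n_streams * n_queries * 3600.0 / elapsed_secs * scale_factor


def qph_at_size(power, throughput):
    """Composite query-per-hour metric: geometric mean of power and throughput."""
    return math.sqrt(power * throughput)


def price_performance(total_system_cost, qph):
    """Price-performance: priced system cost divided by the composite metric."""
    return total_system_cost / qph
```

With 22 query timings and 2 refresh timings of 10 seconds each at scale factor 10, `power_at_size` evaluates to about 3600; dividing a priced system cost by the composite metric then gives the $/Qph figure the slide refers to.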
  • 16. Data Warehouse Systems --TPC-H mismatches Cloud Rationale
      ● TPC-H does not represent BI suites
          – Integration services
          – Analytics services (Multi-dimensional eXpressions Language, Mining Structures)
          – Reporting services
      ● TPC-H Workload Processing Metric
          – Qph@Size defines the number of queries processed per hour
          – The workload is assumed static, which is not realistic!
          – The benchmark should assess the scalability of the SUT under variable and evolving workloads and data volumes
  • 17. Data Warehouse Systems --TPC-H mismatches Cloud Rationale (ctnd.1)
      ● TPC-H Cost-Performance Metric
          – $/Qph@Size, where the cost covers all of the hardware, software and HR required for running the workload (3 yrs)
          – The cost model in the cloud is different, and does not relate to the cost of ownership
      ● TPC-H does not report a Cost-Effectiveness Metric
      ● TPC-H implementation vs. CAP theorem
          – CAP theorem: a distributed system cannot simultaneously fulfill all three of Consistency (same view of data), Availability (query response) and Partition Tolerance (cope with hardware crash).
          – Since DWS are deployed onto shared-nothing architectures, benchmarks should be CA-, CP- or AP-compliant.
  • 18. New Requirements & New Metrics
  • 19. High Performance Requirement --Data Transfer IN/OUT CSP
      ● Data Transfer Characteristics
          – Huge data volumes are transferred IN and OUT of the Cloud Service Provider
          – Resulting in a network-bound DWS
          – Usually, the cost model adopted by CSPs is:
              ● Data upload IN the CSP is free of charge
              ● Data download OUT of the CSP is priced
      ● Data Transfer Metrics in the Cloud
          – Time and cost for data upload
          – Time and cost for data download
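The data transfer metrics on slide 19 reduce to simple arithmetic once a bandwidth and a per-GB price are fixed. The following is a minimal sketch under assumed values: the bandwidth and prices in the usage note are placeholders, not actual CSP tariffs, and decimal units (1 GB = 8000 megabits) are assumed.

```python
def transfer_time_secs(volume_gb, bandwidth_mbps):
    """Seconds needed to move volume_gb gigabytes over a bandwidth_mbps link."""
    megabits = volume_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    return megabits / bandwidth_mbps


def transfer_cost(volume_gb, price_per_gb):
    """Transfer charge; use price_per_gb = 0 for a free upload."""
    return volume_gb * price_per_gb
```

For instance, moving 100 GB over a 1000 Mbps link takes 800 seconds in this model; with the usual pricing scheme described on the slide, the upload would cost nothing while the download would be charged per GB.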
  • 20. High Performance Requirement (ctnd.1) --Workload Processing
      ● Workload Processing Characteristics
          – Both I/O-bound and CPU-bound business questions
          – Intra-query processing combined with virtual partitioning or physical processing
      ● Performance across Cluster Size
          – For each business question, the response time is best at a particular cluster size, and performance degrades for both smaller and larger clusters
          – Shown for both SQL and NoSQL technologies
  • 21. High Performance Requirement (ctnd.2) --Workload Processing
      ● TPC-H benchmarking of Apache Hadoop/Pig Latin on GRID5000, Bordeaux site [Moussa, ICCIT'12] (SF=10)
  • 22. High Performance Requirement (ctnd.3) --Workload Processing
      ● Workload Processing Metrics
          – Elapsed times for running business questions
          – Slope: performance vs. cost
  • 23. Scalability Requirement
      ● Definition
          – Scalability is the ability of a system to increase total throughput under an increased load when hardware resources are added.
      ● Scalability Metric
          – Query Performance Metric under
              ● Ever-increasing workload
              ● Different query frequencies
  • 24. Elasticity Requirement
      ● Definition
          – Elasticity adjusts the system capacity at runtime by adding and removing resources without service interruption in order to handle workload variation.
      ● Elasticity Metric
          – Capacity to add/remove resources: (0|1)
          – Scaling latency: elapsed time to scale down and scale up
          – Impact on SUT performance during scale-up and scale-down
          – Scale-up cost (+$)
          – Scale-down gain (-$)
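One way to make the elasticity metrics of slide 24 concrete is to record each resize operation and derive the metrics from it. The sketch below is an assumption about how such a record could look; the `ScalingEvent` fields and the per-node hourly price are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingEvent:
    requested_at: float   # seconds: when the resize was requested
    ready_at: float       # seconds: when the new capacity served queries
    nodes_before: int
    nodes_after: int
    qph_baseline: float   # throughput before the resize
    qph_during: float     # throughput observed while resizing


def scaling_latency(event):
    """Elapsed time to scale up or down, in seconds."""
    return event.ready_at - event.requested_at


def performance_impact(event):
    """Fraction of throughput lost while the resize was in progress."""
    return 1.0 - event.qph_during / event.qph_baseline


def hourly_cost_delta(event, node_price_per_hour):
    """Positive for scale-up cost (+$), negative for scale-down gain (-$)."""
    return (event.nodes_after - event.nodes_before) * node_price_per_hour
```

A resize from 4 to 6 nodes that takes two minutes and drops throughput from 100 to 80 queries per hour would report a latency of 120 seconds, a 20% performance impact, and a positive hourly cost delta.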
  • 25. High Availability Requirement --Redundancy Strategies
      ● Redundancy Strategies
          – Replication (a.k.a. mirroring)
          – Erasure-Resilient Codes
      ● Redundancy Strategies vs. Workload Type
          – Replication suits OLTP workloads
          – Erasure-resilient codes suit OLAP workloads
      ● Comparison [Litwin et al., ACM TODS'05]
          – Data storage cost
          – Computation cost
          – Communication cost
  • 26. High Availability Requirement --Strategies Comparison (ctnd.1)
  • 27. High Availability Requirement --Metrics for the Cloud (ctnd.2)
      ● High Availability Metrics
          – $@k: cost of different targeted levels of availability (1-available, ..., k-available, where k is the number of failures the system can tolerate)
          – Cost of recovery, expressed as
              ● Time to get the system back
              ● Decreased system productivity caused by the hardware failure ($), from the customer's perspective
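The $@k metric can be sketched for the storage dimension alone, which is only one of the three costs compared on slide 25. The sketch assumes that k-available replication keeps k extra full copies, and that the erasure code is an MDS code (Reed-Solomon is a common example) adding k parity fragments per group of m data fragments. Computation and communication costs are deliberately left out, and the price per GB is a placeholder.

```python
def replication_storage_gb(data_gb, k):
    """k-available replication: the original plus k full copies."""
    return data_gb * (k + 1)


def erasure_code_storage_gb(data_gb, m, k):
    """k-available MDS code: k parity fragments for every m data fragments."""
    return data_gb * (m + k) / m


def cost_at_k(storage_gb, price_per_gb_month):
    """Monthly storage bill for a given availability level ($@k, storage only)."""
    return storage_gb * price_per_gb_month
```

Under these assumptions, tolerating 2 failures for 1000 GB of data needs 3000 GB with replication but 1500 GB with groups of 4 data fragments, which is the storage side of the trade-off the slides discuss.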
  • 28. Cost Management Requirement
      ● CSP pricing models
          – Different cloud service price models (IaaS, PaaS, SaaS)
          – e.g. CPU cost for IaaS: instance-based (Amazon, MS Azure) or CPU-cycles-based (Cloud Sites, Google App Engine)
          – Query processing by Google BigQuery is priced on retrieved bytes (columnar storage)
      ● Cost-Performance ratio
      ● Cost-Effectiveness ratio
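The two pricing styles named on slide 28 lead to different cost formulas for the same workload. The sketch below contrasts an instance-hour model with a bytes-retrieved model; all prices are hypothetical parameters, not published tariffs, and decimal terabytes (1 TB = 10^12 bytes) are assumed.

```python
def instance_based_cost(n_instances, hours, price_per_instance_hour):
    """Instance-based pricing: pay for provisioned instances per hour."""
    return n_instances * hours * price_per_instance_hour


def bytes_retrieved_cost(bytes_retrieved, price_per_tb):
    """Per-query pricing on retrieved bytes (1 TB taken as 10**12 bytes)."""
    return bytes_retrieved / 1e12 * price_per_tb


def cost_performance_ratio(cost, queries_per_hour):
    """Cost divided by throughput, comparable across pricing models."""
    return cost / queries_per_hour
```

Because a columnar engine retrieves only the referenced columns, the bytes-retrieved model rewards narrow queries, whereas the instance-based model charges the same whether the cluster is busy or idle.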
  • 29. Related Work
      ● Benchmarking in the cloud
          – [Gray, MS'08]: TeraSort Benchmark for data sort evaluations
          – [Cooper et al., SoCC'10]: Yahoo Cloud Serving Benchmark (YCSB) for evaluating the performance of "key-value" and "cloud" serving stores
          – [Sobel et al., ICCSA'08]: CloudStone Benchmark for Web 2.0 applications
          – [Bennet et al., KDD'10]: MalStone Benchmark for data mining in the cloud
          – [Ang et al., USENIX'10]: CloudCMP project for CSP comparison
          – [Binnig et al., DBTest'09], [Kossmann et al., SIGMOD'10]: Benchmarking OLTP systems in the cloud
  • 30. Related Work (ctnd.1)
      ● NoSQL and SQL Technologies Assessment in the cloud
          – [Pavlo et al., SIGMOD'09], [Floratou et al., TPC-TC'11]
      ● More Specific Issues
          – [Forrester, 2011]: Storage on-premises vs. in the cloud
          – [Nguyen et al., EDBT Workshops'12]: Materialized Views Selection
          – [Moussa, IJWA'12]: OLAP Scenarios in the Cloud and OLAP Workload Taxonomy
  • 31. Conclusion & Future Work
      ● Keynote scope
          – Overview of DWS
          – Insight into the new requirements and new metrics to be considered for benchmarking DWS in the cloud [Moussa, AICCSA'13]
      ● Research Perspectives
          – Assessment of OLAP systems in the cloud
              ● Amazon RDS
              ● Google BigQuery
              ● MS Azure
              ● ...
  • 32. Research Perspectives --New OLTP Systems
      ● Classical Workload Taxonomy
          – OLTP: transactions, ACID properties
          – OLAP: complex queries, star-joins, grouping, aggregations, ...
      ● New OLTP Workload features:
          – Big Data
          – OLTP
          – Real-time analytics
      ● Examples of systems: Google Spanner, Clustrix, NuoDB and TransLattice
  • 33. Thank you for your attention. Q&A
      Rim Moussa, Data Warehouse Systems in the Cloud
      N2C'2013, Hammamet, 15th June 2013
  • 34. Data Warehouse Systems --TPC-H Benchmark Relational DB Schema
  • 35. Data Warehouse Systems --TPC-H Benchmark Metrics