Data Warehousing
Lecture-25
Need for Speed: Parallelism Methodologies
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan1010@yahoo.com
Motivation
• No need of parallelism if we had a perfect computer
  − with a single infinitely fast processor
  − with an infinite memory with infinite bandwidth
  − and it's infinitely cheap too (free!)
• Technology is not delivering (going to Moon analogy)
• The challenge is to build
  − an infinitely fast processor out of infinitely many processors of finite speed
  − an infinitely large memory with infinite memory bandwidth from infinitely many finite storage units of finite speed
Data Parallelism: Concept
• Parallel execution of a single data manipulation task across multiple partitions of data.
• Partitions static or dynamic
• Tasks executed almost-independently across partitions.
• "Query coordinator" must coordinate between the independently executing processes.
Data Parallelism: Example
[Figure: the Emp table is split into Partition-1, Partition-2, ..., Partition-k. Query Server-1, Query Server-2, ..., Query Server-k each process one partition and report partial counts of 62, 440, ..., 1,123 to the Query Coordinator.]
Select count(*)
from Emp
where age > 50
AND sal > 10000;
Ans = 62 + 440 + ... + 1,123 = 99,000
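The coordinator/server split in this example can be sketched in Python. This is only an illustration, not how any particular database engine does it: the partition contents, the function names `count_matches` and `parallel_count`, and the use of a thread pool to stand in for query servers are all assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(partition):
    """Query server: count the rows of one partition that satisfy the filter."""
    return sum(1 for row in partition if row["age"] > 50 and row["sal"] > 10000)


def parallel_count(partitions):
    """Query coordinator: run one server per partition, then add the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = list(pool.map(count_matches, partitions))
    return partial_counts, sum(partial_counts)


# Hypothetical Emp table, already split into three partitions.
partitions = [
    [{"age": 55, "sal": 12000}, {"age": 30, "sal": 20000}],
    [{"age": 61, "sal": 15000}, {"age": 52, "sal": 11000}, {"age": 58, "sal": 9000}],
    [{"age": 45, "sal": 30000}],
]

partial_counts, total = parallel_count(partitions)
print(partial_counts, total)
```

Each server returns a single number, so the coordinator only has to add k partial counts, which is why it can stay cheap relative to the servers.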
Data Parallelism: Ensuring Speed-Up
To get a speed-up of N with N partitions, it must be ensured that:
• There are enough computing resources.
• Query-coordinator is very fast as compared to query servers.
• Work done in each partition almost same to avoid performance bottlenecks.
• Same number of records in each partition would not suffice.
• Need to have uniform distribution of records w.r.t. filter criterion across partitions.
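Why uneven work per partition hurts can be shown with a very simple cost model. The model is an assumption made for illustration: every partition is processed by its own server, the coordinator cost is ignored, and the parallel run finishes only when the slowest partition finishes. The function name `speedup` and the work figures are hypothetical.

```python
def speedup(partition_work):
    """Speed-up over a serial run, given the units of work in each partition."""
    serial_time = sum(partition_work)    # one server does everything
    parallel_time = max(partition_work)  # everyone waits for the slowest server
    return serial_time / parallel_time


balanced = [100, 100, 100, 100]
skewed = [40, 40, 40, 280]  # same total work, but most of it lands in one partition

print(speedup(balanced))
print(speedup(skewed))
```

With four partitions the balanced case reaches a speed-up of 4, while the skewed case stays below 1.5 even though the total work is identical.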
Temporal Parallelism (Pipelining)
Involves taking a complex task and breaking it down into independent subtasks for parallel execution on a stream of data inputs.
[Figure: a task with execution time T is split into three pipeline stages, each taking T/3.]
Pipelining: Time Chart
[Figure: time chart of the three-stage pipeline, each stage taking T/3, shown at T = 0, T = 1, T = 2 and T = 3.]
Pipelining: Speed-Up Calculation
Time for sequential execution of 1 task = T
Time for sequential execution of N tasks = N × T
(Ideal) time for pipelined execution of one task using an M stage pipeline = T
(Ideal) time for pipelined execution of N tasks using an M stage pipeline = T + ((N-1) × (T/M))
Speed-up (S) = (N × T) / (T + ((N-1) × (T/M))) = (N × M) / (M + N - 1)
Pipeline parallelism focuses on increasing throughput of task execution, NOT on decreasing sub-task execution time.
Pipelining: Speed-Up Example
Example: Bottling soft drinks in a factory
10 CRATE LOADS OF BOTTLES
Sequential execution = 10 × T
Fill bottle, Seal bottle, Label bottle pipeline = T + T × (10-1)/3 = 4 × T
Speed-up = 2.50
20 CRATE LOADS OF BOTTLES
Sequential execution = 20 × T
Fill bottle, Seal bottle, Label bottle pipeline = T + T × (20-1)/3 ≈ 7.33 × T
Speed-up ≈ 2.73
40 CRATE LOADS OF BOTTLES
Sequential execution = 40 × T
Fill bottle, Seal bottle, Label bottle pipeline = T + T × (40-1)/3 = 14.0 × T
Speed-up ≈ 2.86
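The bottling arithmetic can be reproduced with a few lines of Python. The sketch uses the ideal timing expressions from the speed-up calculation, N × T for sequential execution and T + (N-1) × (T/M) for an M-stage pipeline; the function names `pipelined_time` and `pipeline_speedup` are made up for this example, and T is set to 1 so that times read as multiples of T.

```python
def pipelined_time(n_tasks, n_stages, task_time=1.0):
    """Ideal time to push n_tasks through an n_stages pipeline."""
    return task_time + (n_tasks - 1) * task_time / n_stages


def pipeline_speedup(n_tasks, n_stages, task_time=1.0):
    """Sequential time divided by ideal pipelined time."""
    return (n_tasks * task_time) / pipelined_time(n_tasks, n_stages, task_time)


# Three stages: fill, seal, label.
for crates in (10, 20, 40):
    time_in_t = pipelined_time(crates, 3)
    print(crates, round(time_in_t, 2), round(pipeline_speedup(crates, 3), 2))
```

Doubling the input keeps nudging the speed-up towards the stage count of 3 without reaching it.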
Pipelining: Input vs Speed-Up
[Figure: speed-up (S), on an axis from 1 to 3, plotted against input (N) from 1 to 19.]
Asymptotic limit on speed-up for M stage pipeline is M.
The speed-up will NEVER be M, as initially filling the pipeline took T time units.
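The limit can be checked directly from the ideal timings used earlier, N × T for sequential execution and T + (N-1) × (T/M) for the pipeline. The derivation below is a short worked check, writing the speed-up as a function S(N):

```latex
S(N) = \frac{N \times T}{T + (N-1) \times \frac{T}{M}} = \frac{N \times M}{M + N - 1},
\qquad
\lim_{N \to \infty} S(N) = M,
\qquad
M - S(N) = \frac{M\,(M-1)}{M + N - 1} > 0 \quad \text{for } M > 1 .
```

The gap M - S(N) shrinks as N grows but stays positive for any finite N, which is the cost of filling the pipeline once.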
Pipelining: Limitations
• Relational pipelines are rarely very long
  − Even a chain of length ten is unusual.
• Some relational operators do not produce their first output until they have consumed all their inputs.
  − Aggregate and sort operators have this property. One cannot pipeline these operators.
• Often, execution cost of one operator is much greater than that of the others, hence skew.
  − e.g. Sum() or Count() vs Group-by() or Join.
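The difference between a pipelinable operator and a blocking one can be seen with Python generators. This is a toy model, not a real query executor: the operator names `scan`, `select_greater` and `sort_rows` are hypothetical, and a log list records how many input rows had to be read before the first output row appeared.

```python
def scan(rows, log):
    """Leaf operator: emit rows one at a time, recording each read."""
    for row in rows:
        log.append(f"scan {row}")
        yield row


def select_greater(rows, threshold):
    """Pipelinable operator: can emit a row as soon as it has seen it."""
    for row in rows:
        if row > threshold:
            yield row


def sort_rows(rows):
    """Blocking operator: must consume every input row before emitting the first."""
    yield from sorted(rows)


data = [5, 1, 7, 3]

pipelined_log = []
first_selected = next(select_greater(scan(data, pipelined_log), 2))

blocking_log = []
first_sorted = next(sort_rows(scan(data, blocking_log)))

print(first_selected, pipelined_log)
print(first_sorted, blocking_log)
```

The selection produces its first row after one read, while the sort has to read all four rows first, so nothing downstream of it can start early.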
Partitioning & Queries
• Let's evaluate how well different partitioning techniques support the following types of data access:
  − Full Table Scan: Scanning the entire relation
  − Point Queries: Locating a tuple, e.g. where r.A = 313
  − Range Queries: Locating all tuples such that the value of a given attribute lies within a specified range, e.g. where 313 ≤ r.A < 786.
Partitioning & Queries
Round Robin
• Advantages
  − Best suited for sequential scan of entire relation on each query.
  − All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
• Range queries are difficult to process
  − No clustering: tuples are scattered across all disks
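A minimal sketch of round-robin placement, assuming the i-th tuple simply goes to disk i mod n. The function names and the sample values are hypothetical; the second function shows why a range query gets no help from this scheme.

```python
def round_robin_partition(tuples, n_disks):
    """Send the i-th tuple to disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks


def disks_for_range(disks, lo, hi):
    """Indices of disks holding at least one value with lo <= value < hi."""
    return [d for d, values in enumerate(disks) if any(lo <= v < hi for v in values)]


values = list(range(300, 320))  # hypothetical r.A values
disks = round_robin_partition(values, 4)

print([len(d) for d in disks])
print(disks_for_range(disks, 305, 312))
```

The tuple counts come out equal, which suits a full scan, but even a narrow range touches every disk because placement ignores the attribute value.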
Partitioning & Queries
Hash Partitioning
• Good for sequential access
  − With uniform hashing and using partitioning attributes as a key, tuples will be equally distributed between disks.
• Good for point queries on partitioning attribute
  − Can look up a single disk, leaving others available for answering other queries.
  − Index on partitioning attribute can be local to disk, making lookup and update very efficient, even joins.
• Range queries are difficult to process
  − No clustering: tuples are scattered across all disks
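A small sketch of hash partitioning and a point query on the partitioning attribute. The choice of `zlib.crc32` as the hash function is an assumption made only to get a deterministic result in Python; the function names and the sample rows are hypothetical.

```python
import zlib


def disk_for(key, n_disks):
    """Deterministically hash the partitioning attribute to a disk number."""
    return zlib.crc32(str(key).encode("utf-8")) % n_disks


def hash_partition(tuples, key_of, n_disks):
    """Place every tuple on the disk its key hashes to."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[disk_for(key_of(t), n_disks)].append(t)
    return disks


def point_query(disks, key, key_of):
    """Search only the one disk that the key hashes to."""
    target = disk_for(key, len(disks))
    return target, [t for t in disks[target] if key_of(t) == key]


rows = [{"A": a, "name": f"emp{a}"} for a in range(300, 330)]
disks = hash_partition(rows, lambda t: t["A"], 4)
target, matches = point_query(disks, 313, lambda t: t["A"])

print(target, matches)
print([len(d) for d in disks])
```

The lookup for r.A = 313 reads a single disk and leaves the other three free for other queries; a range predicate would still have to visit all of them.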
Partitioning & Queries
Range Partitioning
• Provides data clustering by partitioning attribute value.
• Good for sequential access
• Good for point queries on partitioning attribute: only one disk needs to be accessed.
• For range queries on partitioning attribute, one or a few disks may need to be accessed
  − Remaining disks are available for other queries.
  − Good if result tuples are from one to a few blocks.
  − If many blocks are to be fetched, they are still fetched from one to a few disks, then potential parallelism in disk access is wasted
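Range partitioning can be sketched with a sorted partition vector and binary search. The vector `[250, 500, 750]`, giving four disks, and the function names are assumptions for this illustration; disk i is taken to hold the values from vector[i-1] up to, but not including, vector[i].

```python
from bisect import bisect_left, bisect_right


def disk_for_value(partition_vector, value):
    """Disk that holds a given value of the partitioning attribute."""
    return bisect_right(partition_vector, value)


def disks_for_range(partition_vector, lo, hi):
    """Disks that can hold tuples with lo <= value < hi."""
    if lo >= hi:
        return []
    first = bisect_right(partition_vector, lo)
    last = bisect_left(partition_vector, hi)
    return list(range(first, last + 1))


vector = [250, 500, 750]  # hypothetical partition vector for 4 disks

print(disk_for_value(vector, 313))
print(disks_for_range(vector, 313, 400))
print(disks_for_range(vector, 313, 786))
```

The point query and the narrow range each touch one disk, while the wide range 313 ≤ r.A < 786 touches three of the four.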
Parallel Sorting
• Scan in parallel, and range partition on the go.
• As partitioned data becomes available, perform "local" sorting.
• Resulting data is sorted and again range partitioned.
• Problem: skew or "hot spot".
• Solution: Sample the data at start to determine partition points.
[Figure: data range-partitioned across processors P1 to P5, one of which is a hot spot.]
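The steps above can be sketched as a sample-based parallel sort. This is a simplified, single-machine illustration under several assumptions: threads stand in for processors, the input is non-empty or handled by an early return, and the names `choose_splitters` and `parallel_sort`, the sample size and the seed are all hypothetical.

```python
import random
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor


def choose_splitters(data, n_partitions, sample_size, rng):
    """Sample the data and pick n_partitions - 1 evenly spaced sample values."""
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    return [sample[i * len(sample) // n_partitions] for i in range(1, n_partitions)]


def parallel_sort(data, n_partitions=4, sample_size=100, seed=0):
    """Range partition using sampled split points, sort locally, concatenate."""
    if not data:
        return [], [0] * n_partitions
    rng = random.Random(seed)
    splitters = choose_splitters(data, n_partitions, sample_size, rng)
    partitions = [[] for _ in range(n_partitions)]
    for value in data:  # range partition on the go
        partitions[bisect_right(splitters, value)].append(value)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        sorted_parts = list(pool.map(sorted, partitions))  # "local" sorts
    result = [v for part in sorted_parts for v in part]
    return result, [len(p) for p in partitions]


# Clustered input: evenly spaced split points would overload the middle partitions.
gen = random.Random(42)
data = [int(gen.gauss(500, 30)) for _ in range(10_000)]
result, sizes = parallel_sort(data)
print(sizes)
```

Because the split points come from the data itself, partitions follow the actual value distribution instead of assuming it is uniform.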
Skew in Partitioning
• The distribution of tuples to disks may be skewed
  − i.e. some disks have many tuples, while others may have fewer tuples.
• Types of skew:
  − Attribute-value skew.
    − Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
    − Can occur with range-partitioning and hash-partitioning.
  − Partition skew.
    − With range-partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
    − Less likely with hash-partitioning if a good hash function is chosen.
Handling Skew in Range-Partitioning
• To create a balanced partitioning vector
  − Sort the relation on the partitioning attribute.
  − Construct the partition vector by scanning the relation in sorted order as follows.
    − After every 1/n-th of the relation has been read, the value of the partitioning attribute of the next tuple is added to the partition vector.
    − n denotes the number of partitions to be constructed.
  − Duplicate entries or imbalances can result if duplicates are present in partitioning attributes.
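The construction of the partition vector can be sketched in a few lines. The function name `build_partition_vector` and the sample inputs are hypothetical, and the sketch assumes a non-empty relation that fits in memory.

```python
def build_partition_vector(values, n_partitions):
    """Sort, then record the next value after every 1/n-th of the relation."""
    ordered = sorted(values)
    size = len(ordered)
    return [ordered[i * size // n_partitions] for i in range(1, n_partitions)]


uniform = build_partition_vector(range(1, 101), 4)
with_duplicates = build_partition_vector([1, 1, 1, 1, 1, 1, 2, 3], 4)

print(uniform)
print(with_duplicates)
```

The second call shows the caveat about duplicates: a heavily repeated value produces repeated entries in the vector, so some partitions end up empty while one holds most of the tuples.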
Barriers to Linear Speedup & Scale-up
• Amdahl's Law
• Startup
  − Time needed to start a large number of processors.
  − Increases with the number of individual processors.
  − May also include time spent in opening files etc.
• Interference
  − Slowdown that each processor imposes on all others when sharing a common pool of resources (e.g. memory).
• Skew
  − Variance dominating the mean.
  − Service time of the job is the service time of its slowest component.
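Amdahl's Law is only named above, so the sketch below uses its usual textbook form: if a fraction p of the work can be parallelised over n processors, the speed-up is 1 / ((1 - p) + p / n). The function name and the 90% figure are illustrative assumptions, and the formula ignores startup, interference and skew.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speed-up predicted by Amdahl's Law for a given parallelisable fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# With 90% of the work parallelisable, the speed-up can never reach 10.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Adding processors helps less and less because the serial 10% is untouched, which is one reason the speed-up curve bends away from linear.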
Comparison of Partitioning Techniques
Shared disk/memory less sensitive to partitioning.
Shared nothing can benefit from good partitioning.
Range: Good for equijoins, range queries, group-by clauses; can result in "hot spots".
Round Robin: Good for load balancing, but impervious to nature of queries.
Hash: Good for equijoins; can result in uneven data distribution.
[Figure: for each technique, Users shown above five partitions labelled A…E, F…J, K…N, O…S, T…Z.]
Parallel Aggregates
For each aggregate function, need a decomposition:
Count(S) = count(s1) + count(s2) + ….
Average(S) = (sum(s1) + sum(s2) + ….) / (count(s1) + count(s2) + ….)
For groups:
• Distribute data using hashing.
• Sub-aggregate groups close to the source.
• Pass each sub-aggregate to its group's site.
[Figure: five partitions labelled A…E, F…J, K…N, O…S, T…Z.]
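A small sketch of decomposed aggregation per group. Each partition computes a partial (sum, count) per group close to the data, and the partials are then merged; averages are derived from the merged sums and counts, since per-partition averages cannot simply be added. The function names and sample rows are hypothetical, and the hash redistribution of partials to each group's site is collapsed into a single `merge` step here.

```python
from collections import defaultdict


def sub_aggregate(partition):
    """Per-partition partial state: group -> [sum, count]."""
    partial = defaultdict(lambda: [0, 0])
    for group, value in partition:
        partial[group][0] += value
        partial[group][1] += 1
    return dict(partial)


def merge(partials):
    """Combine partial states, then derive count and average per group."""
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            total[group][0] += s
            total[group][1] += c
    return {g: {"count": c, "avg": s / c} for g, (s, c) in total.items()}


partitions = [
    [("sales", 10), ("sales", 20), ("hr", 30)],
    [("sales", 60)],
]
result = merge(sub_aggregate(p) for p in partitions)
print(result)
```

For the "sales" group the per-partition averages are 15 and 60, yet the true average over all three rows is 30, which is why the sum and the count travel separately.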
When to Use Which Partitioning Technique?
• When to use Range Partitioning?
• When to use Hash Partitioning?
• When to use List Partitioning?
• When to use Round-Robin Partitioning?
Parallelism Goals and Metrics
• Speedup: The Good, The Bad & The Ugly
  Speedup = OldTime / NewTime
  [Figure: three speedup curves plotted against Processors & Discs: the ideal speedup curve (linearity); a bad speedup curve (non-linear, "Min Parallelism Benefit"); and a bad speedup curve due to 3 factors: startup, interference, skew.]
• Scale-up:
  − Transactional Scale-up: Fit for OLTP systems
  − Batch Scale-up: Fit for Data Warehouse and OLAP

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdfKamal Acharya
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGSIVASHANKAR N
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 

Dernier (20)

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTINGMANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
MANUFACTURING PROCESS-II UNIT-1 THEORY OF METAL CUTTING
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 

Lecture 25

  • 1. Data Warehousing, Lecture-25. Need for Speed: Parallelism Methodologies. Virtual University of Pakistan. Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research (www.nu.edu.pk/cairindex.asp), National University of Computers & Emerging Sciences, Islamabad. Email: ahsan1010@yahoo.com
  • 2. Motivation
      - There would be no need for parallelism if we had a perfect computer:
          - with a single, infinitely fast processor
          - with infinite memory and infinite bandwidth
          - that is also infinitely cheap (free!)
      - Technology is not delivering (going-to-the-Moon analogy).
      - The challenge is to build:
          - an infinitely fast processor out of infinitely many processors of finite speed
          - an infinitely large memory with infinite memory bandwidth out of infinitely many finite storage units of finite speed
  • 3. Data Parallelism: Concept
      - Parallel execution of a single data manipulation task across multiple partitions of data.
      - Partitions may be static or dynamic.
      - Tasks are executed almost independently across partitions.
      - A "query coordinator" must coordinate between the independently executing processes.
  • 4. Data Parallelism: Example
      - Query: SELECT COUNT(*) FROM Emp WHERE age > 50 AND sal > 10000;
      - The Emp table is split into Partition-1, Partition-2, ..., Partition-k, each scanned by its own query server (Query Server-1, Query Server-2, ..., Query Server-k).
      - The query servers return partial counts (62, 440, ..., 1,123) to the query coordinator.
      - The query coordinator adds them up: Ans = 62 + 440 + ... + 1,123 = 99,000
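
The coordinator/server split in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy `emp_partitions` data and the function names `count_matching` and `parallel_count` are hypothetical, and threads stand in for separate query servers.

```python
from concurrent.futures import ThreadPoolExecutor

def count_matching(partition):
    """Query server: count rows in one partition that satisfy the filter."""
    return sum(1 for row in partition if row["age"] > 50 and row["sal"] > 10_000)

def parallel_count(partitions):
    """Query coordinator: run one count per partition, then add the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = list(pool.map(count_matching, partitions))
    return sum(partial_counts), partial_counts

# Hypothetical toy Emp table, already split into three partitions.
emp_partitions = [
    [{"age": 55, "sal": 12_000}, {"age": 30, "sal": 20_000}],
    [{"age": 61, "sal": 15_000}, {"age": 52, "sal": 11_000}],
    [{"age": 58, "sal": 9_000}],
]
total, partials = parallel_count(emp_partitions)
print(partials, total)
```

The only work left to the coordinator is a sum over k small integers, which is why it can stay fast relative to the query servers.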
  • 5. Data Parallelism: Ensuring Speed-Up
      - To get a speed-up of N with N partitions, it must be ensured that:
          - There are enough computing resources.
          - The query coordinator is very fast compared to the query servers.
          - The work done in each partition is almost the same, to avoid performance bottlenecks.
          - Having the same number of records in each partition is not sufficient.
          - Records must be uniformly distributed across partitions w.r.t. the filter criterion.
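
Why balanced work matters can be seen with a simplified cost model, sketched below. The model is an assumption made for illustration: one processor per partition, elapsed parallel time equal to the slowest partition plus the coordinator's time, and work measured in arbitrary units. The function name `data_parallel_speedup` is hypothetical.

```python
def data_parallel_speedup(work_per_partition, coordinator_time=0.0):
    """Speed-up of a partitioned scan under a simplified model:
    sequential time is the total work, parallel time is the slowest
    partition plus whatever the coordinator adds."""
    sequential = sum(work_per_partition)
    parallel = max(work_per_partition) + coordinator_time
    return sequential / parallel

balanced = data_parallel_speedup([100, 100, 100, 100])   # equal work everywhere
skewed = data_parallel_speedup([250, 50, 50, 50])        # one overloaded partition
print(balanced, skewed)
```

With four partitions the balanced case reaches the full speed-up of 4, while the skewed case is held back by its single heavy partition even though the total work is the same.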
  • 6. Temporal Parallelism (pipelining)
      - Involves taking a complex task and breaking it down into independent subtasks for parallel execution on a stream of data inputs.
      - [Figure: a task with execution time T split into three stages, each taking T/3.]
  • 7. Pipelining: Time Chart
      - [Figure: time chart of the pipeline stages, each taking T/3, shown at T = 0, T = 1, T = 2 and T = 3.]
  • 8. Pipelining: Speed-Up Calculation
      - Time for sequential execution of 1 task = T
      - Time for sequential execution of N tasks = N × T
      - (Ideal) time for pipelined execution of one task using an M-stage pipeline = T
      - (Ideal) time for pipelined execution of N tasks using an M-stage pipeline = T + ((N-1) × (T/M))
      - Speed-up (S) = (N × T) / (T + ((N-1) × (T/M))) = (N × M) / (M + N - 1)
      - Pipeline parallelism focuses on increasing the throughput of task execution, NOT on decreasing sub-task execution time.
  • 9. Pipelining: Speed-Up Example
      - Example: bottling soft drinks in a factory, using a three-stage pipeline (fill bottle, seal bottle, label bottle).
      - 10 crate loads of bottles:
          - Sequential execution = 10 × T
          - Pipelined execution = T + T × (10-1)/3 = 4 × T
          - Speed-up = 2.50
      - 20 crate loads of bottles:
          - Sequential execution = 20 × T
          - Pipelined execution = T + T × (20-1)/3 = 7.3 × T
          - Speed-up = 2.72
      - 40 crate loads of bottles:
          - Sequential execution = 40 × T
          - Pipelined execution = T + T × (40-1)/3 = 14.0 × T
          - Speed-up = 2.85
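
The bottling figures can be re-derived from the ideal pipelined time T + (N-1) × (T/M) given on the previous slide. A minimal sketch follows; the function names are hypothetical, and T is set to 1 so that results read as multiples of T.

```python
def pipelined_time(n_tasks, m_stages, t=1.0):
    """Ideal time to push n_tasks through an m_stages pipeline: the first
    task takes t to fill the pipeline, each later task adds t / m_stages."""
    return t + (n_tasks - 1) * (t / m_stages)

def pipeline_speedup(n_tasks, m_stages, t=1.0):
    """Sequential time (n_tasks * t) divided by the ideal pipelined time."""
    return (n_tasks * t) / pipelined_time(n_tasks, m_stages, t)

# Fill, seal, label: a 3-stage pipeline, for 10, 20 and 40 crate loads.
for n in (10, 20, 40):
    print(n, round(pipelined_time(n, 3), 2), round(pipeline_speedup(n, 3), 3))
```

For 20 and 40 crate loads the computed speed-ups are roughly 2.727 and 2.857, which the slide shows truncated to two decimals.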
  • 10. Pipelining: Input vs Speed-Up
      - [Plot: speed-up (S), axis from 1 to 3, against input (N), axis from 1 to 19.]
      - The asymptotic limit on speed-up for an M-stage pipeline is M.
      - The speed-up will NEVER be M, as initially filling the pipeline took T time units.
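
The asymptotic claim follows from the ideal pipelined time T + (N-1)(T/M) used earlier in the lecture. The short derivation below is a sketch written for this transcript, not taken from the slides; it assumes M > 1 and the same ideal timing model.

```latex
\[
S(N) \;=\; \frac{N\,T}{T + (N-1)\,\dfrac{T}{M}}
      \;=\; \frac{N\,M}{M + N - 1},
\qquad
\lim_{N \to \infty} S(N) \;=\; M .
\]
\[
M - S(N) \;=\; \frac{M\,(M-1)}{M + N - 1} \;>\; 0
\quad \text{for every finite } N \text{ and } M > 1 ,
\]
```

so the speed-up approaches M as the input grows but never reaches it.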
  • 11. Pipelining: Limitations
      - Relational pipelines are rarely very long.
          - Even a chain of length ten is unusual.
      - Some relational operators do not produce their first output until they have consumed all their inputs.
          - Aggregate and sort operators have this property. One cannot pipeline these operators.
      - Often the execution cost of one operator is much greater than that of the others, hence skew.
          - e.g. Sum() or Count() vs Group-by() or Join.
  • 12. Partitioning & Queries
      - Let's evaluate how well different partitioning techniques support the following types of data access:
          - Full table scan: scanning the entire relation.
          - Point queries: locating a tuple, e.g. where r.A = 313.
          - Range queries: locating all tuples such that the value of a given attribute lies within a specified range, e.g. where 313 ≤ r.A < 786.
  • 13. Partitioning & Queries: Round Robin
      - Advantages:
          - Best suited for a sequential scan of the entire relation on each query.
          - All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
      - Range queries are difficult to process.
          - No clustering: tuples are scattered across all disks.
  • 14. Partitioning & Queries: Hash Partitioning
      - Good for sequential access.
          - With uniform hashing and the partitioning attributes used as the key, tuples will be equally distributed between disks.
      - Good for point queries on the partitioning attribute.
          - Can look up a single disk, leaving the others available for answering other queries.
          - An index on the partitioning attribute can be local to the disk, making lookups and updates, and even joins, very efficient.
      - Range queries are difficult to process.
          - No clustering: tuples are scattered across all disks.
  • 15. Partitioning & Queries: Range Partitioning
      - Provides data clustering by partitioning attribute value.
      - Good for sequential access.
      - Good for point queries on the partitioning attribute: only one disk needs to be accessed.
      - For range queries on the partitioning attribute, one or a few disks may need to be accessed.
          - The remaining disks are available for other queries.
          - Good if the result tuples come from one to a few blocks.
          - If many blocks are to be fetched, they are still fetched from one to a few disks, so the potential parallelism in disk access is wasted.
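
The three schemes differ only in how a tuple is mapped to a disk, which the sketch below makes concrete. All names are hypothetical, keys are assumed to be integers (Python's built-in `hash` is deterministic for integers but not for strings across runs), and the partition vector `[400, 800, 1200]` is made up so that disk i holds keys from vector[i-1] up to, but excluding, vector[i].

```python
from bisect import bisect_right

def round_robin_partition(i, n_disks):
    """Round robin: the i-th tuple in arrival order goes to disk i mod n."""
    return i % n_disks

def hash_partition(key, n_disks):
    """Hash partitioning: the disk is chosen from a hash of the partitioning key."""
    return hash(key) % n_disks

def range_partition(key, partition_vector):
    """Range partitioning: the disk is the interval of the vector that contains key."""
    return bisect_right(partition_vector, key)

def disks_for_range_query(lo, hi, partition_vector):
    """Disks that can hold keys with lo <= key < hi under range partitioning."""
    first = range_partition(lo, partition_vector)
    last = range_partition(hi - 1, partition_vector)
    return list(range(first, last + 1))

vector = [400, 800, 1200]                      # hypothetical: 4 disks
print(range_partition(313, vector))            # point query touches one disk
print(disks_for_range_query(313, 786, vector)) # range query touches few disks
print(sorted({hash_partition(k, 4) for k in range(313, 786)}))  # hashing scatters the range
```

For the lecture's range query 313 ≤ r.A < 786, range partitioning confines the work to two disks under this vector, whereas hashing the same keys spreads them over all four.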
  • 16. Parallel Sorting
      - Scan in parallel, and range partition on the go.
      - As partitioned data becomes available, perform "local" sorting.
      - The resulting data is sorted and again range partitioned.
      - Problem: skew or "hot spot".
      - Solution: sample the data at the start to determine partition points.
      - [Figure: data spread over five processors (P1 to P5), with a hot spot.]
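
The sample-then-partition idea can be sketched as follows. This is an illustration under assumptions, not the lecture's algorithm in detail: the function names, the sample size and the fixed seed are hypothetical, the input is assumed to be a non-empty list, and threads stand in for separate processors (in CPython they show the structure rather than a real speed-up).

```python
import random
from bisect import bisect_right
from concurrent.futures import ThreadPoolExecutor

def choose_partition_points(data, n_partitions, sample_size=100, seed=0):
    """Sample the data and take n-1 evenly spaced sample values as partition points."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    return [sample[(i * len(sample)) // n_partitions] for i in range(1, n_partitions)]

def parallel_sort(data, n_partitions=4):
    points = choose_partition_points(data, n_partitions)
    buckets = [[] for _ in range(n_partitions)]
    for value in data:                                   # range partition on the go
        buckets[bisect_right(points, value)].append(value)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        sorted_buckets = list(pool.map(sorted, buckets))  # "local" sorts
    # Buckets are in range order, so concatenating them gives a global sort.
    return [value for bucket in sorted_buckets for value in bucket]

data = [5] * 50 + list(range(100, 0, -1))   # hypothetical input with a heavy value
result = parallel_sort(data, 4)
```

Because the partition points come from a sample of the actual data, dense regions of the key space get narrower ranges, which is what evens out the load.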
  • 17. Skew in Partitioning
      - The distribution of tuples to disks may be skewed, i.e. some disks have many tuples while others have fewer.
      - Types of skew:
          - Attribute-value skew.
              - Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
              - Can occur with range partitioning and hash partitioning.
          - Partition skew.
              - With range partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
              - Less likely with hash partitioning if a good hash function is chosen.
  • 18. Handling Skew in Range-Partitioning
      - To create a balanced partitioning vector:
          - Sort the relation on the partitioning attribute.
          - Construct the partition vector by scanning the relation in sorted order as follows:
              - After every 1/n-th of the relation has been read, the value of the partitioning attribute of the next tuple is added to the partition vector.
              - n denotes the number of partitions to be constructed.
      - Duplicate entries or imbalances can result if duplicates are present in the partitioning attributes.
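
The construction described on this slide translates almost directly into code. The sketch below assumes an in-memory, non-empty list of partitioning-attribute values; the function name `balanced_partition_vector` is hypothetical.

```python
def balanced_partition_vector(values, n_partitions):
    """Sort the attribute values, then after every 1/n-th of the relation
    record the value of the next tuple as a partition boundary."""
    ordered = sorted(values)
    size = len(ordered)
    vector = []
    for i in range(1, n_partitions):
        cut = (i * size) // n_partitions   # number of tuples read so far
        vector.append(ordered[cut])        # attribute value of the next tuple
    return vector

print(balanced_partition_vector(range(1, 13), 4))                 # distinct values
print(balanced_partition_vector([5] * 8 + [1, 2, 3, 9], 4))       # many duplicates
```

The second call shows the caveat from the slide: when one value dominates, the same boundary is emitted repeatedly, so the vector contains duplicate entries and the partitions cannot be balanced.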
  • 19. Barriers to Linear Speedup & Scale-up
      - Amdahl's Law
      - Startup
          - Time needed to start a large number of processors.
          - Increases with the number of individual processors.
          - May also include time spent opening files etc.
      - Interference
          - The slow-down that each processor imposes on all others when sharing a common pool of resources (e.g. memory).
      - Skew
          - Variance dominating the mean.
          - The service time of the job is the service time of its slowest component.
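
The slide names Amdahl's Law without stating it. In its usual form, if a fraction p of the work can be parallelized over n processors, the speed-up is 1 / ((1 - p) + p / n). A minimal sketch of that formula follows; the function name and the example figures are illustrative assumptions, not values from the lecture.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Amdahl's Law: the serial fraction runs at full cost, while the
    parallel fraction is divided among n_processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# Hypothetical workload in which 90% of the work parallelizes.
for n in (10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

However many processors are added, the speed-up in this example stays below 1 / 0.1 = 10, because the serial 10% is never shortened.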
  • 20. Comparison of Partitioning Techniques
      - Shared disk/memory is less sensitive to partitioning. Shared nothing can benefit from good partitioning.
      - Range: good for equijoins, range queries and group-by clauses; can result in "hot spots".
      - Round Robin: good for load balancing, but impervious to the nature of queries.
      - Hash: good for equijoins; can result in uneven data distribution.
      - [Figure: for each technique, users accessing five partitions labelled A…E, F…J, K…N, O…S, T…Z.]
  • 21. Parallel Aggregates
      - For each aggregate function, a decomposition is needed:
          - Count(S) = count(s1) + count(s2) + …
          - Average(S) = (sum(s1) + sum(s2) + …) / (count(s1) + count(s2) + …)
      - For groups:
          - Distribute data using hashing.
          - Sub-aggregate groups close to the source.
          - Pass each sub-aggregate to its group's site.
      - [Figure: five partitions labelled A…E, F…J, K…N, O…S, T…Z.]
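
Counts combine by simple addition, but averages do not, so a common way to decompose an average (assumed here, not spelled out on the slide) is to have each partition return a (sum, count) pair and let the coordinator divide at the end. The function names and sample data below are hypothetical.

```python
def partial_aggregate(partition):
    """Runs at each partition: return the local (sum, count) pair."""
    return sum(partition), len(partition)

def combine_average(partials):
    """Runs at the coordinator: total sum divided by total count."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

partitions = [[10, 20], [30], [40, 50, 60]]          # hypothetical partitions of S
partials = [partial_aggregate(p) for p in partitions]
average = combine_average(partials)

# Averaging the per-partition averages gives a different, wrong answer
# whenever the partitions have unequal sizes.
naive = sum(s / c for s, c in partials) / len(partials)
print(average, naive)
```

Only two numbers per partition travel to the coordinator, regardless of how many rows each partition holds.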
  • 22. When to use which partitioning technique?
      - When to use Range Partitioning?
      - When to use Hash Partitioning?
      - When to use List Partitioning?
      - When to use Round-Robin Partitioning?
  • 23. Parallelism Goals and Metrics
      - Speedup: The Good, The Bad & The Ugly
          - Speedup = OldTime / NewTime
          - [Figures: three plots of speedup against processors & discs: the ideal speedup curve (linearity); a bad speedup curve (non-linear, minimal parallelism benefit); a bad speedup curve caused by the three factors startup, interference and skew.]
      - Scale-up:
          - Transactional scale-up: fit for OLTP systems.
          - Batch scale-up: fit for data warehouse and OLAP.
