Distributed Database Outline

Pachaiyappa's College for Men, Kanchipuram
Department of Computer Science
M.Sc. Computer Science & Technology
Distributed Database

Outline

Introduction
« What is a distributed DBMS
« Problems
« Current state-of-affairs
Distributed Database Design
« Fragmentation
« Data Location
Distributed Query Processing
« Query Processing Methodology
« Distributed Query Optimization
Distributed Transaction Management
« Transaction Concepts and Models
« Distributed Concurrency Control
« Distributed Reliability

What is distributed ...
Processing logic
Functions
Data
Control

What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple, logically interrelated databases
distributed over a computer network.
A distributed database management system (D–DBMS) is the software that manages the
DDB and provides an access mechanism that makes this distribution transparent to the users.
Distributed database system (DDBS) = DDB + D–DBMS.

What is not a DDBS?

A timesharing computer system
A loosely or tightly coupled multiprocessor system
A database system that resides at one of the nodes of a network of computers; this is a
centralized database on a network node.

Applications

Manufacturing - especially multi-plant manufacturing
Military command and control
Airlines
Hotel chains
Any organization which has a decentralized organization structure

Distributed DBMS Promises

Transparent management of distributed, fragmented, and replicated data
Improved reliability/availability through distributed transactions
Improved performance
Easier and more economical system expansion

Transparency

Transparency is the separation of the higher level semantics of a system from the lower
level implementation issues.
Fundamental issue is to provide data independence in the distributed environment
« Network (distribution) transparency
« Replication transparency
« Fragmentation transparency
horizontal fragmentation: selection
vertical fragmentation: projection
hybrid

Potentially Improved Performance

Proximity of data to its points of use
« Requires some support for fragmentation and replication
Parallelism in execution
« Inter-query parallelism
« Intra-query parallelism

Distributed DBMS Issues

Distributed Database Design
« how to distribute the database
« replicated & non-replicated database distribution
« a related problem in directory management

Query Processing
« convert user transactions to data manipulation instructions
« optimization problem
« min{cost = data transmission + local processing}
« general formulation is NP-hard

Concurrency Control
« synchronization of concurrent accesses
« consistency and isolation of transactions' effects
« deadlock management

Reliability
« how to make the system resilient to failures
« atomicity and durability

Operating System Support
« operating system with proper support for database operations
« dichotomy between general purpose processing requirements and database processing
requirements

Open Systems and Interoperability
« Distributed Multidatabase Systems
« More probable scenario
« Parallel issues

Design Problem

In the general setting :
Making decisions about the placement of data and programs across the sites of a computer
network as well as possibly designing the network itself.

In Distributed DBMS, the placement of applications entails
« placement of the distributed DBMS software; and
« placement of the applications that run on the database

Distribution Design

Top-down
« mostly in designing systems from scratch
« mostly in homogeneous systems

Bottom-up
« when the databases already exist at a number of sites

Distribution Design Issues

Why fragment at all?
How to fragment?
How much to fragment?
How to test correctness?
How to allocate?
Information requirements?

Fragmentation

Can't we just distribute relations?
What is a reasonable unit of distribution?
« relation
views are subsets of relations ⇒ locality
extra communication
« fragments of relations (sub-relations)
concurrent execution of a number of transactions that access different portions of a relation
views that cannot be defined on a single fragment will require extra processing
semantic data control (especially integrity enforcement) is more difficult

Correctness of Fragmentation

Completeness
« Decomposition of relation R into fragments R1, R2, ..., Rn is complete if and only if each data
item in R can also be found in some Ri .

Reconstruction
« If relation R is decomposed into fragments R1, R2, ..., Rn, then there should exist some
relational operator ∇ such that R = ∇ Ri, 1 ≤ i ≤ n.

Disjointness
« If relation R is decomposed into fragments R1, R2, ..., Rn, and data item di is in Rj, then di
should not be in any other fragment Rk (k ≠ j).
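
The three rules can be checked mechanically for a horizontal fragmentation. The following is a minimal sketch, assuming relations are modelled as Python sets of tuples and that union plays the role of the reconstruction operator ∇; the function name and the sample tuples are hypothetical.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Return (complete, reconstructs, disjoint) for a horizontal fragmentation.

    relation  -- set of tuples (the relation R)
    fragments -- list of sets of tuples (R1, ..., Rn)
    """
    union = set().union(*fragments)
    complete = relation <= union          # every tuple of R is in some Ri
    reconstructs = union == relation      # R equals the union of all Ri
    # If the fragment sizes add up to the size of the union, no tuple is shared.
    disjoint = sum(len(f) for f in fragments) == len(union)
    return complete, reconstructs, disjoint


# Hypothetical sample data: (employee id, salary)
R = {("E1", 25000), ("E2", 40000), ("E3", 30000)}
R1 = {t for t in R if t[1] <= 30000}
R2 = {t for t in R if t[1] > 30000}

print(check_horizontal_fragmentation(R, [R1, R2]))
print(check_horizontal_fragmentation(R, [R1, R]))   # overlapping fragments
```

For vertical fragments the reconstruction operator would be a join on the key rather than a union, so this check does not apply to them as written.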

Allocation Alternatives

Non-replicated
« partitioned : each fragment resides at only one site
Replicated
« fully replicated : each fragment at each site
« partially replicated : each fragment at some of the sites

Information Requirements

Four categories:
Database information
Application information
Communication network information
Computer system information

Fragmentation

Horizontal Fragmentation (HF)
« Primary Horizontal Fragmentation (PHF)
« Derived Horizontal Fragmentation (DHF)
Vertical Fragmentation (VF)
Hybrid Fragmentation (HF)

PHF – Definition

Rj = σFj(R), 1 ≤ j ≤ w
where Fj is a selection formula, which is (preferably) a minterm predicate. Therefore, a
horizontal fragment Rj of relation R consists of all the tuples of R which satisfy a minterm predicate
mj.
⇓
Given a set of minterm predicates M, there are as many horizontal fragments of relation R as
there are minterm predicates. The set of horizontal fragments is also referred to as minterm fragments.

PHF – Algorithm

Given: A relation R, the set of simple predicates Pr
Output: The set of fragments of R = {R1, R2,...,Rw}
which obey the fragmentation rules.
Preliminaries :
« Pr should be complete
« Pr should be minimal

PHF – Example

Two candidate relations : PAY and PROJ.
Fragmentation of relation PAY
« Application: Check the salary info and determine raise.
« Employee records kept at two sites ⇒ application run at
two sites
« Simple predicates
p1 : SAL ≤ 30000
p2 : SAL > 30000
Pr = {p1, p2}, which is complete and minimal, so Pr' = Pr
« Minterm predicates
m1 : (SAL ≤ 30000)
m2 : NOT(SAL ≤ 30000) = (SAL > 30000)
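
As a rough illustration of the example, the sketch below splits a small PAY relation with the two minterm predicates on SAL. The rows and the column names `title` and `sal` are hypothetical; only the 30000 threshold comes from the example.

```python
# Hypothetical PAY tuples; only the SAL threshold is taken from the example.
pay = [
    {"title": "T1", "sal": 27000},
    {"title": "T2", "sal": 30000},
    {"title": "T3", "sal": 34000},
    {"title": "T4", "sal": 40000},
]

# One selection predicate per minterm: m1 and m2.
minterms = {
    "PAY1": lambda t: t["sal"] <= 30000,
    "PAY2": lambda t: t["sal"] > 30000,
}

# Each fragment is the selection of PAY under one minterm predicate.
fragments = {name: [t for t in pay if m(t)] for name, m in minterms.items()}

for name, rows in fragments.items():
    print(name, [t["title"] for t in rows])
```

Because m1 and m2 are negations of each other, every tuple lands in exactly one fragment.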

Derived Horizontal Fragmentation

Defined on a member relation of a link according to a selection operation specified on
its owner.
« Each link is an equijoin.
« Equijoin can be implemented by means of semijoins.

DHF – Definition

Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are
defined as
Ri = R ⋉F Si, 1 ≤ i ≤ w
where w is the maximum number of fragments that will be defined on R and
Si = σFi(S)
where Fi is the formula according to which the primary horizontal fragment Si is defined.

DHF – Example

Given link L1 where owner(L1) = SKILL and
member(L1) = EMP
EMP1 = EMP ⋉ SKILL1
EMP2 = EMP ⋉ SKILL2
where
SKILL1 = σSAL≤30000(SKILL)
SKILL2 = σSAL>30000(SKILL)
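
A derived fragmentation of this kind can be sketched as a semijoin of the member relation with each fragment of its owner. The example does not name the attribute that links EMP to SKILL, so the `title` column, the helper `semijoin`, and all rows below are assumptions made for illustration.

```python
def semijoin(member, owner, attr):
    """Keep the member tuples whose attr value appears in some owner tuple."""
    keys = {o[attr] for o in owner}
    return [m for m in member if m[attr] in keys]


# Hypothetical data; "title" is an assumed link attribute.
skill = [{"title": "T1", "sal": 27000}, {"title": "T2", "sal": 34000}]
emp = [
    {"eno": "E1", "title": "T1"},
    {"eno": "E2", "title": "T2"},
    {"eno": "E3", "title": "T1"},
]

# Primary horizontal fragments of the owner relation.
skill1 = [s for s in skill if s["sal"] <= 30000]
skill2 = [s for s in skill if s["sal"] > 30000]

# Derived horizontal fragments of the member relation.
emp1 = semijoin(emp, skill1, "title")
emp2 = semijoin(emp, skill2, "title")

print([e["eno"] for e in emp1], [e["eno"] for e in emp2])
```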

Vertical Fragmentation

Has been studied within the centralized context
« design methodology
« physical clustering
More difficult than horizontal, because more alternatives exist.
Two approaches:
« grouping: attributes to fragments
« splitting: relation to fragments

Overlapping fragments
« grouping
Non-overlapping fragments
« splitting
We do not consider the replicated key attributes to be overlapping.
Advantage:
Easier to enforce functional dependencies (for integrity checking etc.)

Query Processing Components

Query language that is used
➠ SQL: “intergalactic dataspeak”

Query execution methodology
➠ The steps that one goes through in executing high-level (declarative) user queries.

Query optimization
➠ How do we determine the “best” execution plan?

Selecting Alternatives

SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND DUR > 37

Strategy 1
ΠENAME(σDUR>37 ∧ EMP.ENO=ASG.ENO(EMP × ASG))

Strategy 2
ΠENAME(EMP ⋈ENO (σDUR>37(ASG)))

Strategy 2 avoids the Cartesian product, so it is “better”.
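
The two strategies return the same answer but build intermediate results of very different size. The sketch below evaluates both on a few hypothetical rows (the tuples are invented; only the query shape and the DUR > 37 condition come from the slide).

```python
from itertools import product

# Hypothetical rows: EMP(ENO, ENAME) and ASG(ENO, DUR)
emp = [(1, "A"), (2, "B"), (3, "C")]
asg = [(1, 40), (2, 12), (3, 48), (3, 20)]

# Strategy 1: Cartesian product first, then selection, then projection.
pairs = list(product(emp, asg))
s1 = {e[1] for e, a in pairs if e[0] == a[0] and a[1] > 37}

# Strategy 2: selection on ASG first, then join on ENO via a lookup table.
asg_sel = [a for a in asg if a[1] > 37]
names = {eno: ename for eno, ename in emp}
s2 = {names[a[0]] for a in asg_sel if a[0] in names}

print(s1 == s2, len(pairs), len(asg_sel))
```

Strategy 1 materializes every EMP/ASG pair before filtering, while Strategy 2 only ever carries the ASG tuples that survive the selection.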

Cost of Alternatives

Assume:
➠ size(EMP) = 400, size(ASG) = 1000
➠ tuple access cost = 1 unit; tuple transfer cost = 10 units

Strategy 1
produce ASG': (10+10) ∗ tuple access cost = 20
transfer ASG' to the sites of EMP: (10+10) ∗ tuple transfer cost = 200
produce EMP': (10+10) ∗ tuple access cost ∗ 2 = 40
transfer EMP' to result site: (10+10) ∗ tuple transfer cost = 200
Total cost = 460

Strategy 2
transfer EMP to site 5: 400 ∗ tuple transfer cost = 4,000
transfer ASG to site 5: 1000 ∗ tuple transfer cost = 10,000
produce ASG': 1000 ∗ tuple access cost = 1,000
join EMP and ASG': 400 ∗ 20 ∗ tuple access cost = 8,000
Total cost = 23,000
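
The arithmetic above can be reproduced directly. This short sketch only restates the figures from the slide; the variable names are illustrative.

```python
ACCESS = 1     # tuple access cost (units)
TRANSFER = 10  # tuple transfer cost (units)

strategy1 = [
    ("produce ASG'", (10 + 10) * ACCESS),
    ("transfer ASG' to the sites of EMP", (10 + 10) * TRANSFER),
    ("produce EMP'", (10 + 10) * ACCESS * 2),
    ("transfer EMP' to result site", (10 + 10) * TRANSFER),
]
strategy2 = [
    ("transfer EMP to site 5", 400 * TRANSFER),
    ("transfer ASG to site 5", 1000 * TRANSFER),
    ("produce ASG'", 1000 * ACCESS),
    ("join EMP and ASG'", 400 * 20 * ACCESS),
]

total1 = sum(cost for _, cost in strategy1)
total2 = sum(cost for _, cost in strategy2)
print(total1, total2, total2 / total1)
```

With these unit costs the second strategy is 50 times more expensive, almost entirely because whole relations are shipped before any selection is applied.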

Query Optimization Objectives

Minimize a cost function
I/O cost + CPU cost + communication cost
These might have different weights in different distributed environments
Wide area networks
➠ communication cost will dominate
low bandwidth
low speed
high protocol overhead
➠ most algorithms ignore all other cost components
Local area networks
➠ communication cost not that dominant
➠ total cost function should be considered
Can also maximize throughput

Query Optimization Issues – Types of Optimizers

Exhaustive search
➠ cost-based
➠ optimal
➠ combinatorial complexity in the number of relations

Heuristics
➠ not optimal
➠ regroup common sub-expressions
➠ perform selection, projection first
➠ replace a join by a series of semijoins
➠ reorder operations to reduce intermediate relation size
➠ optimize individual operations

Query Optimization Issues – Optimization Granularity

Single query at a time
➠ cannot use common intermediate results
Multiple queries at a time
➠ efficient if many similar queries
➠ decision space is much larger

Query Optimization Issues – Optimization Timing

Static
➠ compilation ⇒ optimize prior to the execution
➠ difficult to estimate the size of the intermediate results
⇒ error propagation
➠ can amortize over many executions
➠ R*

Dynamic
➠ run time optimization
➠ exact information on the intermediate relation sizes
➠ have to reoptimize for multiple executions
➠ Distributed INGRES

Hybrid
➠ compile using a static algorithm
➠ if the error in estimate sizes > threshold, reoptimize at
run time
➠ MERMAID

Query Optimization Issues – Statistics

Relation
➠ cardinality
➠ size of a tuple
➠ fraction of tuples participating in a join with another relation

Attribute
➠ cardinality of domain
➠ actual number of distinct values

Common assumptions
➠ independence between different attribute values
➠ uniform distribution of attribute values within their domain

Query Optimization Issues – Decision Sites

Centralized
➠ single site determines the “best” schedule
➠ simple
➠ need knowledge about the entire distributed database

Distributed
➠ cooperation among sites to determine the schedule
➠ need only local information
➠ cost of cooperation

Hybrid
➠ one site determines the global schedule
➠ each site optimizes the local subqueries

Query Optimization Issues – Network Topology

Wide area networks (WAN) – point-to-point
➠ characteristics
low bandwidth
low speed
high protocol overhead
➠ communication cost will dominate; ignore all other cost factors
➠ global schedule to minimize communication cost
➠ local schedules according to centralized query optimization

Local area networks (LAN)
➠ communication cost not that dominant
➠ total cost function should be considered
➠ broadcasting can be exploited (joins)
➠ special algorithms exist for star networks

Cost-Based Optimization

Solution space
➠ The set of equivalent algebra expressions (query trees).

Cost function (in terms of time)
➠ I/O cost + CPU cost + communication cost
➠ These might have different weights in different distributed environments (LAN vs WAN).
➠ Can also maximize throughput

Search algorithm
➠ How do we move inside the solution space?
➠ Exhaustive search, heuristic algorithms (iterative improvement, simulated annealing,
genetic,...)

Join Ordering in Fragment Queries

Ordering joins
➠ Distributed INGRES
➠ System R*
Semijoin ordering
➠ SDD-1

Join Ordering

Consider two relations only

if size(R) < size(S), send R to the site of S
if size(R) > size(S), send S to the site of R

Multiple relations more difficult because too many alternatives.
➠ Compute the cost of all alternatives and select the best one.
Necessary to compute the size of intermediate
relations which is difficult.
➠ Use heuristics

Join Ordering – Example

Execution alternatives:
1. EMP → Site 2
Site 2 computes EMP' = EMP ⋈ ASG
EMP' → Site 3
Site 3 computes EMP' ⋈ PROJ

2. ASG → Site 1
Site 1 computes EMP' = EMP ⋈ ASG
EMP' → Site 3
Site 3 computes EMP' ⋈ PROJ

3. ASG → Site 3
Site 3 computes ASG' = ASG ⋈ PROJ
ASG' → Site 1
Site 1 computes ASG' ⋈ EMP

4. PROJ → Site 2
Site 2 computes PROJ' = PROJ ⋈ ASG
PROJ' → Site 1
Site 1 computes PROJ' ⋈ EMP

5. EMP → Site 2
PROJ → Site 2
Site 2 computes EMP ⋈ PROJ ⋈ ASG

Semijoin Algorithms

Consider the join of two relations:
➠ R[A] (located at site 1)
➠ S[A] (located at site 2)

Alternatives:
1 Do the join R ⋈A S

2 Perform one of the semijoin equivalents
3 Perform the join
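
The semijoin route relies on the standard equivalences R ⋈A S = (R ⋉A S) ⋈A S = R ⋈A (S ⋉A R). The sketch below checks them on small hypothetical relations whose first field is the join attribute A; the helper names are illustrative.

```python
def join(r, s):
    """Equijoin on the first field of each tuple."""
    return {(a, x, y) for (a, x) in r for (b, y) in s if a == b}


def semijoin(r, s):
    """Tuples of r that have a join partner in s (only s's A values are needed)."""
    s_keys = {b for (b, _) in s}
    return {(a, x) for (a, x) in r if a in s_keys}


# Hypothetical relations R[A] at site 1 and S[A] at site 2
R = {(1, "r1"), (2, "r2"), (3, "r3"), (4, "r4")}
S = {(2, "s2"), (4, "s4"), (5, "s5")}

direct = join(R, S)
reduced_r = semijoin(R, S)            # only these tuples need to be shipped
via_semijoin = join(reduced_r, S)

print(direct == via_semijoin, len(R), len(reduced_r))
```

The semijoin pays off when shipping the join column of S plus the reduced R costs less than shipping all of R.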

SDD-1 Algorithm
Based on the Hill Climbing Algorithm
➠ Semijoins
➠ No replication
➠ No fragmentation
➠ Cost of transferring the result to the user site from the final result site is not considered
➠ Can minimize either total time or response time

Hill Climbing Algorithm

Assume join is between three relations.
Step 1: Do initial processing
Step 2: Select initial feasible solution (ES0)
2.1 Determine the candidate result sites – sites where a relation referenced in the query exists
2.2 Compute the cost of transferring all the other referenced relations to each candidate site
2.3 ES0 = candidate site with minimum cost
Step 3: Determine candidate splits of ES0 into {ES1, ES2}
3.1 ES1 consists of sending one of the relations to the other relation's site
3.2 ES2 consists of sending the join of the relations to the final result site
Step 4: Replace ES0 with the split schedule which gives
cost(ES1) + cost(local join) + cost(ES2) < cost(ES0)
Step 5: Recursively apply steps 3–4 on ES1 and ES2 until no such plans can be found
Step 6: Check for redundant transmissions in the final plan and eliminate them.
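
Step 2 can be sketched in a few lines. The code below is only an illustration under stated assumptions: each relation lives at exactly one site, transfer cost is proportional to relation size, and the relation sizes and the function name are hypothetical. Steps 3 to 6 (splitting and redundancy elimination) are not modelled.

```python
def initial_feasible_solution(relations, cost_per_unit=1):
    """Pick the candidate result site with the cheapest 'ship everything' plan.

    relations -- dict: relation name -> (site, size)
    Returns (best_site, transfer_cost).
    """
    # 2.1 Candidate sites are the sites holding a referenced relation.
    candidates = {site for site, _ in relations.values()}
    # 2.2 Cost of shipping every other relation to each candidate site.
    costs = {
        site: sum(size * cost_per_unit
                  for s, size in relations.values() if s != site)
        for site in candidates
    }
    # 2.3 ES0 is the candidate with minimum cost (ties go to the lowest site id).
    best = min(sorted(costs), key=costs.get)
    return best, costs[best]


# Hypothetical placement: name -> (site, size)
relations = {"R1": (1, 100), "R2": (2, 500), "R3": (3, 300)}
print(initial_feasible_solution(relations))
```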

SDD-1 Algorithm

Initialization
Step 1: In the execution strategy (call it ES),
include all the local processing
Step 2: Reflect the effects of local processing on
the database profile
Step 3: Construct a set of beneficial semijoin
operations (BS) as follows :
BS = Ø
For each semijoin SJi
BS ← BS ∪ SJi if cost(SJi ) < benefit(SJi)
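
Step 3 is a simple filter over the candidate semijoins. In the sketch below the cost and benefit figures are invented; in the algorithm they would be derived from the database profile.

```python
# Hypothetical (cost, benefit) estimates for three candidate semijoins.
semijoins = {
    "SJ1": (100, 400),
    "SJ2": (300, 200),
    "SJ3": (50, 50),
}

# BS keeps only the semijoins whose cost is strictly below their benefit.
bs = {name for name, (cost, benefit) in semijoins.items() if cost < benefit}
print(sorted(bs))
```

SJ3 is excluded because its cost equals its benefit, and the rule requires the cost to be strictly smaller.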

SDD-1 Algorithm – Example
Consider the following query
SELECT R3.C
FROM R1, R2, R3
WHERE R1.A = R2.A
AND R2.B = R3.B

Iterative Process
Step 4: Remove the most beneficial SJi from BS and append it to ES
Step 5: Modify the database profile accordingly
Step 6: Modify BS appropriately
➠ compute new benefit/cost values
➠ check if any new semijoins need to be included in BS
Step 7: If BS ≠ Ø, go back to Step 4.

Assembly Site Selection
Step 8: Find the site where the largest amount of data resides and select it as the assembly site

Example:
Amount of data stored at sites:
Site 1: 360
Site 2: 360
Site 3: 2000
Therefore, Site 3 will be chosen as the assembly site.
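
The selection in Step 8 amounts to taking a maximum over the per-site data volumes. A minimal sketch using the figures from the example (the dictionary layout is illustrative):

```python
# Amount of data stored at each site, as given in the example.
data_at_site = {"Site 1": 360, "Site 2": 360, "Site 3": 2000}

# The assembly site is the one already holding the most data,
# so the least data has to be moved there.
assembly_site = max(data_at_site, key=data_at_site.get)
print(assembly_site)
```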

Postprocessing
Step 9: For each Ri at the assembly site, find the semijoins of the type
Ri ⋉ Rj
where the total cost of ES without this semijoin is smaller than the cost with it and remove the
semijoin from ES.
Note: There might be indirect benefits.
➠ Example: No semijoins are removed.
Step 10: Permute the order of semijoins if doing so would improve the total cost of ES.
➠ Example: Final strategy:
Send (R2 ⋉ R1) ⋉ R3 to Site 3
Send R1 ⋉ R2 to Site 3

Distributed Query Optimization Problems

Cost model
➠ multiple query optimization
➠ heuristics to cut down on alternatives

Larger set of queries
➠ optimization only on select-project-join queries
➠ also need to handle complex queries (e.g., unions, disjunctions, aggregations and sorting)

Optimization cost vs execution cost tradeoff
➠ heuristics to cut down on alternatives
➠ controllable search strategies

Optimization/reoptimization interval
➠ extent of changes in database profile before reoptimization is necessary

Transaction

A transaction is a collection of actions that make consistent transformations of system states while
preserving system consistency.
« concurrency transparency
« failure transparency

Transaction Example – A Simple SQL Query

Transaction BUDGET_UPDATE
begin
    EXEC SQL UPDATE PROJ
             SET BUDGET = BUDGET * 1.1
             WHERE PNAME = 'CAD/CAM'
end.

Example Database
Consider an airline reservation example with the relations:

FLIGHT(FNO, DATE, SRC, DEST, STSOLD, CAP)
CUST(CNAME, ADDR, BAL)
FC(FNO, DATE, CNAME,SPECIAL)

Example Transaction – SQL Version

Begin_transaction Reservation
begin
    input(flight_no, date, customer_name);
    EXEC SQL UPDATE FLIGHT
             SET STSOLD = STSOLD + 1
             WHERE FNO = flight_no AND DATE = date;
    EXEC SQL INSERT
             INTO FC(FNO, DATE, CNAME, SPECIAL)
             VALUES (flight_no, date, customer_name, null);
    output("reservation completed")
end. {Reservation}

Termination of Transactions

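Begin_transaction Reservation
begin
    input(flight_no, date, customer_name);
    EXEC SQL SELECT STSOLD, CAP
             INTO temp1, temp2
             FROM FLIGHT
             WHERE FNO = flight_no AND DATE = date;
    if temp1 = temp2 then
        output("no free seats");
        Abort
    else
        EXEC SQL UPDATE FLIGHT
                 SET STSOLD = STSOLD + 1
                 WHERE FNO = flight_no AND DATE = date;
        EXEC SQL INSERT
                 INTO FC(FNO, DATE, CNAME, SPECIAL)
                 VALUES (flight_no, date, customer_name, null);
        Commit
    endif
end. {Reservation}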
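
For readers who want to run the reservation logic, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. It is not the embedded-SQL environment of the slide: the table definitions, column types, sample flight, and the extra "no such flight" branch are assumptions added to make the example self-contained.

```python
import sqlite3


def reserve(conn, flight_no, date, customer_name):
    """Book one seat: commit on success, roll back (abort) otherwise."""
    cur = conn.cursor()
    cur.execute(
        "SELECT STSOLD, CAP FROM FLIGHT WHERE FNO = ? AND DATE = ?",
        (flight_no, date),
    )
    row = cur.fetchone()
    if row is None:                      # assumption: not covered by the slide
        conn.rollback()
        return "no such flight"
    sold, cap = row
    if sold >= cap:
        conn.rollback()                  # Abort
        return "no free seats"
    cur.execute(
        "UPDATE FLIGHT SET STSOLD = STSOLD + 1 WHERE FNO = ? AND DATE = ?",
        (flight_no, date),
    )
    cur.execute(
        "INSERT INTO FC (FNO, DATE, CNAME, SPECIAL) VALUES (?, ?, ?, NULL)",
        (flight_no, date, customer_name),
    )
    conn.commit()                        # Commit
    return "reservation completed"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLIGHT (FNO TEXT, DATE TEXT, SRC TEXT, DEST TEXT,"
             " STSOLD INTEGER, CAP INTEGER)")
conn.execute("CREATE TABLE FC (FNO TEXT, DATE TEXT, CNAME TEXT, SPECIAL TEXT)")
# Hypothetical flight with a single seat.
conn.execute("INSERT INTO FLIGHT VALUES ('F1', '2024-01-01', 'A', 'B', 0, 1)")
conn.commit()

first = reserve(conn, "F1", "2024-01-01", "alice")
second = reserve(conn, "F1", "2024-01-01", "bob")
print(first, "/", second)
```

The update and the insert either both take effect at `commit()` or neither does, which is the all-or-nothing behaviour the Abort and Commit statements express.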

Example Transaction – Reads & Writes

begin
    temp ← Read(flight(date).stsold);
    if temp = flight(date).cap then
    begin
        output("no free seats");
        Abort
    end
    else begin
        Write(flight(date).stsold, temp + 1);
        Write(flight(date).cname, customer_name);
        Write(flight(date).special, null);
        Commit;
    end
end. {Reservation}

Properties of Transactions

ATOMICITY
« all or nothing
CONSISTENCY
« no violation of integrity constraints
ISOLATION
« concurrent changes invisible ⇒ serializable
DURABILITY
« committed updates persist

Atomicity

Either all or none of the transaction's operations are performed.
Atomicity requires that if a transaction is interrupted by a failure, its partial results must be
undone.
The activity of preserving the transaction's atomicity in the presence of transaction aborts due to
input errors, system overloads, or deadlocks is called transaction recovery.
The activity of ensuring atomicity in the presence of system crashes is called crash recovery.

Consistency

Internal consistency
« A transaction which executes alone against a consistent database leaves it in a consistent state.
« Transactions do not violate database integrity constraints.
Transactions are correct programs

Consistency Degrees

Degree 0
« Transaction T does not overwrite dirty data of other transactions
« Dirty data refers to data values that have been updated by a transaction prior to its
commitment

Degree 1
« T does not overwrite dirty data of other transactions
« T does not commit any writes before EOT

Degree 2
« T does not read dirty data from other transactions

Degree 3
« T does not read dirty data from other transactions
« Other transactions do not dirty any data read by T before T completes.

Isolation

Serializability
« If several transactions are executed concurrently, the results must be the same as if they
were executed serially in some order.
Incomplete results
« An incomplete transaction cannot reveal its results to other transactions before its
commitment.
« Necessary to avoid cascading aborts.

Durability

Once a transaction commits, the system must guarantee that the results of its operations will
never be lost, in spite of subsequent failures.
Database recovery

Characterization of Transactions

Based on
« Application areas
non-distributed vs. distributed
compensating transactions
heterogeneous transactions

« Timing
on-line (short-life) vs batch (long-life)

« Organization of read and write actions
two-step
restricted
action model

« Structure
flat (or simple) transactions
nested transactions
workflows

Transaction Structure

Flat transaction
« Consists of a sequence of primitive operations enclosed between begin and end markers.

...
end.

Nested transaction

« The operations of a transaction may themselves be transactions.

Begin_transaction Reservation
...
Begin_transaction Airline
...
end. {Airline}
Begin_transaction Hotel
...
end. {Hotel}
end. {Reservation}

Transaction Processing Issues

Transaction structure (usually called transaction model)
« Flat (simple), nested

Internal database consistency
« Semantic data control (integrity enforcement) algorithms

Reliability protocols
« Atomicity & Durability
« Local recovery protocols
« Global commit protocols

Concurrency control algorithms

« How to synchronize concurrent transaction executions (correctness criterion)
« Intra-transaction consistency, Isolation

Replica control protocols

« How to control the mutual consistency of replicated data
« One copy equivalence and ROWA

Concurrency Control

The problem of synchronizing concurrent transactions such that the consistency of the database is
maintained while, at the same time, the maximum degree of concurrency is achieved.

Anomalies:
« Lost updates
The effects of some transactions are not reflected on the database.
« Inconsistent retrievals
A transaction, if it reads the same data item more than once, should always read the same
value.
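
A lost update is easy to reproduce with a toy scheduler. In the sketch below, two transactions each read item x and write back the value they read plus 10; the schedules, the starting value, and the function name are hypothetical.

```python
def run(schedule, x=100):
    """Execute (transaction, operation) steps against a single data item x."""
    db = {"x": x}
    local = {}                            # each transaction's private copy
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["x"]
        elif op == "write":
            db["x"] = local[txn] + 10     # add 10 to the value read earlier
    return db["x"]


serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run(serial), run(interleaved))
```

In the interleaved schedule T2 overwrites x using a value it read before T1's write, so T1's increment disappears.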

Serializability in Distributed DBMS

Somewhat more involved. Two histories have to be considered:
« local histories
« global history
For global transactions (i.e., global history) to be serializable, two conditions are necessary:
« Each local history should be serializable.
« Two conflicting operations should be in the same relative order in all of the local histories
where they appear together.

Concurrency Control Algorithms

Pessimistic
« Two-Phase Locking-based (2PL)
Centralized (primary site) 2PL
Primary copy 2PL
Distributed 2PL

« Timestamp Ordering (TO)
Basic TO
Multiversion TO
Conservative TO

« Hybrid

Optimistic
« Locking-based
« Timestamp ordering-based

Two-Phase Locking (2PL)

A transaction locks an object before using it.
When an object is locked by another transaction, the requesting transaction must wait.
When a transaction releases a lock, it may not request another lock.

Centralized 2PL

There is only one 2PL scheduler in the distributed system.
Lock requests are issued to the central scheduler.
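
The two-phase rule with a single lock table, in the spirit of a central scheduler, can be sketched as follows. This is a simplification under stated assumptions: only exclusive locks are modelled, a failed request returns False instead of blocking, and the class and attribute names are hypothetical.

```python
class TwoPhaseTransaction:
    """A transaction that may not acquire locks after its first release."""

    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table   # shared dict: item -> holder name
        self.shrinking = False         # True once any lock has been released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        holder = self.lock_table.get(item)
        if holder is not None and holder != self.name:
            return False               # held by someone else: caller must wait
        self.lock_table[item] = self.name
        return True

    def unlock(self, item):
        self.shrinking = True          # the growing phase is over
        if self.lock_table.get(item) == self.name:
            del self.lock_table[item]


locks = {}                             # the central scheduler's lock table
t1 = TwoPhaseTransaction("T1", locks)
t2 = TwoPhaseTransaction("T2", locks)

got_t1 = t1.lock("x")                  # T1 acquires x
got_t2 = t2.lock("x")                  # T2 must wait
t1.unlock("x")                         # T1 enters its shrinking phase
got_t2_retry = t2.lock("x")            # now T2 can proceed

try:
    t1.lock("y")                       # not allowed after a release
    violation = False
except RuntimeError:
    violation = True

print(got_t1, got_t2, got_t2_retry, violation)
```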

Distributed 2PL

2PL schedulers are placed at each site. Each scheduler handles lock requests for data at that site.
A transaction may read any of the replicated copies of item x, by obtaining a read lock on one of
the copies of x. Writing into x requires obtaining write locks for all copies of x.

Timestamp Ordering

Transaction (Ti) is assigned a globally unique timestamp ts(Ti).
Transaction manager attaches the timestamp to all operations issued by the transaction.
Each data item is assigned a write timestamp (wts) and a read timestamp (rts):
« rts(x) = largest timestamp of any read on x
« wts(x) = largest timestamp of any write on x
Conflicting operations are resolved by timestamp order.
Basic T/O:
for Ri(x)
if ts(Ti) < wts(x)
then reject Ri(x)
else accept Ri(x)
rts(x) ← max(rts(x), ts(Ti))

for Wi(x)
if ts(Ti) < rts(x) or ts(Ti) < wts(x)
then reject Wi(x)
else accept Wi(x)
wts(x) ← ts(Ti)
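
A compact sketch of the Basic T/O checks is given below. It uses the usual formulation of the rule: a read is rejected if a younger transaction has already written x, and a write is rejected if a younger transaction has already read or written x. The `Item` class and the integer timestamps are hypothetical, and restarting a rejected transaction is left out.

```python
class Item:
    """A data item carrying its read and write timestamps."""

    def __init__(self):
        self.rts = 0   # largest timestamp of any accepted read
        self.wts = 0   # largest timestamp of any accepted write


def read(item, ts):
    if ts < item.wts:                    # a younger transaction already wrote x
        return False                     # reject Ri(x)
    item.rts = max(item.rts, ts)
    return True


def write(item, ts):
    if ts < item.rts or ts < item.wts:   # a younger read or write already happened
        return False                     # reject Wi(x)
    item.wts = ts
    return True


x = Item()
results = [
    read(x, 5),    # accepted, rts becomes 5
    write(x, 3),   # rejected: 3 < rts
    write(x, 7),   # accepted, wts becomes 7
    read(x, 6),    # rejected: 6 < wts
    read(x, 8),    # accepted, rts becomes 8
]
print(results, x.rts, x.wts)
```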

Deadlock

A transaction is deadlocked if it is blocked and will remain blocked until there is intervention.
Locking-based CC algorithms may cause deadlocks.
TO-based algorithms that involve waiting may cause deadlocks.
Wait-for graph
« If transaction Ti waits for another transaction Tj to release
a lock on an entity, then Ti → Tj in WFG.
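
A deadlock corresponds to a cycle in the wait-for graph, so detection reduces to cycle search. The following depth-first sketch assumes the WFG is given as a dictionary mapping each transaction to the set of transactions it waits for; the function name and sample graphs are hypothetical.

```python
def find_deadlock(wfg):
    """Return one cycle of the wait-for graph as a list, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    color = {}
    path = []

    def visit(t):
        color[t] = GREY
        path.append(t)
        for u in sorted(wfg.get(t, ())):
            state = color.get(u, WHITE)
            if state == GREY:             # back edge: u is on the current path
                return path[path.index(u):]
            if state == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        color[t] = BLACK
        return None

    for t in sorted(wfg):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


deadlocked = {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}
healthy = {"T1": {"T2"}, "T2": {"T3"}, "T3": set()}
print(find_deadlock(deadlocked), find_deadlock(healthy))
```

Once a cycle is found, one of its transactions would be chosen as a victim and aborted to break the deadlock.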

Deadlock Management

Ignore
« Let the application programmer deal with it, or restart the system
Prevention
« Guaranteeing that deadlocks can never occur in the first place. Check transaction when it is
initiated. Requires no run time support.
Avoidance
« Detecting potential deadlocks in advance and taking action to ensure that deadlock will not
occur. Requires run time support.
Detection and Recovery
« Allowing deadlocks to form and then finding and breaking them. As in the avoidance scheme,
this requires run time support.

Deadlock Prevention

All resources which may be needed by a transaction must be predeclared.
« The system must guarantee that none of the resources will be needed by an ongoing transaction.
« Resources must only be reserved, but not necessarily allocated a priori
« Unsuitability of the scheme in database environment
« Suitable for systems that have no provisions for undoing processes.
Evaluation:
– Reduced concurrency due to preallocation
– Evaluating whether an allocation is safe leads to added overhead.
– Difficult to determine (partial order)
+ No transaction rollback or restart is involved.

Deadlock Avoidance

Transactions are not required to request resources a priori.
Transactions are allowed to proceed unless a requested resource is unavailable.
In case of conflict, transactions may be allowed to wait for a fixed time interval.
Order either the data items or the sites and always request locks in that order.
More attractive than prevention in a database environment.

Distributed Deadlock Detection

Sites cooperate in detection of deadlocks.

One example:
« The local WFGs are formed at each site and passed on to other sites. Each local WFG is
modified as follows:
1. Since each site receives the potential deadlock cycles from other sites, these edges are added
to the local WFGs.
2. The edges in the local WFG which show that local transactions are waiting for transactions at
other sites are joined with edges in the local WFGs which show that remote transactions are waiting
for local ones.

« Each local deadlock detector:
looks for a cycle that does not involve the external edge. If it exists, there is a local deadlock
which can be handled locally.
looks for a cycle involving the external edge. If it exists, it indicates a potential global
deadlock. Pass on the information to the next site.

Reliability

Problem:
How to maintain the atomicity and durability properties of transactions

Fundamental Definitions

Reliability

« A measure of success with which a system conforms to some authoritative specification of its
behavior.
« Probability that the system has not experienced any failures within a given time period.
« Typically used to describe systems that cannot be repaired or where the continuous operation
of the system is critical.

Availability
« The fraction of the time that a system meets its specification.
« The probability that the system is operational at a given time t.
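
Both measures can be given simple numeric forms under common modelling assumptions that the slides do not state: a constant failure rate (exponential lifetimes) for reliability, and steady-state behaviour with a mean time to failure (MTTF) and mean time to repair (MTTR) for availability. The function and parameter names below are illustrative.

```python
import math


def reliability(failure_rate, t):
    """Probability of no failure in [0, t], assuming a constant failure rate."""
    return math.exp(-failure_rate * t)


def availability(mttf, mttr):
    """Steady-state fraction of time the system is up (assumed model)."""
    return mttf / (mttf + mttr)


# Hypothetical figures: one failure per 1000 hours on average, 10-hour repairs.
print(reliability(failure_rate=0.001, t=100))
print(availability(mttf=1000, mttr=10))
```

A system can be highly available yet not very reliable: frequent failures with very fast repairs keep availability high while the chance of running failure-free over a long interval stays low.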

All The Best...
