3. Primary key => distribution key
hub -< satellite join
- data redistribution
- join local in parallel
BK           SID
Ensemble     1
Dimensional  2

SID  LDTS        INFO
1    2001-01-01  My first DV
1    2014-06-05  DV Masters
2    1997-08-02  DM manifesto
Node 1
Node 2
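A minimal sketch of why distributing the satellite on its primary key can force redistribution before the hub join. It assumes a two-node cluster and uses a CRC32 hash as a stand-in for the database's own distribution function; `node_for` and `rows_to_redistribute` are hypothetical names used only for illustration.

```python
import zlib

N_NODES = 2  # assumed cluster size, matching "Node 1" and "Node 2"

def node_for(key, n_nodes=N_NODES):
    """Deterministic stand-in for the database's hash distribution (assumption)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_nodes

hub = [("Ensemble", 1), ("Dimensional", 2)]        # (BK, SID)
sat = [(1, "2001-01-01", "My first DV"),
       (1, "2014-06-05", "DV Masters"),
       (2, "1997-08-02", "DM manifesto")]          # (SID, LDTS, INFO)

def rows_to_redistribute(sat_key_fn):
    """Count satellite rows stored on a different node than their hub row."""
    hub_node = {sid: node_for(sid) for _, sid in hub}   # hub distributed by SID
    return sum(1 for row in sat
               if node_for(sat_key_fn(row)) != hub_node[row[0]])

# Satellite distributed by its primary key (SID, LDTS): rows of one SID can
# scatter, so some may have to move before the join. The count depends on the hash.
by_primary_key = rows_to_redistribute(lambda r: (r[0], r[1]))

# Satellite distributed by SID only: every row already sits next to its hub row.
by_hub_sid = rows_to_redistribute(lambda r: r[0])
```

With the SID alone as the key, the count is always zero; with the full primary key it can be anything from zero to all three rows, depending on the hash.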
4. Hub SID => distribution key
hub -< satellite join
- join local in parallel
BK           SID
Ensemble     1
Dimensional  2

SID  LDTS        INFO
1    2001-01-01  First DV
1    2014-06-05  DV Masters
2    1997-08-02  DM manifesto
Node 1
Node 2
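The co-located join can be sketched as follows, assuming two nodes and a CRC32 hash in place of the database's own distribution function (`node_for`, `distribute` and `local_join` are hypothetical helpers). Hub and satellite are both placed by SID, each node joins only its own rows, and the union of the per-node results equals the full join.

```python
import zlib
from collections import defaultdict

N_NODES = 2  # assumed cluster size

def node_for(key, n_nodes=N_NODES):
    """Deterministic stand-in for the database's hash distribution (assumption)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_nodes

def distribute(rows, key_fn):
    """Place each row on the node chosen by its distribution key."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(key_fn(row))].append(row)
    return nodes

hub = [("Ensemble", 1), ("Dimensional", 2)]        # (BK, SID)
sat = [(1, "2001-01-01", "First DV"),
       (1, "2014-06-05", "DV Masters"),
       (2, "1997-08-02", "DM manifesto")]          # (SID, LDTS, INFO)

hub_nodes = distribute(hub, key_fn=lambda r: r[1])  # hub by SID
sat_nodes = distribute(sat, key_fn=lambda r: r[0])  # satellite by hub SID

def local_join(node):
    """Join hub and satellite rows that live on the same node; no data moves."""
    return [(bk, sid, ldts, info)
            for bk, sid in hub_nodes.get(node, [])
            for s_sid, ldts, info in sat_nodes.get(node, [])
            if sid == s_sid]

# Each node could run local_join in parallel; here they run one after another.
local_result = sorted(row for node in range(N_NODES) for row in local_join(node))

# Reference: the same join computed over the undistributed tables.
global_result = sorted((bk, sid, ldts, info)
                       for bk, sid in hub
                       for s_sid, ldts, info in sat
                       if sid == s_sid)
```

Because both tables hash the same SID value, matching rows always share a node, so `local_result` and `global_result` contain the same three rows.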
5. Link SID => distribution key
Default: L_SID, for both 1:N and N:M
- data redistribution
- join local in parallel
H_MID  H_SID  L_SID
1      A      1
1      B      2

L_SID  LDTS        LDTS_END    CURRENT
1      2001-01-01  2006-01-01  N
1      2014-06-05  9999-12-31  Y
2      2006-01-01  2014-06-05  N

H_MID  H_SID  L_SID
1      A      1
1      B      2

L_SID  H_MID  H_SID  LDTS        LDTS_END
1      1      A      2001-01-01  2006-01-01
1      1      B      2014-06-05  9999-12-31
2      1      A      2006-01-01  2014-06-05
1:N => H_MID on link satellite
- join local in parallel
H_MID is the ensemble identifier!
Node 1
Node 2
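A rough sketch of the 1:N case, assuming two nodes and a CRC32 hash as a stand-in for the database's distribution function (`node_for` and `nodes_used` are hypothetical helpers). It compares which nodes hold the link satellite rows when they are distributed by L_SID versus by H_MID.

```python
import zlib

N_NODES = 2  # assumed cluster size

def node_for(key, n_nodes=N_NODES):
    """Deterministic stand-in for the database's hash distribution (assumption)."""
    return zlib.crc32(repr(key).encode("utf-8")) % n_nodes

# Link satellite rows from the slide: (L_SID, H_MID, H_SID, LDTS, LDTS_END)
link_sat = [
    (1, 1, "A", "2001-01-01", "2006-01-01"),
    (1, 1, "B", "2014-06-05", "9999-12-31"),
    (2, 1, "A", "2006-01-01", "2014-06-05"),
]

def nodes_used(rows, key_fn):
    """Set of nodes that hold at least one of the given rows."""
    return {node_for(key_fn(row)) for row in rows}

# Node of the hub row with H_MID = 1, assuming that hub is distributed by H_MID.
ensemble_node = node_for(1)

# Distributed by H_MID: all rows of the ensemble share the hub row's node.
by_h_mid = nodes_used(link_sat, lambda r: r[1])

# Distributed by L_SID: rows of the same ensemble may end up on different nodes.
by_l_sid = nodes_used(link_sat, lambda r: r[0])
```

Keyed on H_MID, every link satellite row of the ensemble sits on `ensemble_node`, so the join back to the hub needs no data movement. Keyed on L_SID, the rows may spread over one or two nodes depending on the hash.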
6. Use the ensemble identifier if possible!
Hub: H_SID -< satellite: H_SID LDTS INFO
Link: L_SID? H_SID H_MID -< link satellite: H_SID? L_SID? LDTS INFO
Distribute data efficiently to ensure good
performance in an MPP database.
- If the distribution is uneven, one node may become a
bottleneck for the whole execution
Try to minimize data movement between nodes
- Data redistribution may occur when joining tables
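To make the bottleneck point concrete, here is a small sketch that measures distribution skew for a candidate key. It assumes integer keys and uses a plain modulo instead of a real hash function; `rows_per_node` and `skew` are hypothetical helpers, not part of any database API.

```python
from collections import Counter

def rows_per_node(keys, n_nodes):
    """Count rows per node; modulo on integer keys stands in for a real hash (assumption)."""
    counts = Counter(key % n_nodes for key in keys)
    return [counts.get(node, 0) for node in range(n_nodes)]

def skew(keys, n_nodes):
    """Ratio of the busiest node's row count to the ideal even share."""
    counts = rows_per_node(keys, n_nodes)
    return max(counts) / (len(keys) / n_nodes)

unique_keys = list(range(1000))   # e.g. one surrogate key value per row
constant_keys = [7] * 1000        # e.g. every row shares a single key value

even = skew(unique_keys, n_nodes=4)         # 1.0: each node holds 250 rows
lopsided = skew(constant_keys, n_nodes=4)   # 4.0: one node holds all 1000 rows
```

A skew of 1.0 means every node does the same amount of work. A skew equal to the node count means a single node holds everything, so the other nodes sit idle while it finishes.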
Ensemble