The Art and Science of DDS Data Modelling

Angelo Corsaro, PhD
Chief Technology Officer
OMG DDS SIG Co-Chair
angelo.corsaro@prismtech.com

PrismTech

A Recurring Question
• People new to DDS recurrently ask a question: what are the techniques and
patterns that we can use to design DDS-based systems?

• My answer is usually: start with the powerful tools and techniques provided
by relational data modelling and then add some DDS-specific spice

• I’ve come to the conclusion that many people are not very familiar with
relational data modelling, or perhaps it has been too long since they
studied or reviewed these concepts

• This webcast provides a relatively thorough introduction to the relational
data model

Copyright PrismTech, 2014

Relational Model
• Introduced by Edward Codd in 1970 as a way of representing data models
for databases

• Simple and elegant: a database becomes a collection of one or more
relations, where each relation is a table with rows and columns

Relation
• The relation is the construct used to represent data in the relational model;
it consists of a two-dimensional table

• The columns of a relation are called attributes

• The name of the relation along with the set of attributes defines the relation
schema

• The rows of the relation, other than the header containing the attribute
names, are called tuples

Relation’s Schema
• The relation schema specifies:
- the relation’s name
- the name of each field/attribute, i.e. each column
- the domain of each field, i.e. the type of the field

• Example:
- Student(sid: string, name: string, age: integer, gpa: real)

Tuples
• An instance of a relation is a set of tuples (records) in which each tuple has the same
number of fields as the relation schema

• A relation’s instance can be visualised as a table where each tuple is a row and all
rows have the same number of fields (columns)

sid    name           age   gpa
1234   Peter Parker   21    4.0
2345   Tony Stark     15    4.0
3456   Bruce Wayne    23    3.5

• Notice that the rows are all different. This is a requirement of the relational model, as a
relation instance is a set of unique tuples (rows)

Cardinality and Degree
• The cardinality of a relation R is defined as the number of tuples belonging
to the relation

• The degree, or arity, of a relation R is defined as the number of its fields

Keys
• The key of a relation is a minimal set of fields that uniquely identifies a tuple

• A superkey is a set of attributes that contains a key

• Example:
- The sid field is the key for the Students relation

sid    name           age   gpa
1234   Peter Parker   21    4.0
2345   Tony Stark     15    4.0
3456   Bruce Wayne    23    3.5
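As a minimal sketch of the key definition, the check below tests whether a set of attributes distinguishes every tuple of a given relation instance. The function name `is_key_candidate` is hypothetical. Note the assumption: a key is a constraint on the schema, so inspecting one instance can only refute a key, never prove it.

```python
def is_key_candidate(rows, attrs):
    """Return True if no two rows agree on all the attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False  # two tuples share the same value: not a key
        seen.add(value)
    return True


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]

print(is_key_candidate(students, ["sid"]))  # sid distinguishes every tuple
print(is_key_candidate(students, ["gpa"]))  # two students share the same gpa
```

Any superset of a key, such as (sid, name), also passes this check, which is what makes it a superkey rather than a key.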

Foreign Keys
• A foreign key introduces a link between two relations

• For instance, the sid in the Courses relation is a foreign key: it refers to
the Students relation and introduces an integrity constraint on it

Courses
cid           sid    grade
Physics303    1234   A+
Robotics323   2345   A+
Calculus343   2345   A

Students
sid    name           age   gpa
1234   Peter Parker   21    4.0
2345   Tony Stark     15    4.0
3456   Bruce Wayne    23    3.5
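The integrity constraint implied by a foreign key can be sketched as a check for dangling references. This is only an illustration: the helper `dangling_rows` and the extra Chemistry101 row are hypothetical and not part of the slides.

```python
def dangling_rows(referencing, referenced, fk, key):
    """Return the referencing rows whose foreign key matches no referenced key."""
    known = {row[key] for row in referenced}
    return [row for row in referencing if row[fk] not in known]


students = [
    {"sid": 1234, "name": "Peter Parker"},
    {"sid": 2345, "name": "Tony Stark"},
    {"sid": 3456, "name": "Bruce Wayne"},
]
courses = [
    {"cid": "Physics303", "sid": 1234, "grade": "A+"},
    {"cid": "Robotics323", "sid": 2345, "grade": "A+"},
    {"cid": "Calculus343", "sid": 2345, "grade": "A"},
]

violations = dangling_rows(courses, students, fk="sid", key="sid")
print(violations)  # every Courses.sid refers to an existing student

# Hypothetical row referring to a student that does not exist.
bad_courses = courses + [{"cid": "Chemistry101", "sid": 9999, "grade": "B"}]
print(dangling_rows(bad_courses, students, fk="sid", key="sid"))
```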

Data Distribution Service (DDS)
• DDS provides a Global Data Space abstraction that allows applications
to autonomously, anonymously, securely and efficiently share data

• DDS’ Global Data Space is fully distributed, highly efficient and
scalable

Data Distribution Service (DDS)
• DataWriters and DataReaders are automatically and dynamically
matched by the DDS Discovery

• A rich set of QoS policies allows control over the existential, temporal,
and spatial properties of data

Topic
• A Topic defines a domain-wide class of information

• A Topic is defined by means of a (name, type, qos) tuple, where
- name: identifies the topic within the domain
- type: is the programming language type associated with the topic. Types are
extensible and evolvable
- qos: is a collection of policies that express the non-functional properties of
this topic, e.g. reliability, persistence, etc.

[Figure: a Topic composed of Name, Type and QoS]

Topic and Instances
• As explained in the previous slide, a topic defines a class/type of information

• Topics can be defined as Singleton or can have multiple Instances

• Topic Instances are identified by means of the topic key

• A Topic Key is identified by a tuple of attributes, like in databases

• Remarks:
- A Singleton topic has a single domain-wide instance
- A “regular” Topic can have as many instances as the number of different key
values, e.g. if the key is an 8-bit character then the topic can have 256 different
instances

Topic Example
• A Topic type can be defined in different syntaxes

• IDL is the most commonly used syntax

• Example:

    struct Student {
        long   sid;
        string name;
        long   age;
        float  gpa;
    };
    #pragma keylist Student sid

[Figure: a Topic composed of Name, Type and QoS]

Topics as Relations
• A Topic can be seen as defining a relation

    struct Student {
        long   sid;
        string name;
        long   age;
        float  gpa;
    };
    #pragma keylist Student sid

Student(sid, name, age, gpa)

sid    name           age   gpa
1234   Peter Parker   21    4.0
2345   Tony Stark     15    4.0
3456   Bruce Wayne    23    3.5

Mapping DDS to the Relational Model
• Topic Type => Relation Schema
• Topic Instance => Key
• Topic Sample => Tuple
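One way to picture this mapping is the plain-Python sketch below. It does not use any DDS API: the `Student` dataclass stands in for the topic type, the `sid` field plays the role of the topic key, and `latest_per_instance` is a hypothetical helper that keeps the most recent sample of each instance. The first sample for sid 1234 (gpa 3.9) is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Stand-in for the topic type; sid plays the role of the topic key."""
    sid: int
    name: str
    age: int
    gpa: float


def latest_per_instance(samples, key=lambda s: s.sid):
    """Keep the most recent sample for each key value (topic instance)."""
    table = {}
    for sample in samples:
        table[key(sample)] = sample  # a later sample replaces an earlier one
    return table


samples = [
    Student(1234, "Peter Parker", 21, 3.9),  # hypothetical earlier sample
    Student(2345, "Tony Stark", 15, 4.0),
    Student(1234, "Peter Parker", 21, 4.0),  # update of the same instance
]

relation = latest_per_instance(samples)
for sid, row in sorted(relation.items()):
    print(sid, row.name, row.gpa)
```

Three samples yield two instances, because two of them share the key value 1234.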

Relational Design
• Start by identifying coarse relations and properties of data
• Decompose based on properties
• Apply a normal form
- Functional Dependencies => Boyce-Codd Normal Form
- Multivalued Dependencies => Fourth Normal Form

UML Data Modelling
• A subset of UML can be used to model Data Models

• The resulting model can be easily translated into a relational model and then used
in a DBMS or DDS

• The allowed subset of UML is:
- Classes (with only attributes)
- Associations
- Association Classes
- Subclasses
- Composition and Aggregation

• UML Data Models can be automatically translated into a relational model as long as
each “regular” class defines a primary key

Class
• A UML class is mapped to a relation that has the same name as the class and
shares its key and attributes

UML class:
Student
sid: int
name: string
age: int
gpa: float


Association
• By default an association can be mapped as follows, yet, depending on the
multiplicity of the association, different mappings may be possible/desirable

UML:  C1 [K1: PK, O1] --- A --- C2 [K2: PK, O2]

C1(K1, O1)
C2(K2, O2)
A(K1, K2)

• The key definition in the association depends on the multiplicity

1-to-many Association
There are two ways of mapping a 1-to-many association to the relational
model:

M1 Use a relation to capture the association
M2 Embed the association on the many side of the association

UML:  C1 [K1: PK, O1] 0..1 --- A --- * C2 [K2: PK, O2]

M1 C1(K1, O1), C2(K2, O2), A(K1, K2)
M2 C1(K1, O1), C2(K2, O2, K1)

many-to-many Associations

UML:  C1 [K1: PK, O1] * --- A --- * C2 [K2: PK, O2]

C1(K1, O1)
C2(K2, O2)
A(K1, K2)

Relationships arity
[Figure: three diagrams linking K1 values to K2 values, one per multiplicity]

One to One: Key = K1 or K2
One to Many: Key = K2
Many to Many: Key = K1, K2

Association Classes

UML:  C1 [K1: PK, O1] --- A --- C2 [K2: PK, O2], with association class A

C1(K1, O1)
C2(K2, O2)
A(K1, K2, a1, a2)


Self Association
• Self associations are modelled as traditional relations, with the only
difference that attributes may be conserved

UML:  Student [sid: int, name: string, age: int, gpa: float] * --- Sibling --- * Student

Sibling(sidParent, sidSibling)

Subclasses
Three ways of mapping subclassing to the relational model:

T1 Subclass relations contain the superclass key and the specialised attributes
T2 Subclass relations contain all attributes
T3 One relation containing all superclass and subclass attributes

UML:  A [K: PK, X] with subclasses B [Y] and C [Z]

T1 A(K, X), B(K, Y), C(K, Z)
T2 A(K, X), B(K, X, Y), C(K, X, Z)
T3 A(K, X, Y, Z)

The best translation may depend on the context, e.g. T3 is good for heavily
overlapping subclasses, T2 is good for disjoint and complete subclasses

Composition and Aggregation
• The precondition to easily map composition to the relational model is for
the part not to have a key

UML:  Whole [K: PK, W] composed of Part [P]

Whole(K, W)
Part(P, K)

• When mapping aggregation (unfilled diamond), the key K on the Part
should have a domain that allows for null values

Summing Up
• A subset of UML can be used to model relational data models

• The mapping rules can be used to help translate existing Object Oriented
data models into their relational counterpart

Why Relation Refinement?
• The UML/ER Data Models usually provide a good starting point toward the
data model that we’ll actually use in the system

• The relations implied by the UML/ER Data Model often need to be
normalised and re-organised to address performance and workload criteria

• The goal of relation refinement is to remove redundancy and/or
decompose a relation into smaller relations

• Normal forms provide a way of measuring the amount of redundancy that
may be in our data model

Redundancy
• Redundant Storage: Information may be stored multiple times, leading to
space, and perhaps time, inefficiencies

• Update Anomalies: If one copy of the redundant information is updated, this
may create inconsistencies in other copies, unless all copies are updated
at the same time

• Insertion Anomalies: It may not be possible to store some information
unless some other information is stored as well

• Deletion Anomalies: It may not be possible to delete some information
without losing some other information as well

Decomposition
• Unconsidered decomposition can lead to more problems than benefits, thus
when decomposing you always want to ensure that:
- You really need to decompose the relation
- You fully understand the implications of the decomposition (lossless join,
dependency preservation)

• Normal Forms provide good guidelines for decomposing relations, as they
guarantee that certain classes of problems cannot be introduced

• Notice that decomposition can have a performance impact, as it may
lead to an increase in joins
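The lossless-join property mentioned above can be illustrated with a small sketch: project a relation onto two attribute sets, join the projections back on their shared attributes, and compare the result with the original. The helper names are hypothetical, and the check works on one instance only, so it can expose a lossy decomposition but does not prove that a decomposition is lossless for every instance.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row becomes a tuple of pairs."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Join two projected relations on the attribute names they share."""
    result = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result


def rejoins_exactly(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    original = {tuple(sorted(row.items())) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original


students = [
    {"sid": 1234, "name": "Peter Parker", "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "gpa": 3.5},
]

print(rejoins_exactly(students, ["sid", "name"], ["sid", "gpa"]))  # shared key sid
print(rejoins_exactly(students, ["name", "gpa"], ["sid", "gpa"]))  # shared gpa only
```

Splitting on the shared key sid gives back the original tuples, while splitting on gpa produces spurious tuples because two students have the same gpa.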

Functional Dependencies
• A Functional Dependency (FD) is a kind of Integrity Constraint (IC) that
generalises the concept of a key

• Given a relation R along with two nonempty sets of attributes X and Y in R,
we say that R satisfies the FD X ⟶ Y if the following holds for every pair of
tuples t1 and t2 in R:

if t1.X = t2.X then t1.Y = t2.Y

• In other terms, the FD says that if two tuples agree on the attributes in
X they also agree on the attributes in Y

• Notice that a primary key constraint is a special kind of FD

Example
• Let’s assume our Student relation now includes a new attribute that measures the
percentile of the student GPA, i.e. the percentage of students that have a GPA that is
smaller or equal

sid    name           age   gpa   percentile
1234   Peter Parker   21    4.0   100
2345   Tony Stark     15    4.0   100
3456   Bruce Wayne    23    3.5   75

• Clearly the percentile attribute functionally depends on gpa, or
equivalently gpa ⟶ percentile
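The FD definition translates almost directly into code. The sketch below, with the hypothetical helper `satisfies_fd`, checks whether a relation instance satisfies X ⟶ Y by remembering the Y value seen for each X value. As with keys, an instance can only show that an FD is violated; whether the FD holds in general is a property of the schema.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0, "percentile": 100},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0, "percentile": 100},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5, "percentile": 75},
]

print(satisfies_fd(students, ["gpa"], ["percentile"]))  # gpa determines percentile
print(satisfies_fd(students, ["gpa"], ["name"]))        # gpa does not determine name
print(satisfies_fd(students, ["sid"], ["name", "age", "gpa", "percentile"]))
```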

Normal Forms
• Different Normal Forms (NF) exist that provide guidance on how to decompose
relations

• If a relation is in a given normal form then we are guaranteed that some
anomalies cannot arise, e.g. update anomalies

• The normal forms based on functional dependencies are the first normal form
(1NF), second normal form (2NF), third normal form (3NF) and the Boyce-Codd
normal form (BCNF)

• Every relation in BCNF is also in 3NF, every relation in 3NF is also in 2NF and finally
every relation in 2NF is also in 1NF

• The 2NF and 3NF have only historical interest, while the BCNF has important
practical applicability

1NF
• A relation is in 1NF if every field contains only atomic values, that is, no lists
or sets

Boyce-Codd Normal Form (BCNF)
Let R be a relation, X a subset of the attributes of R and a an attribute of R. R is in Boyce-Codd
Normal Form (BCNF) if for every FD X ⟶ {a} that holds over R, one of the following is true:

• a ∊ X, that is, it is a trivial FD, or
• X is a superkey

Intuitively, in a BCNF relation the only nontrivial dependencies are those in which a key
determines some attributes. Each attribute must describe the key, the whole key, and
nothing but the key

[Figure: Functional Dependencies in BCNF, with the key determining attr 1, attr 2, ..., attr k]

BCNF Decomposition Algorithm
Input: relation R and FDs for R
Output: decomposition of R into BCNF relations with lossless join

Compute Keys for R
Repeat until all relations are in BCNF
    Choose a relation Ri with A ⟶ B that violates BCNF
    Decompose Ri into R1(A, B) and R2(A, rest)
    Compute FDs for R1 and R2
    Compute Keys for R1 and R2
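The decomposition loop can be sketched in a few lines of Python. This is a simplified illustration, not a complete implementation: the function names are hypothetical, superkeys are detected through attribute closure instead of an explicit key computation, and only the FDs passed in are tested as possible violations (a full algorithm would also consider FDs implied on each sub-relation). The example FDs, sid ⟶ name, age, gpa and gpa ⟶ percentile, are assumed from the Student example.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(relation, fds):
    """Split relation into R1(A, B) and R2(A, rest) while some A -> B violates BCNF."""
    relation = frozenset(relation)
    for lhs, rhs in fds:
        dependents = (rhs & relation) - lhs           # nontrivial part of B inside Ri
        is_superkey = relation <= closure(lhs, fds)   # A determines all of Ri
        if lhs <= relation and dependents and not is_superkey:
            r1 = lhs | dependents                     # R1(A, B)
            r2 = relation - dependents                # R2(A, rest)
            return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [relation]


student = {"sid", "name", "age", "gpa", "percentile"}
fds = [
    (frozenset({"sid"}), frozenset({"name", "age", "gpa"})),
    (frozenset({"gpa"}), frozenset({"percentile"})),
]

for r in bcnf_decompose(student, fds):
    print(sorted(r))
```

Here gpa ⟶ percentile is the violating FD, since gpa is not a superkey of Student, so the relation is split into one relation over (gpa, percentile) and one over (sid, name, age, gpa).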

3NF
Let R be a relation schema, X a subset of the attributes of R and a an attribute of
R. R is in Third Normal Form (3NF) if for every FD X ⟶ {a} that holds over R, one of
the following is true:

• a ∊ X, that is, it is a trivial FD, or
• X is a superkey, or
• a is part of some key for R

The definition of 3NF is similar to that of BCNF, with the difference that a may
be part of a key for R

Multivalued Dependencies
• For a relation R we say that A ↠ B (A multi-determines B), where A and B
are sets of fields in R, if:

∀ t,u ∈ R: if t.A = u.A then ∃ v ∈ R:
    v.A = t.A and
    v.B = t.B and
    v.rest = u.rest

• Multivalued dependencies are sometimes called tuple-generating
dependencies
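The definition can be checked mechanically on a relation instance. In the sketch below, `satisfies_mvd` is a hypothetical helper, and the Enrolment relation with its hobby column is invented purely for illustration: for every pair of tuples t, u that agree on A, it looks for the tuple v that combines t's B values with u's remaining values.

```python
def satisfies_mvd(rows, a, b):
    """Check A ->> B on an instance: for all t, u with t.A = u.A the tuple
    (t.A, t.B, u.rest) must also be present."""
    if not rows:
        return True
    rest = [x for x in rows[0] if x not in a and x not in b]

    def pick(row, names):
        return tuple(row[n] for n in names)

    present = {(pick(r, a), pick(r, b), pick(r, rest)) for r in rows}
    for t in rows:
        for u in rows:
            if pick(t, a) == pick(u, a):
                if (pick(t, a), pick(t, b), pick(u, rest)) not in present:
                    return False
    return True


# Hypothetical relation: courses and hobbies of a student are independent facts.
enrolment = [
    {"sid": 1234, "course": "Physics303", "hobby": "chess"},
    {"sid": 1234, "course": "Physics303", "hobby": "climbing"},
    {"sid": 1234, "course": "Calculus343", "hobby": "chess"},
    {"sid": 1234, "course": "Calculus343", "hobby": "climbing"},
]

print(satisfies_mvd(enrolment, ["sid"], ["course"]))      # all combinations present
print(satisfies_mvd(enrolment[:3], ["sid"], ["course"]))  # one combination missing
```

The second call fails because dropping one row removes a combination that the dependency requires, which is why such dependencies are called tuple-generating.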

Fourth Normal Form (4NF)
• A relation R with multivalued dependencies (MVD) is in 4NF if for each
nontrivial A ↠ B, A is a key

• The 4NF implies the BCNF

4NF Decomposition Algorithm
Input: relation R and FDs and MVDs for R
Output: decomposition of R into 4NF relations with lossless join

Compute Keys for R
Repeat until all relations are in 4NF
    Choose a relation Ri with a nontrivial A ↠ B that violates 4NF
    Decompose Ri into R1(A, B) and R2(A, rest)
    Compute FDs and MVDs for R1 and R2
    Compute Keys for R1 and R2

Shortcomings of BCNF and 4NF
• Dependency enforcement may require joins
• Query workload may increase due to excessive joins
• Over-decomposition

Selection and Projection
• Relational algebra provides operators to select rows (σ) and to project
columns (π) from a relation

• These operations operate on a single relation

Examples:

Student
sid    name           age   gpa
1234   Peter Parker   21    4.0
2345   Tony Stark     15    4.0
3456   Bruce Wayne    23    3.5

σage<20(Student)
sid    name         age   gpa
2345   Tony Stark   15    4.0

πname,gpa(Student)
name           gpa
Peter Parker   4.0
Tony Stark     4.0
Bruce Wayne    3.5
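The two operators can be sketched over a list of dictionaries. The helper names `select` and `project` are hypothetical. Since a relation is a set of tuples, the projection removes duplicates.

```python
def select(rows, predicate):
    """σ: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """π: keep only the given columns, removing duplicate tuples."""
    seen, result = set(), []
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]

print(select(students, lambda s: s["age"] < 20))  # σage<20(Student)
print(project(students, ["name", "gpa"]))         # πname,gpa(Student)
print(project(students, ["gpa"]))                 # duplicates collapse
```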

Joins
• Join is one of the most useful operators in relational algebra and is most
commonly used to combine/reassemble information from two or more
relations

• Join is conceptually a cross product followed by a selection and projection

Condition Joins
• Condition joins are the most general form of joins. This operation takes a
condition c and two relations R and S, and is defined as follows:

R ⋈c S = σc(R × S)

Equijoin
• Equijoin is a special case of the Condition Join, where the condition
consists only of attribute equalities
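A condition join can be written literally as a selection over the cross product. In the sketch below, `condition_join` is a hypothetical helper, rows are plain tuples, and the condition compares the student sid with the course sid, which makes this particular call an equijoin.

```python
from itertools import product


def condition_join(r, s, condition):
    """Condition join computed literally as a selection over the cross product."""
    return [(x, y) for x, y in product(r, s) if condition(x, y)]


# (sid, name)
students = [(1234, "Peter Parker"), (2345, "Tony Stark"), (3456, "Bruce Wayne")]
# (cid, sid, grade)
courses = [
    ("Physics303", 1234, "A+"),
    ("Robotics323", 2345, "A+"),
    ("Calculus343", 2345, "A"),
]

# Equality on sid: an equijoin.
enrolled = condition_join(students, courses, lambda st, c: st[0] == c[1])
for student, course in enrolled:
    print(student[1], course[0], course[2])
```

Building the full cross product is only practical for small relations. It is used here because it mirrors the definition.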

Natural Join
• A Natural Join is a special Equijoin that operates on all the attributes having
the same name in R and S
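A natural join needs no explicit condition, because the shared attribute names define it. The sketch below uses a hypothetical helper `natural_join` over dictionaries, where the Students and Courses relations share only the sid attribute.

```python
def natural_join(r, s):
    """Join rows that agree on every attribute name the two relations share."""
    result = []
    for x in r:
        for y in s:
            shared = x.keys() & y.keys()
            if all(x[a] == y[a] for a in shared):
                result.append({**x, **y})  # shared columns appear once
    return result


students = [
    {"sid": 1234, "name": "Peter Parker"},
    {"sid": 2345, "name": "Tony Stark"},
    {"sid": 3456, "name": "Bruce Wayne"},
]
courses = [
    {"cid": "Physics303", "sid": 1234, "grade": "A+"},
    {"cid": "Robotics323", "sid": 2345, "grade": "A+"},
    {"cid": "Calculus343", "sid": 2345, "grade": "A"},
]

for row in natural_join(students, courses):
    print(row["name"], row["cid"], row["grade"])
```

Bruce Wayne does not appear in the result because no Courses tuple refers to sid 3456.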

Relational Design in DDS
• Start by identifying coarse relations and properties of data
• Decompose based on properties (UML can be used for this)
• Apply a normal form
- Functional Dependencies => Boyce-Codd Normal Form
- Multivalued Dependencies => Fourth Normal Form

• Define QoS for the resulting relations and further decompose if you run into
some QoS Mix (more later)

Relational Algebra
• DDS supports:
- Selection for a given Topic via DDS queries and filters
- Conditional Joins across multiple Topics via Multi-Topics

• DDS uses a subset of SQL-92 to express selections, projections and joins

DDS Specific Decomposition
• In some instances you may find that a topic (relation) R has two disjoint sets
of attributes X and Y that have conflicting temporal, reliability or durability
requirements

• In this case the relation has to be further decomposed

Frequency Mix
• Suppose you have a relation R(K, X, Y) where the set of attributes X changes
far more frequently than the set of attributes Y (e.g. position vs. velocity)

• In this case you should decompose the relation R into:

R1(K, X), R2(K, Y)

• This will reduce the resource usage in your system, e.g. bandwidth as well
as CPU, but may introduce consistency issues. If consistency is essential then
coherent updates should be used to atomically update R1 and R2
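The saving can be made concrete with a toy cost model. Everything here is an assumption for illustration: the helper `values_sent` simply counts attribute values published, on the premise that each update republishes the whole tuple of every topic it touches, and the 9-to-1 update ratio is invented. It does not model any real DDS implementation.

```python
def values_sent(updates, topics):
    """Count attribute values published, assuming every update republishes
    the full tuple of each topic whose attributes it changes."""
    total = 0
    for changed in updates:
        for attrs in topics:
            if changed & attrs:  # this topic carries a changed attribute
                total += len(attrs)
    return total


# Hypothetical workload: x changes nine times for every change of y.
updates = [{"x"}] * 9 + [{"y"}]

single_topic = [{"k", "x", "y"}]            # R(K, X, Y)
split_topics = [{"k", "x"}, {"k", "y"}]     # R1(K, X), R2(K, Y)

print(values_sent(updates, single_topic))
print(values_sent(updates, split_topics))
```

Under these assumptions the split schema publishes fewer values, because the rarely changing Y is no longer resent with every change of X. The key K is sent with both topics, which is the price paid for the decomposition.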

Reliability Mix
• Suppose you have a relation R(K, X, Y) where the set of attributes Y represents
some soft-state

R1(K, X), R2(K, Y)

• This decomposition allows reliable distribution to be used only for R1 and
best-effort for R2, thus reducing resource usage in the system

Durability Mix
• Suppose you have a relation R(K, X, Y) where the set of attributes X requires
a different durability than the set of attributes Y, e.g. X needs to be
persistent while Y is volatile

R1(K, X), R2(K, Y)

• This will reduce the resource usage in your system and reduce the pressure
on the Durability Service

Concluding Remarks
• The relational model provides the right set of tools for designing DDS-based
systems

• DDS Topics are relations and DDS supports a subset of relational algebra to
manipulate these relations (topics)

• The design process is as follows:
- Start modelling your system using the UML Data Modelling subset
- Ensure your model is in BCNF or 4NF, and make sure you understand why some
violations are necessary/desirable for your system
- Add QoS to your relations
- Evaluate if further decomposition is required due to QoS mixes, if your data
model is properly normalised

Books
• A First Course in Database Systems (3rd edition), Ullman and Widom
• Database Management Systems (3rd edition), Ramakrishnan and Gehrke

coursera.org
• Jennifer Widom, Stanford, Introduction to Databases
- A very good course on databases in general and specifically on relational
data modelling

Entity Relationship (ER) Data Model
• Relational Data Models are commonly expressed using some variation of
Entity-Relationship (ER) Data Models

• The ER Data Model is built around the concepts of entities, attributes and
relationships (not to be confused with relations!)

Entities, Attributes and Entity Sets
• An entity is an object in the real world that is distinguishable from other
objects
- e.g. the iPhone, the Samsung Galaxy Note, etc.

• An entity is described through a set of attributes

• An entity set identifies a collection of similar entities
- e.g. Mobile Phones

• Each attribute associated with an entity set must identify its domain

• An entity has a primary key and potentially several candidate keys

Mapping
• An entity set is mapped to a relation

[Figure: Student Entity Set, with attributes sid, name, age and gpa]

Student Entity Set
sid    name           age   gpa
1234   Peter Parker   21    4.0
2345   Tony Stark     15    4.0
3456   Bruce Wayne    23    3.5

Relationships
• A relationship is an association between two or more entities
- e.g. a student is enrolled in a course

• A relationship can have descriptive attributes to record information about the
relationship

Mapping
• A relationship set is mapped to a relation

• The attributes of the resulting relation are:
- the primary key of each participating entity, as foreign keys
- descriptive attributes, as fields of the relation

• The primary key of the resulting relation depends on the arity of the
relationship

Entity Hierarchies
• In some cases it is natural to introduce (type)
hierarchies among entities

• These hierarchies are represented through
the ISA relationship

[Figure: Employees (ssn, name) with ISA subtypes HourlyEmpls (hourlyWages, hoursWorked) and ContractEmpls (contractId)]

Mapping
ISA relationships can be mapped in two ways:

• Map each entity to a distinct relation
• Create relations only for the concrete types

Notice that while the first approach is always applicable, the second is not
