Information Retrieval
P. BAXENDALE, Editor

A Relational Model of Data for Large Shared Data Banks

E. F. CODD
IBM Research Laboratory, San Jose, California

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.

KEY WORDS AND PHRASES: data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity

CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

1. Relational Model and Normal Form

1.1. INTRODUCTION

This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area.

In contrast, the problems treated here are those of data independence (the independence of application programs and terminal activities from growth in data types and changes in data representation) and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.

The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3,4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only; that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations (these are discussed in Section 2). The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the "connection trap").

Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS

The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5,6,7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.

1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the

Volume               13 / Number                         6 / June,                     1970                                                                                          Communications     of   the   ACM     377
stored ordering. Those application programs which take advantage of the stored ordering of a file are likely to fail to operate correctly if for some reason it becomes necessary to replace that ordering by a different one. Similar remarks hold for a stored ordering implemented by means of pointers.

It is unnecessary to single out any system as an example, because all the well-known information systems that are marketed today fail to make a clear distinction between order of presentation on the one hand and stored ordering on the other. Significant implementation problems must be solved to provide this kind of independence.

1.2.2. Indexing Dependence. In the context of formatted data, an index is usually thought of as a purely performance-oriented component of the data representation. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?

Present formatted data systems take widely different approaches to indexing. TDMS [7] unconditionally provides indexing on all attributes. The presently released version of IMS [5] provides the user with a choice for each file: a choice between no indexing at all (the hierarchic sequential organization) or indexing on the primary key only (the hierarchic indexed sequential organization). In neither case is the user's application logic dependent on the existence of the unconditionally provided indices. IDS [8], however, permits the file designers to select attributes to be indexed and to incorporate indices into the file structure by means of additional chains. Application programs taking advantage of the performance benefit of these indexing chains must refer to those chains by name. Such programs do not operate correctly if these chains are later removed.

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. Application programs developed to work with these systems tend to be logically impaired if the trees or networks are changed in structure. A simple example follows.

Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded. Suppose that the system requires the user or file designer to declare or define the data in terms of tree structures. Then, any one of the hierarchical structures may be adopted for the information mentioned above (see Structures 1-5).

Structure 1. Projects Subordinate to Parts
File   Segment      Fields
F      PART         part #
                    part name
                    part description
                    quantity-on-hand
                    quantity-on-order
         PROJECT    project #
                    project name
                    project description
                    quantity committed

Structure 2. Parts Subordinate to Projects
File   Segment      Fields
F      PROJECT      project #
                    project name
                    project description
         PART       part #
                    part name
                    part description
                    quantity-on-hand
                    quantity-on-order
                    quantity committed

Structure 3. Parts and Projects as Peers; Commitment Relationship Subordinate to Projects
File   Segment      Fields
F      PART         part #
                    part name
                    part description
                    quantity-on-hand
                    quantity-on-order
G      PROJECT      project #
                    project name
                    project description
         PART       part #
                    quantity committed

Structure 4. Parts and Projects as Peers; Commitment Relationship Subordinate to Parts
File   Segment      Fields
F      PART         part #
                    part description
                    quantity-on-hand
                    quantity-on-order
         PROJECT    project #
                    quantity committed
G      PROJECT      project #
                    project name
                    project description

Structure 5. Parts, Projects, and Commitment Relationship as Peers
File   Segment      Fields
F      PART         part #
                    part name
                    part description
                    quantity-on-hand
                    quantity-on-order
G      PROJECT      project #
                    project name
                    project description
H      COMMIT       part #
                    project #
                    quantity committed
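
As a rough illustration of how a program becomes tied to one of these hierarchies, the following Python sketch stores the same facts under Structure 1 and under Structure 2. The sample values, the dictionary keys, and the function names are assumptions made for this sketch only (descriptions and stock quantities are omitted for brevity); none of them comes from any of the systems cited.

```python
# Hypothetical sample data: part #1 ("bolt") is committed to project #5 ("alpha").

# Structure 1: projects subordinate to parts.
structure_1 = [
    {"part_no": 1, "part_name": "bolt",
     "projects": [{"project_no": 5, "project_name": "alpha",
                   "qty_committed": 17}]},
]

# Structure 2: parts subordinate to projects (same facts, nested the other way).
structure_2 = [
    {"project_no": 5, "project_name": "alpha",
     "parts": [{"part_no": 1, "part_name": "bolt",
                "qty_committed": 17}]},
]


def commitments_via_structure_1(file_f):
    """List (part_no, project_no, qty) by walking the PART -> PROJECT path."""
    rows = []
    for part in file_f:
        # The access path "projects under each part" is hard-wired here.
        for project in part["projects"]:
            rows.append((part["part_no"],
                         project["project_no"],
                         project["qty_committed"]))
    return rows


def works_on(program, file_f):
    """Return True if the program runs without meeting a missing access path."""
    try:
        program(file_f)
        return True
    except KeyError:
        return False
```

The program written for Structure 1 runs on `structure_1` but raises a `KeyError` on `structure_2`, even though both hold the same information; only the access path has changed.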

Now, consider the problem of printing out the part number, part name, and quantity committed for every part used in the project whose project name is "alpha." The following observations may be made regardless of which available tree-oriented information system is selected to tackle this problem. If a program P is developed for this problem assuming one of the five structures above (that is, P makes no test to determine which structure is in effect), then P will fail on at least three of the remaining structures. More specifically, if P succeeds with structure 5, it will fail with all the others; if P succeeds with structure 3 or 4, it will fail with at least 1, 2, and 5; if P succeeds with 1 or 2, it will fail with at least 3, 4, and 5. The reason is simple in each case. In the absence of a test to determine which structure is in effect, P fails because an attempt is made to execute a reference to a nonexistent file (available systems treat this as an error) or no attempt is made to execute a reference to a file containing needed information. The reader who is not convinced should develop sample programs for this simple problem.

Since, in general, it is not practical to develop application programs which test for all tree structurings permitted by the system, these programs fail when a change in structure becomes necessary.

Systems which provide users with a network model of the data run into similar difficulties. In both the tree and network cases, the user (or his program) is required to exploit a collection of user access paths to the data. It does not matter whether these paths are in close correspondence with pointer-defined paths in the stored representation; in IDS the correspondence is extremely simple, in TDMS it is just the opposite. The consequence, regardless of the stored representation, is that terminal activities and programs become dependent on the continued existence of the user access paths.

One solution to this is to adopt the policy that once a user access path is defined it will not be made obsolete until all application programs using that path have become obsolete. Such a policy is not practical, because the number of access paths in the total model for the community of users of a data bank would eventually become excessively large.

1.3. A RELATIONAL VIEW OF DATA

The term relation is used here in its accepted mathematical sense. Given sets S1, S2, ..., Sn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on.¹ We shall refer to Sj as the jth domain of R. As defined above, R is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary.

For expository reasons, we shall frequently make use of an array representation of relations, but it must be remembered that this particular representation is not an essential part of the relational view being expounded. An array which represents an n-ary relation R has the following properties:
   (1) Each row represents an n-tuple of R.
   (2) The ordering of rows is immaterial.
   (3) All rows are distinct.
   (4) The ordering of columns is significant: it corresponds to the ordering S1, S2, ..., Sn of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).
   (5) The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

The example in Figure 1 illustrates a relation of degree 4, called supply, which reflects the shipments-in-progress of parts from specified suppliers to specified projects in specified quantities.

supply   (supplier   part   project   quantity)
             1         2       5         17
             1         3       5         23
             2         3       7          9
             2         7       5          4
             4         1       1         12

FIG. 1. A relation of degree 4

One might ask: If the columns are labeled by the name of corresponding domains, why should the ordering of columns matter? As the example in Figure 2 shows, two columns may have identical headings (indicating identical domains) but possess distinct meanings with respect to the relation. The relation depicted is called component. It is a ternary relation, whose first two domains are called part and third domain is called quantity. The meaning of component (x, y, z) is that part x is an immediate component (or subassembly) of part y, and z units of part x are needed to assemble one unit of part y. It is a relation which plays a critical role in the parts explosion problem.

component   (part   part   quantity)
               1      5       9
               2      5       7
               3      5       2
               2      6      12
               3      6       3
               4      7       1
               6      7       1

FIG. 2. A relation with two identical domains

It is a remarkable fact that several existing information systems (chiefly those based on tree-structured files) fail to provide data representations for relations which have two or more identical domains. The present version of IMS/360 [5] is an example of such a system.

The totality of data in a data bank may be viewed as a collection of time-varying relations. These relations are of assorted degrees. As time progresses, each n-ary relation may be subject to insertion of additional n-tuples, deletion of existing ones, and alteration of components of any of its existing n-tuples.

¹ More concisely, R is a subset of the Cartesian product S1 × S2 × ... × Sn.
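
The definition of a relation and the array properties listed above can be sketched in Python, with a relation held as a set of tuples. This is only an illustration under stated assumptions: the small domain sets, the helper names `is_relation_on` and `degree`, and the inserted tuple (4, 2, 7, 9) are invented for the sketch; the rows of supply and component are those of Figures 1 and 2.

```python
from itertools import product

# Assumed (hypothetical) domains, chosen just large enough to cover Figure 1.
supplier = {1, 2, 4}
part = {1, 2, 3, 7}
project = {1, 5, 7}
quantity = {4, 9, 12, 17, 23}

# The relation supply of Figure 1 as a set of 4-tuples.
supply = {(1, 2, 5, 17), (1, 3, 5, 23), (2, 3, 7, 9), (2, 7, 5, 4), (4, 1, 1, 12)}


def is_relation_on(rows, *domains):
    """True if rows is a subset of the Cartesian product of the domains."""
    return set(rows) <= set(product(*domains))


def degree(rows):
    """Number of components in each tuple of a non-empty relation."""
    return len(next(iter(rows)))


# Properties (2) and (3): reordering rows or repeating a row changes nothing.
reordered_with_duplicate = set(list(reversed(sorted(supply))) + [(1, 2, 5, 17)])

# Time variation: insert one tuple and delete another.
supply_later = (supply | {(4, 2, 7, 9)}) - {(2, 7, 5, 4)}

# Property (4): with two identical domains, column order carries meaning.
component = {(1, 5, 9), (2, 5, 7), (3, 5, 2), (2, 6, 12),
             (3, 6, 3), (4, 7, 1), (6, 7, 1)}
swapped = {(y, x, z) for (x, y, z) in component}
```

Here `reordered_with_duplicate` equals `supply`, while `swapped` differs from `component`: exchanging the two part columns turns "x is a component of y" into the opposite statement.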

In many commercial, governmental,          and scientific data      names, and part numbers are. We shall call the set of
  banks, however, some of the relations are of quite high de-            values represented at some instant the active domain at that
  gree (a degree of 30 is not at all uncommon). Users should             instant.
 not normally be burdened with remembering the domain                        Normally, one domain (or combination of domains) of a
  ordering of any relation (for example, the ordering supplier,          given relation has values which uniquely identify each ele-
 then part, then project, then quantity in the relation supply).         ment (n-tuple) of that relation. Such a domain (or com-
 Accordingly, we propose that users deal, not with relations             bination) is called a primary key. In the example above,
 which are domain-ordered, but with relationships which are              part number would be a primary key, while part color
 their domain-unordered      counterparts.2 To accomplish this,         would not be. A primary key is nonredundant if it is either
 domains must be uniquely identifiable at least within any               a simple domain (not a combination) or a combination
 given relation, without using position. Thus, where there              such that none of the participating simple domains is
 are two or more identical domains, we require in each case             superfluous in uniquely identifying each element. A rela-
 that the domain name be qualified by a distinctive role                tion may possess more than one nonredundant primary
 name, which serves to identify the role played by that                 key. This would be the case in the example if different parts
 domain in the given relation. For example, in the relation             were always given distinct names. Whenever a relation
 component of Figure 2, the first domain part might be                  has two or more nonredundant primary keys, one of them
 qualified by the role name sub, and the second by super, so            is arbitrarily selected and called the primary key of that re-
 that users could deal with the relationship component and              lation.
 its domains (sub.part, super.part, quantity) without regard            A common requirement is for elements of a relation to
 to any ordering between these domains.                                 cross-reference other elements of the same relation or ele-
    To sum up, it is proposed that most users should interact           ments of a different relation. Keys provide a user-oriented
 with a relational model of the data consisting of a collection         means (but not the only means) of expressing such cross-
 of time-varying relationships (rather than relations). Each            references. We shall call a domain (or domain combina-
 user need not know more about any relationship than its                tion) of relation R a foreign key if it is not the primary key
 name together with the names of its domains (role quali-               of R but its elements are values of the primary key of some
 fied whenever necessary). Even this information might be               relation S (the possibility that S and R are identical is not
 offered in menu style by the system (subject to security               excluded). In the relation supply of Figure 1, the combina-
and privacy constraints) upon request by the user.                      tion of supplier, part, project is the primary key, while each
    There are usually many alternative ways in which a re-              of these three domains taken separately is a foreign key.
lational model may be established for a data bank. In                       In previous work there has been a strong tendency to
order to discuss a preferred way (or normal form), we                   treat the data in a data bank as consisting of two parts, one
must first introduce a few additional concepts (active                  part consisting of entity descriptions (for example, descrip-
domain, primary key, foreign key, nonsimple domain)                     tions of suppliers) and the other part consisting of rela-
and establish some links with terminology currently in use              tions between the various entities or types of entities (for
in information systems programming. In the remainder of                 example, the supply relation). This distinction is difficult
this paper, we shall not bother to distinguish between re-              to maintain when one may have foreign keys in any rela-
lations and relationships except where it appears advan-                tion whatsoever. In the user’s relational model there ap-
tageous to be explicit.                                                 pears to be no advantage to making such a distinction
    Consider an example of a data bank which includes rela-              (there may be some advantage, however, when one applies
tions concerning parts, projects, and suppliers. One rela-              relational concepts to machine representations of the user’s
tion called part is defined on the following domains:                   set of relationships).
     (1) part number                                                        So far, we have discussed examples of relations which are
    (2) part name                                                       defined on simple domains, that is, domains whose elements are
    (3) part color                                                      atomic (nondecomposable) values. Nonatomic values can
    (4) part weight                                                     be discussed within the relational framework. Thus, some
    (5) quantity on hand                                                domains may have relations as elements. These relations
    (6) quantity on order                                               may, in turn, be defined on nonsimple domains, and so on.
and possibly other domains as well. Each of these domains               For example, one of the domains on which the relation em-
is, in effect, a pool of values, some or all of which may be            ployee is defined might be salary history. An element of the
represented in the data bank at any instant. While it is                salary history domain is a binary relation defined on the do-
conceivable that, at some instant, all part colors are pres-            main date and the domain salary. The salary history domain
ent, it is unlikely that all possible part weights, part                is the set of all such binary relations. At any instant of time
2 In mathematical terms, a relationship is an equivalence class of      there are as many instances of the salary history relation
those relations that are equivalent under permutation of domains        in the data bank as there are employees. In contrast, there
(see Section 2.1.1).                                                    is only one instance of the employee relation.
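
The definitions of primary key, nonredundant key, and foreign key given above can be checked mechanically against a snapshot of a relation. The following Python sketch is illustrative only: the sample rows, the column names, and the helper names is_key, is_nonredundant_key, and is_foreign_key are assumptions and do not come from the paper. A key is a constraint on every instance that a time-varying relation may take, so a check against one snapshot can refute a proposed key but cannot establish it.

```python
from itertools import combinations

def key_values(rows, cols):
    """Collect the value combinations that the columns `cols` take in `rows`."""
    return [tuple(row[c] for c in cols) for row in rows]

def is_key(rows, cols):
    """True if, in this snapshot, the values of `cols` identify each tuple uniquely."""
    values = key_values(rows, cols)
    return len(set(values)) == len(values)

def is_nonredundant_key(rows, cols):
    """True if `cols` is a key and no proper subset of `cols` is also a key."""
    if not is_key(rows, cols):
        return False
    return not any(
        is_key(rows, subset)
        for size in range(1, len(cols))
        for subset in combinations(cols, size)
    )

def is_foreign_key(r_rows, cols, r_primary_key, s_rows, s_primary_key):
    """True if `cols` is not the primary key of R and every value of `cols`
    in R occurs as a primary key value of S (S may be R itself)."""
    if list(cols) == list(r_primary_key):
        return False
    s_values = set(key_values(s_rows, s_primary_key))
    return all(value in s_values for value in key_values(r_rows, cols))

# Hypothetical sample data (not taken from the paper's figures).
part = [
    {"part_number": 1, "part_name": "bolt", "part_color": "red"},
    {"part_number": 2, "part_name": "nut", "part_color": "red"},
    {"part_number": 3, "part_name": "screw", "part_color": "blue"},
]
supply = [
    {"supplier": 1, "part": 2, "project": 5, "quantity": 17},
    {"supplier": 1, "part": 2, "project": 7, "quantity": 4},
    {"supplier": 1, "part": 3, "project": 5, "quantity": 23},
    {"supplier": 2, "part": 3, "project": 5, "quantity": 9},
]
supply_key = ["supplier", "part", "project"]
```

With this sample, part number is a key of part and part color is not, and the domain part of supply is a foreign key referring to part, while the combination (supplier, part, project) is a nonredundant key of supply.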
3 Naturally, as with any data put into and retrieved from a computer
system, the user will normally make far more effective use              The terms attribute and repeating group in present data
of the data if he is aware of its meaning.                              base terminology are roughly analogous to simple domain

380       Communications      of the   ACM                                                      Volume   13 / Number   6 / June,   1970
and nonsimple domain, respectively. Much of the confusion                                    If normalization as described above is to be applicable,
 in present terminology is due to failure to distinguish between                          the unnormalized collection of relations must satisfy the
 type and instance (as in “record”) and between                                           following conditions:
 components of a user model of the data on the one hand                                        (1) The graph of interrelationships of the nonsimple
 and their machine representation      counterparts    on the                             domains is a collection of trees.
 other hand (again, we cite “record”    as an example).                                        (2) No primary key has a component domain which is
   1.4. NORMAL FORM                                                                       nonsimple.
   A relation whose domains are all simple can be represented                                The writer knows of no application which would require
in storage by a two-dimensional column-homogeneous                                        any relaxation of these conditions. Further operations of a
array of the kind discussed above. Some more                                              normalizing kind are possible. These are not discussed in
complicated data structure is necessary for a relation with                               this paper.
one or more nonsimple domains. For this reason (and others                                   The simplicity of the array representation which becomes
to be cited below) the possibility of eliminating nonsimple                               feasible when all relations are cast in normal form is not
domains appears worth investigating.4 There is, in fact, a                                only an advantage for storage purposes but also for communication
very simple elimination procedure, which we shall call                                    of bulk data between systems which use widely
normalization.                                                                            different representations of the data. The communication
   Consider, for example, the collection of relations exhibited                           form would be a suitably compressed version of the array
in Figure 3(a). Job history and children are nonsimple                                    representation and would have the following advantages:
domains of the relation employee. Salary history is a                                         (1) It would be devoid of pointers (address-valued or
nonsimple domain of the relation job history. The tree in                                 displacement-valued).
Figure 3(a) shows just these interrelationships of the nonsimple                              (2) It would avoid all dependence on hash addressing
domains.                                                                                  schemes.
                                                                                              (3) It would contain no indices or ordering lists.
                                           employee                                          If the user’s relational model is set up in normal form,
                                               |                                         names of items of data in the data bank can take a simpler
                                                                                         form than would otherwise be the case. A general name
           jobhistory                                                   children         would take a form such as
               |
         salaryhistory                                                                                                R (g).r.d

employee (man#, name, birthdate, jobhistory,                       children)             where R is a relational name; g is a generation identifier
jobhistory (jobdate, title, salaryhistory)                                                (optional); r is a role name (optional); d is a domain name.
salaryhistory   (salarydate, salary)                                                     Since g is needed only when several generations of a given
children (childname, birthyear)
                                                                                         relation exist, or are anticipated to exist, and r is needed
                         FIG.    3(a).    Unnormalized      set                          only when the relation R has two or more domains named
                                                                                         d, the simple form R.d will often be adequate.
employee’ (man#, name, birthdate)
jobhistory’    (man#, jobdate, title)                                                         1.5. SOME LINGUISTIC ASPECTS
salaryhistory’    (man#, jobdate, salarydate,            salary)                             The adoption of a relational model of data, as described
children’ (man#, childname, birthyear)                                                    above, permits the development of a universal data sublanguage
                          FIG.    3(b).    Normalized     set                             based on an applied predicate calculus. A first-order
                                                                                          predicate calculus suffices if the collection of relations
   Normalization proceeds as follows. Starting with the relation                          is in normal form. Such a language would provide a yardstick
at the top of the tree, take its primary key and expand                                   of linguistic power for all other proposed data languages,
each of the immediately subordinate relations by                                          and would itself be a strong candidate for embedding
inserting this primary key domain or domain combination.                                  (with appropriate syntactic modification) in a variety
The primary key of each expanded relation consists of the                                 of host languages (programming, command- or problem-oriented).
primary key before expansion augmented by the primary                                     While it is not the purpose of this paper to
key copied down from the parent relation. Now, strike out                                describe such a language in detail, its salient features
from the parent relation all nonsimple domains, remove the                               would be as follows.
top node of the tree, and repeat the same sequence of                                        Let us denote the data sublanguage by R and the host
operations on each remaining subtree.                                                    language by H. R permits the declaration of relations and
   The result of normalizing the collection of relations in                              their domains. Each declaration of a relation identifies the
Figure 3 (a) is the collection in Figure 3 (b). The primary                              primary key for that relation. Declared relations are added
key of each relation is italicized to show how such keys                                 to the system catalog for use by any members of the user
are expanded by the normalization.                                                       community who have appropriate authorization. H per-
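
The normalization procedure described above can be sketched in Python at the level of relation schemas. This is a minimal illustration under stated assumptions: the dictionary layout, the field names "key", "simple", and "nonsimple", and the function name normalize are hypothetical, and the primary keys assumed for Figure 3(a) (man#, jobdate, salarydate, childname) are an inference, because the italics that mark the keys do not survive in plain text.

```python
# Unnormalized schema of Figure 3(a), written as a tree of relations.
# "simple" lists the simple domains; "nonsimple" lists subordinate relations.
employee = {
    "name": "employee",
    "key": ["man#"],  # assumed primary key
    "simple": ["man#", "name", "birthdate"],
    "nonsimple": [
        {
            "name": "jobhistory",
            "key": ["jobdate"],  # assumed primary key
            "simple": ["jobdate", "title"],
            "nonsimple": [
                {
                    "name": "salaryhistory",
                    "key": ["salarydate"],  # assumed primary key
                    "simple": ["salarydate", "salary"],
                    "nonsimple": [],
                }
            ],
        },
        {
            "name": "children",
            "key": ["childname"],  # assumed primary key
            "simple": ["childname", "birthyear"],
            "nonsimple": [],
        },
    ],
}

def normalize(node, parent_key=()):
    """Return a dict of normalized relations for the tree rooted at `node`.

    Each subordinate relation is expanded with the primary key copied down
    from its parent, and the nonsimple domains are struck out of the parent.
    """
    expanded_key = list(parent_key) + node["key"]
    domains = list(parent_key) + node["simple"]
    result = {node["name"] + "'": {"domains": domains, "key": expanded_key}}
    for child in node["nonsimple"]:
        result.update(normalize(child, expanded_key))
    return result

normalized = normalize(employee)
for name, relation in normalized.items():
    print(name, tuple(relation["domains"]), "key:", tuple(relation["key"]))
```

Under these assumptions the four relations produced have the same domains as those listed in Figure 3(b).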
4 M. E. Sanko of IBM, San Jose, independently                         recognized   the   mits supporting declarations which indicate, perhaps less
desirability of eliminating nonsimple domains.                                           permanently, how these relations are represented in stor-

age. R permits the specification for retrieval of any subset            4-ary relation supply of Figure 1, which entails 5 names in
of data from the data bank. Action on such a retrieval re-              n-ary notation, would be represented in the form
quest is subject to security constraints.
                                                                                P (supplier, Q (part, R (project, quantity)))
    The universality of the data sublanguage lies in its
descriptive ability (not its computing ability). In a large             in nested binary notation and, thus, employ 7 names.
data bank each subset of the data has a very large number                  A further disadvantage of this kind of expression is its
of possible (and sensible) descriptions, even when we assume            asymmetry. Although this asymmetry does not prohibit
(as we do) that there is only a finite set of function                  symmetric exploitation, it certainly makes some bases of
subroutines to which the system has access for use in                   interrogation very awkward for the user to express (consider,
qualifying data for retrieval. Thus, the class of qualification         for example, a query for those parts and quantities
expressions which can be used in a set specification must               related to certain given projects via Q and R).
have the descriptive power of the class of well-formed                      1.6. EXPRESSIBLE,      NAMED, AND STORED RELATIONS
formulas of an applied predicate calculus. It is well known                Associated with a data bank are two collections of relations:
that to preserve this descriptive power it is unnecessary to            the named set and the expressible set. The named set
express (in whatever syntax is chosen) every formula of                 is the collection of all those relations that the community of
the selected predicate calculus. For example, just those in             users can identify by means of a simple name (or identifier).
prenex normal form are adequate [9].                                    A relation R acquires membership in the named set when a
    Arithmetic functions may be needed in the qualification             suitably authorized user declares R; it loses membership
or other parts of retrieval statements. Such functions can              when a suitably authorized user cancels the declaration of
be defined in H and invoked in R.                                       R.
    A set so specified may be fetched for query purposes                   The expressible set is the total collection of relations that
only, or it may be held for possible changes. Insertions take           can be designated by expressions in the data language. Such
the form of adding new elements to declared relations without           expressions are constructed from simple names of relations
regard to any ordering that may be present in their                     in the named set; names of generations, roles and domains;
machine representation. Deletions which are effective for               logical connectives; the quantifiers of the predicate calculus;6
the community (as opposed to the individual user or subcommunities)     and certain constant relation symbols such as =, >.
take the form of removing elements from declared                        The named set is a subset of the expressible set, usually a
relations. Some deletions and updates may be triggered                  very small subset.
by others, if deletion and update dependencies between                     Since some relations in the named set may be time-independent
specified relations are declared in R.                                  combinations of others in that set, it is useful to
    One important effect that the view adopted toward data              consider associating with the named set a collection of
has on the language used to retrieve it is in the naming of             statements that define these time-independent constraints.
data elements and sets. Some aspects of this have been discussed        We shall postpone further discussion of this until we have
in the previous section. With the usual network                         introduced several operations on relations (see Section 2).
view, users will often be burdened with coining and using                  One of the major problems confronting the designer of a
more relation names than are absolutely necessary, since                data system which is to support a relational model for its
names are associated with paths (or path types) rather                  users is that of determining the class of stored representa-
than with relations.                                                    tions to be supported. Ideally, the variety of permitted
    Once a user is aware that a certain relation is stored, he          data representations should be just adequate to cover the
will expect to be able to exploit5 it using any combination             spectrum of performance requirements of the total collection
of its arguments as “knowns” and the remaining arguments                of installations. Too great a variety leads to unnecessary
as “unknowns,” because the information (like                            overhead in storage and continual reinterpretation
Everest) is there. This is a system feature (missing from               of descriptions for the structures currently in effect.
many current information systems) which we shall call                      For any selected class of stored representations the data
 (logically) symmetric exploitation of relations. Naturally,            system must provide a means of translating user requests
symmetry in performance is not to be expected.                          expressed in the data language of the relational model into
    To support symmetric exploitation of a single binary relation,      corresponding (and efficient) actions on the current
two directed paths are needed. For a relation of degree                 stored representation. For a high level data language this
n, the number of paths to be named and controlled is                    presents a challenging design problem. Nevertheless, it is a
n factorial.                                                            problem which must be solved: as more users obtain concurrent
    Again, if a relational view is adopted in which every n-ary         access to a large data bank, responsibility for providing
relation (n > 2) has to be expressed by the user as a                   efficient response and throughput shifts from the
nested expression involving only binary relations (see                  individual user to the data system.
Feldman’s LEAP System [10], for example) then 2n - 1
names have to be coined instead of only n + 1 with direct               6 Because each relation in a practical data bank is a finite set at
                                                                        every instant of time, the existential and universal quantifiers
n-ary notation as described in Section 1.2. For example, the            can be expressed in terms of a function that counts the number of
5 Exploiting a relation includes query, update, and delete.             elements in any finite set.
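
The remark in the footnote on quantifiers can be made concrete. Because every relation in the data bank is a finite set, "there exists" and "for all" reduce to comparisons of element counts. The Python sketch below is an illustration only: the function names count, exists, and for_all and the sample rows are assumptions, not part of the paper.

```python
def count(finite_set):
    """The counting function: the number of elements in a finite set."""
    return len(finite_set)

def exists(relation, predicate):
    """Existential quantifier: at least one element satisfies the predicate."""
    return count({t for t in relation if predicate(t)}) > 0

def for_all(relation, predicate):
    """Universal quantifier: the satisfying elements are as numerous as the relation."""
    return count({t for t in relation if predicate(t)}) == count(relation)

# Hypothetical rows of a supply relation: (supplier, part, project, quantity).
supply = {
    (1, 2, 5, 17),
    (1, 3, 5, 23),
    (2, 3, 7, 9),
}

some_large_order = exists(supply, lambda t: t[3] > 20)    # one row qualifies
all_positive = for_all(supply, lambda t: t[3] > 0)        # every row qualifies
all_from_supplier_1 = for_all(supply, lambda t: t[0] == 1)  # one row does not
```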

2.   Redundancy        and   Consistency                             ternary relation which preserves all of the information in
                                                                     the given relations?
    2.1. OPERATIONS ON RELATIONS
                                                                        The example in Figure 5 shows two relations R, S, which
    Since relations are sets, all of the usual set operations are
                                                                     are joinable without loss of information, while Figure 6
 applicable to them. Nevertheless, the result may not be a
                                                                     shows a join of R with S. A binary relation R is joinable
 relation; for example, the union of a binary relation and a
                                                                     with a binary relation S if there exists a ternary relation U
 ternary relation is not a relation.
                                                                     such that $\pi_{12}(U) = R$ and $\pi_{23}(U) = S$. Any such ternary
    The operations discussed below are specifically for relations.
                                                                     relation is called a join of R with S. If R, S are binary relations
 These operations are introduced because of their key
                                                                    such that $\pi_2(R) = \pi_1(S)$, then R is joinable with S.
 role in deriving relations from other relations. Their
                                                                     One join that always exists in such a case is the natural
 principal application is in noninferential information systems,
                                                                    join of R with S defined by
 that is, systems which do not provide logical inference
                                                                                        $R * S = \{(a, b, c) : R(a, b) \land S(b, c)\}$
 services, although their applicability is not necessarily
                                                                    where R (a, b) has the value true if (a, b) is a member of R
 destroyed when such services are added.
                                                                    and similarly for S(b, c). It is immediate that
    Most users would not be directly concerned with these
 operations. Information systems designers and people concerned                                  $\pi_{12}(R * S) = R$
 with data bank control should, however, be thoroughly
                                                                    and
 familiar with them.
    2.1.1. Permutation.     A binary relation has an array                                       $\pi_{23}(R * S) = S$.
representation with two columns. Interchanging these col-
                                                                       Note that the join shown in Figure 6 is the natural join
umns yields the converse relation. More generally, if a
                                                                    of R with S from Figure 5. Another join is shown in Figure
permutation is applied to the columns of an n-ary relation,
                                                                    7.
the resulting relation is said to be a permutation         of the
given relation. There are, for example, 4! = 24 permutations
of the relation supply in Figure 1, if we include the                             $\pi_{31}(supply)$     (project                 supplier)
identity permutation which leaves the ordering of columns                                                              5                         1
unchanged.                                                                                                             5                         2
    Since the user’s relational model consists of a collection                                                         1                         4
                                                                                                                       7                         2
of relationships (domain-unordered relations), permuta-
tion is not relevant to such a model considered in isolation.          FIG. 4. A permuted                    projection       of the relation         in Figure 1
It is, however, relevant to the consideration of stored
representations of the model. In a system which provides
symmetric exploitation of relations, the set of queries                          R       (supplier           part)              S (part             project)
answerable by a stored relation is identical to the set                                     1                  1                   1                     1
                                                                                            2                  1                   1                     2
answerable by any permutation of that relation. Although
                                                                                           2                   2                   2                     1
it is logically unnecessary to store both a relation and some
permutation of it, performance considerations could make                                         FIG.    5. Two joinable              relations
it advisable.
    2.1.2. Projection.    Suppose now we select certain col-
                                                                                        R*S           (supplier               part                project)
umns of a relation (striking out the others) and then re-
move from the resulting array any duplication in the rows.                                                 1                    1                     1
                                                                                                           1                    1                     2
The final array represents a relation which is said to be a
                                                                                                           2                    1                     1
projection of the given relation.                                                                          2                    1                     2
    A selection operator $\pi$ is used to obtain any desired                                               2                   2                      1
permutation, projection, or combination of the two operations.
                                                                          FIG.       6. The natural               join of R with S (from Figure 5)
Thus, if L is a list of k indices7 $L = i_1, i_2, \ldots, i_k$
and R is an n-ary relation ($n \geq k$), then $\pi_L(R)$ is the k-ary
relation whose jth column is column $i_j$ of R ($j = 1, 2, \ldots, k$)                  U        (supplier                part                project)
except that duplication in resulting rows is removed. Consider
                                                                                                      1                      1                    2
the relation supply of Figure 1. A permuted projection                                                2                     1                     1
of this relation is exhibited in Figure 4. Note that, in this                                         2                     2                     1
particular case, the projection has fewer n-tuples than the
                                                                                 FIG.       7. Another        join of R with S (from Figure 5)
relation from which it is derived.
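
Projection and the natural join can be sketched in a few lines of Python with relations held as sets of tuples. The relations R, S, and U below are copied from Figures 5 and 7. The rows given for supply are an assumption, chosen so that the permuted projection reproduces Figure 4, since Figure 1 is not reproduced in this section. The function names project, natural_join, and is_join are likewise illustrative.

```python
def project(relation, indices):
    """pi_L(R): keep the columns named by the 1-based index list, in that order.

    Building a set removes any duplicate rows, as the definition requires.
    """
    return {tuple(row[i - 1] for i in indices) for row in relation}

def natural_join(r, s):
    """R*S = {(a, b, c) : R(a, b) and S(b, c)} for binary relations r and s."""
    return {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

def is_join(u, r, s):
    """A ternary relation U is a join of R with S if pi_12(U) = R and pi_23(U) = S."""
    return project(u, [1, 2]) == r and project(u, [2, 3]) == s

# Assumed rows for supply (supplier, part, project, quantity).
supply = {(1, 2, 5, 17), (1, 3, 5, 23), (2, 3, 7, 9), (2, 7, 5, 4), (4, 1, 1, 12)}
figure_4 = project(supply, [3, 1])  # permuted projection: (project, supplier)

# Relations of Figure 5 and the alternative join of Figure 7.
R = {(1, 1), (2, 1), (2, 2)}          # (supplier, part)
S = {(1, 1), (1, 2), (2, 1)}          # (part, project)
U = {(1, 1, 2), (2, 1, 1), (2, 2, 1)}

figure_6 = natural_join(R, S)
```

Here figure_4 has four tuples although supply has five, which illustrates the removal of duplicate rows. Both figure_6 and U pass is_join, which shows the plurality of joins that arises from the point of ambiguity at part 1.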
    2.1.3. Join.      Suppose we are given two binary rela-
tions, which have some domain in common. Under what                    Inspection of these relations reveals an element (ele-
circumstances can we combine these relations to form a              ment 1) of the domain part (the domain on which the join
7 When dealing with relationships, we use domain names              is to be made) with the property that it possesses more
(role-qualified whenever necessary) instead of domain positions.    than one relative under R and also under S. It is this
element which gives rise to the plurality of joins. Such an element                         y), and T with R (say z), and, furthermore, y must be a
in the joining domain is called a point of ambiguity                                        relative of x under S, z a relative of y under T, and x a
with respect to the joining of R with S.                                                    relative of z under R. Note that in Figure 8 the points
   If either $\pi_{21}(R)$ or S is a function, no point of ambiguity                        x = a; y = d; z = 2 have this property.
can occur in joining R with S. In such a case, the natural                                     The natural linear 3-join of three binary relations R, S,
join of R with S is the only join of R with S. Note that the                                T is given by
reiterated qualification “of R with S” is necessary, because
S might be joinable with R (as well as R with S), and this                                    $R * S * T = \{(a, b, c, d) : R(a, b) \land S(b, c) \land T(c, d)\}$
join would be an entirely separate consideration. In Figure
5, none of the relations R, $\pi_{21}(R)$, S, $\pi_{21}(S)$ is a function.                  where parentheses are not needed on the left-hand side because
   Ambiguity in the joining of R with S can sometimes be                                    the natural 2-join (*) is associative. To obtain the
resolved by means of other relations. Suppose we are given,                                 cyclic counterpart, we introduce the operator $\gamma$ which produces
or can derive from sources independent of R and S, a relation                               a relation of degree n - 1 from a relation of degree n
T on the domains project and supplier with the following                                    by tying its ends together. Thus, if R is an n-ary relation
properties:                                                                                 ($n \geq 2$), the tie of R is defined by the equation

            (1) $\pi_1(T) = \pi_2(S)$,                                                      $\gamma(R) = \{(a_1, a_2, \ldots, a_{n-1}) : R(a_1, a_2, \ldots, a_{n-1}, a_n) \land a_1 = a_n\}$.

            (2) $\pi_2(T) = \pi_1(R)$,
                                                                                            We may now represent the natural cyclic 3-join of R, S, T
            (3) $T(j, s) \rightarrow \exists p\,(R(s, p) \land S(p, j))$,
                                                                                            by the expression
            (4) $R(s, p) \rightarrow \exists j\,(S(p, j) \land T(j, s))$,
                                                                                                                     $\gamma(R * S * T)$.
            (5) $S(p, j) \rightarrow \exists s\,(T(j, s) \land R(s, p))$,
then we may form a three-way join of R, S, T; that is, a                                        Extension of the notions of linear and cyclic 3-join and
ternary relation U such that                                                                 their natural counterparts to the joining of n binary relations
       $\pi_{12}(U) = R$,   $\pi_{23}(U) = S$,   $\pi_{31}(U) = T$.                          (where $n \geq 3$) is obvious. A few words may be appropriate,
                                                                                             however, regarding the joining of relations which
   Such a join will be called a cyclic 3-join to distinguish it                              are not necessarily binary. Consider the case of two relations
from a linear 3-join which would be a quaternary relation                                    R (degree r), S (degree s) which are to be joined on
V such that                                                                                  p of their domains (p < r, p < s). For simplicity, suppose
                                                                                             these p domains are the last p of the r domains of R,
        $\pi_{12}(V) = R$,   $\pi_{23}(V) = S$,   $\pi_{34}(V) = T$.
                                                                                             and the first p of the s domains of S. If this were not so, we
  While it is possible for more than one cyclic 3-join to exist                              could always apply appropriate permutations to make it
(see Figures 8, 9 for an example), the circumstances under                                   so. Now, take the Cartesian product of the first r-p domains
which this can occur entail much more severe constraints                                     of R, and call this new domain A. Take the Cartesian
                                                                                             product of the last p domains of R, and call this B.
              R (s P)                  s      (P 23                 T     0’   s)           Take the Cartesian product of the last s-p domains of S
                      1 a                      a d                         d    1           and call this C.
                      2 a                                                  d   2                We can treat R as if it were a binary relation on the
                      2 b                      b&Ii                        e   2            domains A, B. Similarly, we can treat S as if it were a bi-
                                               b e                         e   2            nary relation on the domains B, C. The notions of linear
  FIG. 8. Binary       relations       with         a plurality         of cyclic 3-joins   and cyclic S-join are now directly applicable. A similar ap-
                                                                                            proach can be taken with the linear and cyclic n-joins of n
                  u     b P 8                            TJ’       (s P i)                  relations of assorted degrees.
                                                                                                2.1.4. Composition. The reader is probably familiar
                        1     a    d                                1 a d
                        2     a    e                                2 a d                   with the notion of composition applied to functions. We
                        2     b    d                                2 a e                   shall discussa generalization of that concept and apply it
                        2     b    e                                2 b d                   first to binary relations. Our definitions of composition
                                                                    2 b e                   and composability are based very directly on the definitions
      FIG. 9. Two cyclic 3-joins of the relations                         in Figure 8       of join and joinability given above.
                                                                                                Suppose we are given two relations R, S. T is a cam-
than those for a plurality of 2-joins. To be specific, the re-                              position of R with S if there exists a.join U of R with S such
lations R, S, T must possesspoints of ambiguity with                                        that T = aI3(U) . Thus, two relations are composable if
respect to joining R with S (say point x), S with T (say                                    and only if they are joinable. However, the existence of
                                                                                            more than one join of R with S doesnot imply the existence
8 A function is a binary relation,              which is one-one or many-one,               of more than one composition of R with S.
but not one-many.                                                                               Corresponding to the natural join of R with S is the

384        Communications                  of the     ACM                                                           Volume   13 / Number   6 / June,   1970
natural composition⁹ of R with S defined by

    $R \cdot S = \pi_{13}(R * S).$

Taking the relations R, S from Figure 5, their natural composition is exhibited in Figure 10 and another composition is exhibited in Figure 11 (derived from the join exhibited in Figure 7).

⁹ Other writers tend to ignore compositions other than the natural one, and accordingly refer to this particular composition as the composition; see, for example, Kelley's "General Topology."

    R·S (project  supplier)
            1         1
            1         2
            2         1

    FIG. 10. The natural composition of R with S (from Figure 5)

    T (project  supplier)
          1         2
          2         1

    FIG. 11. Another composition of R with S (from Figure 5)

When two or more joins exist, the number of distinct compositions may be as few as one or as many as the number of distinct joins. Figure 12 shows an example of two relations which have several joins but only one composition. Note that the ambiguity of point c is lost in composing R with S, because of unambiguous associations made via the points a, b, d, e.

    R (supplier  part)        S (part  project)
          1       a                a       g
          1       b                b       f
          1       c                c       f
          2       c                c       g
          2       d                d       g
          2       e                e       f

    FIG. 12. Many joins, only one composition

Extension of composition to pairs of relations which are not necessarily binary (and which may be of different degrees) follows the same pattern as extension of pairwise joining to such relations.

A lack of understanding of relational composition has led several systems designers into what may be called the connection trap. This trap may be described in terms of the following example. Suppose each supplier description is linked by pointers to the descriptions of each part supplied by that supplier, and each part description is similarly linked to the descriptions of each project which uses that part. A conclusion is now drawn which is, in general, erroneous: namely that, if all possible paths are followed from a given supplier via the parts he supplies to the projects using those parts, one will obtain a valid set of all projects supplied by that supplier. Such a conclusion is correct only in the very special case that the target relation between projects and suppliers is, in fact, the natural composition of the other two relations, and we must normally add the phrase "for all time," because this is usually implied in claims concerning path-following techniques.

2.1.5. Restriction. A subset of a relation is a relation. One way in which a relation S may act on a relation R to generate a subset of R is through the operation restriction of R by S. This operation is a generalization of the restriction of a function to a subset of its domain, and is defined as follows.

Let L, M be equal-length lists of indices such that $L = i_1, i_2, \ldots, i_k$, $M = j_1, j_2, \ldots, j_k$ where $k \le$ degree of R and $k \le$ degree of S. Then the L, M restriction of R by S denoted $R_L|_M S$ is the maximal subset R' of R such that

    $\pi_L(R') = \pi_M(S).$

The operation is defined only if equality is applicable between elements of $\pi_{i_h}(R)$ on the one hand and $\pi_{j_h}(S)$ on the other for all $h = 1, 2, \ldots, k$.

The three relations R, S, R' of Figure 13 satisfy the equation $R' = R_{(2,3)}|_{(1,2)} S$.

    R (s  p  j)        S (p  j)        R' (s  p  j)
       1  a  A            a  A             1  a  A
       2  a  A            c  B             2  a  A
       2  a  B            b  B             2  b  B
       2  b  A
       2  b  B

    FIG. 13. Example of restriction

We are now in a position to consider various applications of these operations on relations.

2.2. REDUNDANCY

Redundancy in the named set of relations must be distinguished from redundancy in the stored set of representations. We are primarily concerned here with the former. To begin with, we need a precise notion of derivability for relations.

Suppose $\theta$ is a collection of operations on relations and each operation has the property that from its operands it yields a unique relation (thus natural join is eligible, but join is not). A relation R is $\theta$-derivable from a set S of relations if there exists a sequence of operations from the collection $\theta$ which, for all time, yields R from members of S. The phrase "for all time" is present, because we are dealing with time-varying relations, and our interest is in derivability which holds over a significant period of time. For the named set of relationships in noninferential systems, it appears that an adequate collection $\theta_1$ contains the following operations: projection, natural join, tie, and restriction. Permutation is irrelevant and natural composition need not be included, because it is obtainable by taking a natural join and then a projection. For the stored set of representations, an adequate collection $\theta_2$ of operations would include permutation and additional operations concerned with subsetting and merging relations, and ordering and connecting their elements.

2.2.1. Strong Redundancy. A set of relations is strongly redundant if it contains at least one relation that possesses a projection which is derivable from other projections of relations in the set. The following two examples are intended to explain why strong redundancy is defined this way, and to demonstrate its practical use. In the first example the collection of relations consists of just the following relation:

    employee (serial#, name, manager#, managername)

with serial# as the primary key and manager# as a foreign key. Let us denote the active domain by $\Delta_t$, and suppose that

    $\Delta_t(\text{manager\#}) \subset \Delta_t(\text{serial\#})$

and

    $\Delta_t(\text{managername}) \subset \Delta_t(\text{name})$

for all time t. In this case the redundancy is obvious: the domain managername is unnecessary. To see that it is a strong redundancy as defined above, we observe that

    $\pi_{34}(\text{employee}) = \pi_{12}(\text{employee})_1|_1 \pi_3(\text{employee}).$

In the second example the collection of relations includes a relation S describing suppliers with primary key s#, a relation D describing departments with primary key d#, a relation J describing projects with primary key j#, and the following relations:

    P(s#, d#, ...),    Q(s#, j#, ...),    R(d#, j#, ...),

where in each case ... denotes domains other than s#, d#, j#. Let us suppose the following condition C is known to hold independent of time: supplier s supplies department d (relation P) if and only if supplier s supplies some project j (relation Q) to which d is assigned (relation R). Then, we can write the equation

    $\pi_{12}(P) = \pi_{12}(Q) \cdot \pi_{21}(R)$

and thereby exhibit a strong redundancy.

An important reason for the existence of strong redundancies in the named set of relationships is user convenience. A particular case of this is the retention of semiobsolete relationships in the named set so that old programs that refer to them by name can continue to run correctly. Knowledge of the existence of strong redundancies in the named set enables a system or data base administrator greater freedom in the selection of stored representations to cope more efficiently with current traffic. If the strong redundancies in the named set are directly reflected in strong redundancies in the stored set (or if other strong redundancies are introduced into the stored set), then, generally speaking, extra storage space and update time are consumed with a potential drop in query time for some queries and in load on the central processing units.

2.2.2. Weak Redundancy. A second type of redundancy may exist. In contrast to strong redundancy it is not characterized by an equation. A collection of relations is weakly redundant if it contains a relation that has a projection which is not derivable from other members but is at all times a projection of some join of other projections of relations in the collection.

We can exhibit a weak redundancy by taking the second example (cited above) for a strong redundancy, and assuming now that condition C does not hold at all times. The relations $\pi_{12}(P)$, $\pi_{12}(Q)$, $\pi_{12}(R)$ are complex¹⁰ relations with the possibility of points of ambiguity occurring from time to time in the potential joining of any two. Under these circumstances, none of them is derivable from the other two. However, constraints do exist between them, since each is a projection of some cyclic join of the three of them. One of the weak redundancies can be characterized by the statement: for all time, $\pi_{12}(P)$ is some composition of $\pi_{12}(Q)$ with $\pi_{21}(R)$. The composition in question might be the natural one at some instant and a nonnatural one at another instant.

¹⁰ A binary relation is complex if neither it nor its converse is a function.

Generally speaking, weak redundancies are inherent in the logical needs of the community of users. They are not removable by the system or data base administrator. If they appear at all, they appear in both the named set and the stored set of representations.

2.3. CONSISTENCY

Whenever the named set of relations is redundant in either sense, we shall associate with that set a collection of statements which define all of the redundancies which hold independent of time between the member relations. If the information system lacks (and it most probably will) detailed semantic information about each named relation, it cannot deduce the redundancies applicable to the named set. It might, over a period of time, make attempts to induce the redundancies, but such attempts would be fallible.

Given a collection C of time-varying relations, an associated set Z of constraint statements and an instantaneous value V for C, we shall call the state (C, Z, V) consistent or inconsistent according as V does or does not satisfy Z. For example, given stored relations R, S, T together with the constraint statement "$\pi_{12}(T)$ is a composition of $\pi_{12}(R)$ with $\pi_{12}(S)$", we may check from time to time that the values stored for R, S, T satisfy this constraint. An algorithm for making this check would examine the first two columns of each of R, S, T (in whatever way they are represented in the system) and determine whether

    (1) $\pi_1(T) = \pi_1(R)$,
    (2) $\pi_2(T) = \pi_2(S)$,
    (3) for every element pair (a, c) in the relation $\pi_{12}(T)$ there is an element b such that (a, b) is in $\pi_{12}(R)$ and (b, c) is in $\pi_{12}(S)$.

There are practical problems (which we shall not discuss here) in taking an instantaneous snapshot of a collection of relations, some of which may be very large and highly variable.

It is important to note that consistency as defined above is a property of the instantaneous state of a data bank, and is independent of how that state came about. Thus, in particular, there is no distinction made on the basis of whether a user generated an inconsistency due to an act of omission or an act of commission. Examination of a simple

example will show the reasonableness of this (possibly unconventional) approach to consistency.

Suppose the named set C includes the relations S, J, D, P, Q, R of the example in Section 2.2 and that P, Q, R possess either the strong or weak redundancies described therein (in the particular case now under consideration, it does not matter which kind of redundancy occurs). Further, suppose that at some time t the data bank state is consistent and contains no project j such that supplier 2 supplies project j and j is assigned to department 5. Accordingly, there is no element (2, 5) in $\pi_{12}(P)$. Now, a user introduces the element (2, 5) into $\pi_{12}(P)$ by inserting some appropriate element into P. The data bank state is now inconsistent. The inconsistency could have arisen from an act of omission, if the input (2, 5) is correct, and there does exist a project j such that supplier 2 supplies j and j is assigned to department 5. In this case, it is very likely that the user intends in the near future to insert elements into Q and R which will have the effect of introducing (2, j) into $\pi_{12}(Q)$ and (5, j) in $\pi_{12}(R)$. On the other hand, the input (2, 5) might have been faulty. It could be the case that the user intended to insert some other element into P, an element whose insertion would transform a consistent state into a consistent state. The point is that the system will normally have no way of resolving this question without interrogating its environment (perhaps the user who created the inconsistency).

There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently. Inputs causing the inconsistencies which remain in the data bank state at checking time can be tracked down if the system maintains a journal of all state-changing transactions. This latter approach would certainly be superior if few nontransitory inconsistencies occurred.

2.4. SUMMARY

In Section 1 a relational model of data is proposed as a basis for protecting users of formatted data systems from the potentially disruptive changes in data representation caused by growth in the data bank and changes in traffic. A normal form for the time-varying collection of relationships is introduced.

In Section 2 operations on relations and two types of redundancy are defined and applied to the problem of maintaining the data in a consistent state. This is bound to become a serious practical problem as more and more different types of data are integrated together into common data banks.

Many questions are raised and left unanswered. For example, only a few of the more important properties of the data sublanguage in Section 1.4 are mentioned. Neither the purely linguistic details of such a language nor the implementation problems are discussed. Nevertheless, the material presented should be adequate for experienced systems programmers to visualize several approaches. It is also hoped that this paper can contribute to greater precision in work on formatted data systems.

Acknowledgment. It was C. T. Davies of IBM Poughkeepsie who convinced the author of the need for data independence in future information systems. The author wishes to thank him and also F. P. Palermo, C. P. Wang, E. B. Altman, and M. E. Senko of the IBM San Jose Research Laboratory for helpful discussions.

RECEIVED SEPTEMBER, 1969; REVISED FEBRUARY, 1970

REFERENCES

 1. CHILDS, D. L. Feasibility of a set-theoretical data structure - a general structure based on a reconstituted definition of relation. Proc. IFIP Cong., 1968, North Holland Pub. Co., Amsterdam, pp. 162-172.
 2. LEVEIN, R. E., AND MARON, M. E. A computer system for inference execution and data retrieval. Comm. ACM 10, 11 (Nov. 1967), 715-721.
 3. BACHMAN, C. W. Software for random access processing. Datamation (Apr. 1965), 36-41.
 4. MCGEE, W. C. Generalized file processing. In Annual Review in Automatic Programming 6, 13, Pergamon Press, New York, 1969, pp. 77-149.
 5. Information Management System/360, Application Description Manual H20-0524-1. IBM Corp., White Plains, N. Y., July 1968.
 6. GIS (Generalized Information System), Application Description Manual H20-0574. IBM Corp., White Plains, N. Y., 1965.
 7. BLEIER, R. E. Treating hierarchical data structures in the SDC time-shared data management system (TDMS). Proc. ACM 22nd Nat. Conf., 1967, MDI Publications, Wayne, Pa., pp. 41-49.
 8. IDS Reference Manual GE 625/635, GE Inform. Sys. Div., Phoenix, Ariz., CPB 1093B, Feb. 1968.
 9. CHURCH, A. An Introduction to Mathematical Logic I. Princeton U. Press, Princeton, N.J., 1956.
10. FELDMAN, J. A., AND ROVNER, P. D. An Algol-based associative language. Stanford Artificial Intelligence Rep. AI-66, Aug. 1, 1968.
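
As an illustration of the operations of Section 2.1, the following Python sketch models relations as sets of tuples and checks the cyclic 3-joins of Figures 8 and 9. It is a minimal sketch, not part of the paper: the function names (`project`, `natural_join`, `tie`, `compose`, `is_cyclic_3join`) are hypothetical, and `natural_join` assumes that the join is always on the last column of the first operand and the first column of the second.

```python
from itertools import product


def project(rel, cols):
    """Projection: keep the listed columns (1-based, as in the paper), in that order."""
    return {tuple(row[c - 1] for c in cols) for row in rel}


def natural_join(r, s):
    """Natural join R*S, matching the last column of r with the first column of s."""
    return {a + b[1:] for a, b in product(r, s) if a[-1] == b[0]}


def tie(rel):
    """Tie: keep rows whose first and last entries are equal, then drop the last entry."""
    return {row[:-1] for row in rel if row[0] == row[-1]}


def compose(r, s):
    """Natural composition of two binary relations: pi_13(R*S)."""
    return project(natural_join(r, s), (1, 3))


def is_cyclic_3join(u, r, s, t):
    """True if the ternary relation u projects onto r, s and t in the cyclic pattern."""
    return (project(u, (1, 2)) == r
            and project(u, (2, 3)) == s
            and project(u, (3, 1)) == t)


# Relations of Figure 8 and the two cyclic 3-joins of Figure 9.
R = {(1, "a"), (2, "a"), (2, "b")}
S = {("a", "d"), ("a", "e"), ("b", "d"), ("b", "e")}
T = {("d", 1), ("d", 2), ("e", 2)}
U = {(1, "a", "d"), (2, "a", "e"), (2, "b", "d"), (2, "b", "e")}
U_PRIME = U | {(2, "a", "d")}

# Natural cyclic 3-join: the tie applied to the natural linear 3-join R*S*T.
NATURAL_CYCLIC = tie(natural_join(natural_join(R, S), T))

if __name__ == "__main__":
    print(is_cyclic_3join(U, R, S, T), is_cyclic_3join(U_PRIME, R, S, T))
    print(NATURAL_CYCLIC == U_PRIME)
    print(sorted(compose(R, S)))
```

Under these assumptions both U and U' pass the three projection tests, and the natural cyclic 3-join computed with the tie coincides with U', the larger of the two joins.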


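
The restriction operation of Section 2.1.5 and the first strong-redundancy example of Section 2.2.1 can be sketched in Python as follows. This is an illustrative reading rather than a definitive one: `restrict` keeps the tuples of R whose L-components occur in the M-projection of S, an assumption chosen because it reproduces R' in Figure 13. The function names and the `EMPLOYEE` rows are hypothetical.

```python
def project(rel, cols):
    """Projection: keep the listed columns (1-based, as in the paper), in that order."""
    return {tuple(row[c - 1] for c in cols) for row in rel}


def restrict(r, l_cols, s, m_cols):
    """L, M restriction of r by s (assumed reading).

    Keeps each row of r whose l_cols components appear among the
    m_cols components of some row of s.
    """
    allowed = project(s, m_cols)
    return {row for row in r if tuple(row[c - 1] for c in l_cols) in allowed}


# Figure 13: restriction of R by S on columns (2,3) of R and (1,2) of S.
R = {(1, "a", "A"), (2, "a", "A"), (2, "a", "B"), (2, "b", "A"), (2, "b", "B")}
S = {("a", "A"), ("c", "B"), ("b", "B")}
R_PRIME = restrict(R, (2, 3), S, (1, 2))

# Hypothetical employee(serial#, name, manager#, managername) rows.
EMPLOYEE = {
    (1, "Ada", 3, "Cy"),
    (2, "Bob", 3, "Cy"),
    (3, "Cy", 3, "Cy"),
    (4, "Dee", 1, "Ada"),
}

# Strong redundancy: pi_34(employee) is derivable from pi_12 and pi_3.
DERIVED = restrict(project(EMPLOYEE, (1, 2)), (1,), project(EMPLOYEE, (3,)), (1,))

if __name__ == "__main__":
    print(sorted(R_PRIME))
    print(DERIVED == project(EMPLOYEE, (3, 4)))
```

With the sample rows, the (manager#, managername) pairs are recovered from the (serial#, name) pairs alone, which is why the managername domain carries no extra information.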


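
The three checks listed in Section 2.3 for the constraint "T is a composition of R with S" can be written out directly. The Python sketch below is one possible rendering, applied to the supplier 2 / department 5 scenario of the same section. The function names and the sample rows for P, Q, R are hypothetical, and the relations are assumed to be already projected onto their first two columns.

```python
def column(rel, i):
    """Set of values in column i (1-based) of a relation."""
    return {row[i - 1] for row in rel}


def converse(rel):
    """Converse of a binary relation, i.e. pi_21."""
    return {(b, a) for a, b in rel}


def passes_composition_checks(t, r, s):
    """Checks (1)-(3) of Section 2.3 on binary relations t, r, s."""
    middles = column(r, 2) & column(s, 1)
    check1 = column(t, 1) == column(r, 1)
    check2 = column(t, 2) == column(s, 2)
    check3 = all(
        any((a, b) in r and (b, c) in s for b in middles)
        for a, c in t
    )
    return check1 and check2 and check3


# Hypothetical state: P(s#, d#), Q(s#, j#), R(d#, j#).
P = {(1, 4), (2, 6)}
Q = {(1, "j1"), (2, "j2")}
R = {(4, "j1"), (6, "j2")}

# Constraint: pi_12(P) is a composition of pi_12(Q) with pi_21(R).
BEFORE = passes_composition_checks(P, Q, converse(R))

# A user inserts (2, 5) into P before the matching rows exist in Q and R.
P_NEW = P | {(2, 5)}
DURING = passes_composition_checks(P_NEW, Q, converse(R))

# The omission is repaired by inserting the missing rows.
Q_NEW = Q | {(2, "j3")}
R_NEW = R | {(5, "j3")}
AFTER = passes_composition_checks(P_NEW, Q_NEW, converse(R_NEW))

if __name__ == "__main__":
    print(BEFORE, DURING, AFTER)
```

The check looks only at the instantaneous state, so it reports the intermediate state as inconsistent without being able to tell whether (2, 5) was a faulty input or the first half of a correct update.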
Following the user’s interests in mobile context aware recommender systemsFollowing the user’s interests in mobile context aware recommender systems
Following the user’s interests in mobile context aware recommender systems
 
IRJET- Cross System User Modeling and Personalization on the Social Web
IRJET- Cross System User Modeling and Personalization on the Social WebIRJET- Cross System User Modeling and Personalization on the Social Web
IRJET- Cross System User Modeling and Personalization on the Social Web
 
Adidrds
AdidrdsAdidrds
Adidrds
 
Audio/Video Conferencing in Distributed Brokering Systems
Audio/Video Conferencing in Distributed Brokering SystemsAudio/Video Conferencing in Distributed Brokering Systems
Audio/Video Conferencing in Distributed Brokering Systems
 
Www2008 Restws Pautasso Zimmermann Leymann
Www2008 Restws Pautasso Zimmermann LeymannWww2008 Restws Pautasso Zimmermann Leymann
Www2008 Restws Pautasso Zimmermann Leymann
 
Data and Computation Interoperability in Internet Services
Data and Computation Interoperability in Internet ServicesData and Computation Interoperability in Internet Services
Data and Computation Interoperability in Internet Services
 
Design and implementation of a personal super Computer
Design and implementation of a personal super ComputerDesign and implementation of a personal super Computer
Design and implementation of a personal super Computer
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Financial Networks and Cartography
Financial Networks and CartographyFinancial Networks and Cartography
Financial Networks and Cartography
 

Similaire à Birthof Relation Database

Varsha ppt 1
Varsha ppt 1Varsha ppt 1
Varsha ppt 1pks9779
 
Asif nosql
Asif nosqlAsif nosql
Asif nosqlAsif Ali
 
Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...IDES Editor
 
A relational model of data for large shared data banks
A relational model of data for large shared data banksA relational model of data for large shared data banks
A relational model of data for large shared data banksSammy Alvarez
 
A history and evaluation of system r
A history and evaluation of system rA history and evaluation of system r
A history and evaluation of system rsugeladi
 
online Record Linkage
online Record Linkageonline Record Linkage
online Record LinkagePriya Pandian
 
Advanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docxAdvanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docxnettletondevon
 
Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)sones GmbH
 
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...M H
 
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...IDES Editor
 
Ensuring distributed accountability for data sharing in the cloud
Ensuring distributed accountability for data sharing in the cloudEnsuring distributed accountability for data sharing in the cloud
Ensuring distributed accountability for data sharing in the cloudGowthami Konakanchi
 
Chapter 2 database environment
Chapter 2 database environmentChapter 2 database environment
Chapter 2 database environment>. &lt;
 
Agent based frameworks for distributed association rule mining an analysis
Agent based frameworks for distributed association rule mining  an analysis  Agent based frameworks for distributed association rule mining  an analysis
Agent based frameworks for distributed association rule mining an analysis ijfcstjournal
 
Back to Basics - Firmware in NFV security
Back to Basics - Firmware in NFV securityBack to Basics - Firmware in NFV security
Back to Basics - Firmware in NFV securityLilminow
 

Similaire à Birthof Relation Database (20)

Varsha ppt 1
Varsha ppt 1Varsha ppt 1
Varsha ppt 1
 
Asif nosql
Asif nosqlAsif nosql
Asif nosql
 
Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...Data oriented and Process oriented Strategies for Legacy Information Systems ...
Data oriented and Process oriented Strategies for Legacy Information Systems ...
 
A relational model of data for large shared data banks
A relational model of data for large shared data banksA relational model of data for large shared data banks
A relational model of data for large shared data banks
 
A history and evaluation of system r
A history and evaluation of system rA history and evaluation of system r
A history and evaluation of system r
 
RDBMS to NoSQL. An overview.
RDBMS to NoSQL. An overview.RDBMS to NoSQL. An overview.
RDBMS to NoSQL. An overview.
 
online Record Linkage
online Record Linkageonline Record Linkage
online Record Linkage
 
Advanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docxAdvanced Database Systems CS352Unit 2 Individual Project.docx
Advanced Database Systems CS352Unit 2 Individual Project.docx
 
Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)Whitepaper sones GraphDB (eng)
Whitepaper sones GraphDB (eng)
 
oracle intro
oracle introoracle intro
oracle intro
 
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
 
Info fabric
Info fabricInfo fabric
Info fabric
 
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
Synthesis of Non-Replicated Dynamic Fragment Allocation Algorithm in Distribu...
 
Ensuring distributed accountability for data sharing in the cloud
Ensuring distributed accountability for data sharing in the cloudEnsuring distributed accountability for data sharing in the cloud
Ensuring distributed accountability for data sharing in the cloud
 
Chapter 2 database environment
Chapter 2 database environmentChapter 2 database environment
Chapter 2 database environment
 
Types of databases
Types of databases   Types of databases
Types of databases
 
Computer Science Dissertation Literature Review Example
Computer Science Dissertation Literature Review ExampleComputer Science Dissertation Literature Review Example
Computer Science Dissertation Literature Review Example
 
Computer Science Dissertation Literature Review Example
Computer Science Dissertation Literature Review ExampleComputer Science Dissertation Literature Review Example
Computer Science Dissertation Literature Review Example
 
Agent based frameworks for distributed association rule mining an analysis
Agent based frameworks for distributed association rule mining  an analysis  Agent based frameworks for distributed association rule mining  an analysis
Agent based frameworks for distributed association rule mining an analysis
 
Back to Basics - Firmware in NFV security
Back to Basics - Firmware in NFV securityBack to Basics - Firmware in NFV security
Back to Basics - Firmware in NFV security
 

Dernier

Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfAarwolf Industries LLC
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Dernier (20)

Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Birthof Relation Database

  • 1. Information Retrieval
P. BAXENDALE, Editor

A Relational Model of Data for Large Shared Data Banks

E. F. CODD
IBM Research Laboratory, San Jose, California

Communications of the ACM, Volume 13 / Number 6 / June, 1970, page 377

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user’s model.

KEY WORDS AND PHRASES: data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area.

In contrast, the problems treated here are those of data independence (the independence of application programs and terminal activities from growth in data types and changes in data representation) and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.

The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3, 4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only, that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations; these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the “connection trap”).

Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5, 6, 7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.

1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the
  • 2. stored ordering. Those application programs which take advantage of the stored ordering of a file are likely to fail to operate correctly if for some reason it becomes necessary to replace that ordering by a different one. Similar remarks hold for a stored ordering implemented by means of pointers.

It is unnecessary to single out any system as an example, because all the well-known information systems that are marketed today fail to make a clear distinction between order of presentation on the one hand and stored ordering on the other. Significant implementation problems must be solved to provide this kind of independence.

1.2.2. Indexing Dependence. In the context of formatted data, an index is usually thought of as a purely performance-oriented component of the data representation. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?

Present formatted data systems take widely different approaches to indexing. TDMS [7] unconditionally provides indexing on all attributes. The presently released version of IMS [5] provides the user with a choice for each file: a choice between no indexing at all (the hierarchic sequential organization) or indexing on the primary key only (the hierarchic indexed sequential organization). In neither case is the user’s application logic dependent on the existence of the unconditionally provided indices. IDS [8], however, permits the file designers to select attributes to be indexed and to incorporate indices into the file structure by means of additional chains. Application programs taking advantage of the performance benefit of these indexing chains must refer to those chains by name. Such programs do not operate correctly if these chains are later removed.

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. Application programs developed to work with these systems tend to be logically impaired if the trees or networks are changed in structure. A simple example follows.

Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded. Suppose that the system requires the user or file designer to declare or define the data in terms of tree structures. Then, any one of the hierarchical structures may be adopted for the information mentioned above (see Structures 1-5).

Structure 1. Projects Subordinate to Parts
File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
      PROJECT  project #
               project name
               project description
               quantity committed

Structure 2. Parts Subordinate to Projects
File  Segment  Fields
F     PROJECT  project #
               project name
               project description
      PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
               quantity committed

Structure 3. Parts and Projects as Peers, Commitment Relationship Subordinate to Projects
File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
G     PROJECT  project #
               project name
               project description
      PART     part #
               quantity committed

Structure 4. Parts and Projects as Peers, Commitment Relationship Subordinate to Parts
File  Segment  Fields
F     PART     part #
               part description
               quantity-on-hand
               quantity-on-order
      PROJECT  project #
               quantity committed
G     PROJECT  project #
               project name
               project description

Structure 5. Parts, Projects, and Commitment Relationship as Peers
File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
G     PROJECT  project #
               project name
               project description
H     COMMIT   part #
               project #
               quantity committed
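The five structures record the same facts but expose them through different files and segment nestings. As a minimal sketch of what that means for a program, the Python below models Structures 2 and 5 as plain dictionaries and writes one report routine against each. The dictionary layout, the sample values, and the function names are illustrative assumptions, not anything taken from the systems the paper discusses.

```python
# Structure 2: a single file F whose PROJECT segments own their PART segments.
# (Hypothetical sample data; descriptive fields are omitted for brevity.)
structure2 = {
    "F": [
        {"project #": 10, "project name": "alpha",
         "PART": [
             {"part #": 1, "part name": "bolt", "quantity committed": 40},
             {"part #": 2, "part name": "nut", "quantity committed": 15},
         ]},
    ],
}

# Structure 5: three peer files F (PART), G (PROJECT), and H (COMMIT).
structure5 = {
    "F": [{"part #": 1, "part name": "bolt"}, {"part #": 2, "part name": "nut"}],
    "G": [{"project #": 10, "project name": "alpha"}],
    "H": [
        {"part #": 1, "project #": 10, "quantity committed": 40},
        {"part #": 2, "project #": 10, "quantity committed": 15},
    ],
}


def report_for_structure2(bank, project_name):
    """Walk file F from each PROJECT segment down to its PART segments."""
    return [
        (part["part #"], part["part name"], part["quantity committed"])
        for project in bank["F"]
        if project["project name"] == project_name
        for part in project["PART"]
    ]


def report_for_structure5(bank, project_name):
    """Combine the three peer files F, G, and H by part # and project #."""
    wanted = {p["project #"] for p in bank["G"] if p["project name"] == project_name}
    names = {p["part #"]: p["part name"] for p in bank["F"]}
    return [
        (c["part #"], names[c["part #"]], c["quantity committed"])
        for c in bank["H"]
        if c["project #"] in wanted
    ]
```

Each routine returns the same rows on the structure it was written for, and each raises `KeyError` when handed the other structure, because the path it follows through files and segments is not there.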
Now, consider the problem of printing out the part number, part name, and quantity committed for every part used in the project whose project name is "alpha." The following observations may be made regardless of which available tree-oriented information system is selected to tackle this problem. If a program P is developed for this problem assuming one of the five structures above (that is, P makes no test to determine which structure is in effect), then P will fail on at least three of the remaining structures. More specifically, if P succeeds with structure 5, it will fail with all the others; if P succeeds with structure 3 or 4, it will fail with at least 1, 2, and 5; if P succeeds with 1 or 2, it will fail with at least 3, 4, and 5. The reason is simple in each case. In the absence of a test to determine which structure is in effect, P fails because an attempt is made to execute a reference to a nonexistent file (available systems treat this as an error) or no attempt is made to execute a reference to a file containing needed information. The reader who is not convinced should develop sample programs for this simple problem.

Since, in general, it is not practical to develop application programs which test for all tree structurings permitted by the system, these programs fail when a change in structure becomes necessary.

Systems which provide users with a network model of the data run into similar difficulties. In both the tree and network cases, the user (or his program) is required to exploit a collection of user access paths to the data. It does not matter whether these paths are in close correspondence with pointer-defined paths in the stored representation: in IDS the correspondence is extremely simple, in TDMS it is just the opposite. The consequence, regardless of the stored representation, is that terminal activities and programs become dependent on the continued existence of the user access paths.

One solution to this is to adopt the policy that once a user access path is defined it will not be made obsolete until all application programs using that path have become obsolete. Such a policy is not practical, because the number of access paths in the total model for the community of users of a data bank would eventually become excessively large.

1.3. A RELATIONAL VIEW OF DATA

The term relation is used here in its accepted mathematical sense. Given sets $S_1, S_2, \ldots, S_n$ (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from $S_1$, its second element from $S_2$, and so on.¹ We shall refer to $S_j$ as the jth domain of R. As defined above, R is said to have degree n. Relations of degree 1 are often called unary, degree 2 binary, degree 3 ternary, and degree n n-ary.

    ¹ More concisely, R is a subset of the Cartesian product $S_1 \times S_2 \times \cdots \times S_n$.

For expository reasons, we shall frequently make use of an array representation of relations, but it must be remembered that this particular representation is not an essential part of the relational view being expounded. An array which represents an n-ary relation R has the following properties:
(1) Each row represents an n-tuple of R.
(2) The ordering of rows is immaterial.
(3) All rows are distinct.
(4) The ordering of columns is significant: it corresponds to the ordering $S_1, S_2, \ldots, S_n$ of the domains on which R is defined (see, however, remarks below on domain-ordered and domain-unordered relations).
(5) The significance of each column is partially conveyed by labeling it with the name of the corresponding domain.

The example in Figure 1 illustrates a relation of degree 4, called supply, which reflects the shipments-in-progress of parts from specified suppliers to specified projects in specified quantities.

    supply  (supplier  part  project  quantity)
                1        2      5        17
                1        3      5        23
                2        3      7         9
                2        7      5         4
                4        1      1        12

    FIG. 1. A relation of degree 4

One might ask: If the columns are labeled by the name of corresponding domains, why should the ordering of columns matter? As the example in Figure 2 shows, two columns may have identical headings (indicating identical domains) but possess distinct meanings with respect to the relation. The relation depicted is called component. It is a ternary relation, whose first two domains are called part and third domain is called quantity. The meaning of component (x, y, z) is that part x is an immediate component (or subassembly) of part y, and z units of part x are needed to assemble one unit of part y. It is a relation which plays a critical role in the parts explosion problem.

    component  (part  part  quantity)
                 1     5       9
                 2     5       7
                 3     5       2
                 2     6      12
                 3     6       3
                 4     7       1
                 6     7       1

    FIG. 2. A relation with two identical domains

It is a remarkable fact that several existing information systems (chiefly those based on tree-structured files) fail to provide data representations for relations which have two or more identical domains. The present version of IMS/360 [5] is an example of such a system.

The totality of data in a data bank may be viewed as a collection of time-varying relations. These relations are of assorted degrees. As time progresses, each n-ary relation may be subject to insertion of additional n-tuples, deletion of existing ones, and alteration of components of any of its existing n-tuples.
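The array properties listed above map naturally onto a set of tuples. The following is a minimal sketch, not part of the original paper: the names `SUPPLY_DOMAINS`, `supply`, `degree`, `insert`, `delete`, and `alter` are hypothetical, and the rows are those of Figure 1. It assumes a relation is held as an immutable set, so that each insertion, deletion, or alteration yields the next value of a time-varying relation.

```python
# Illustrative sketch only: all names here are hypothetical, not from the paper.
# The rows are those of the supply relation in Figure 1.

SUPPLY_DOMAINS = ("supplier", "part", "project", "quantity")

supply = frozenset({
    (1, 2, 5, 17),
    (1, 3, 5, 23),
    (2, 3, 7, 9),
    (2, 7, 5, 4),
    (4, 1, 1, 12),
})


def degree(domains):
    """Degree of a relation: the number of domains it is defined on."""
    return len(domains)


def insert(relation, row):
    """Insert an n-tuple; inserting a duplicate leaves the relation unchanged."""
    return relation | {row}


def delete(relation, row):
    """Delete an existing n-tuple."""
    return relation - {row}


def alter(relation, old_row, new_row):
    """Alter components of an existing n-tuple."""
    if old_row not in relation:
        raise KeyError(old_row)
    return (relation - {old_row}) | {new_row}


# Properties (2) and (3): row order is immaterial and all rows are distinct.
reordered = frozenset(reversed(sorted(supply)))
print(reordered == supply)                  # True
print(len(insert(supply, (1, 2, 5, 17))))   # 5
print(degree(SUPPLY_DOMAINS))               # 4
```

Column order still matters in this sketch (property 4), because each tuple is positional; the domain names are carried separately, as in the array heading of Figure 1.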
In many commercial, governmental, and scientific data banks, however, some of the relations are of quite high degree (a degree of 30 is not at all uncommon). Users should not normally be burdened with remembering the domain ordering of any relation (for example, the ordering supplier, then part, then project, then quantity in the relation supply). Accordingly, we propose that users deal, not with relations which are domain-ordered, but with relationships which are their domain-unordered counterparts.² To accomplish this, domains must be uniquely identifiable at least within any given relation, without using position. Thus, where there are two or more identical domains, we require in each case that the domain name be qualified by a distinctive role name, which serves to identify the role played by that domain in the given relation. For example, in the relation component of Figure 2, the first domain part might be qualified by the role name sub, and the second by super, so that users could deal with the relationship component and its domains (sub.part, super.part, quantity) without regard to any ordering between these domains.

    ² In mathematical terms, a relationship is an equivalence class of those relations that are equivalent under permutation of domains (see Section 2.1.1).

To sum up, it is proposed that most users should interact with a relational model of the data consisting of a collection of time-varying relationships (rather than relations). Each user need not know more about any relationship than its name together with the names of its domains (role qualified whenever necessary).³ Even this information might be offered in menu style by the system (subject to security and privacy constraints) upon request by the user.

    ³ Naturally, as with any data put into and retrieved from a computer system, the user will normally make far more effective use of the data if he is aware of its meaning.

There are usually many alternative ways in which a relational model may be established for a data bank. In order to discuss a preferred way (or normal form), we must first introduce a few additional concepts (active domain, primary key, foreign key, nonsimple domain) and establish some links with terminology currently in use in information systems programming. In the remainder of this paper, we shall not bother to distinguish between relations and relationships except where it appears advantageous to be explicit.

Consider an example of a data bank which includes relations concerning parts, projects, and suppliers. One relation called part is defined on the following domains:
(1) part number
(2) part name
(3) part color
(4) part weight
(5) quantity on hand
(6) quantity on order
and possibly other domains as well. Each of these domains is, in effect, a pool of values, some or all of which may be represented in the data bank at any instant. While it is conceivable that, at some instant, all part colors are present, it is unlikely that all possible part weights, part names, and part numbers are. We shall call the set of values represented at some instant the active domain at that instant.

Normally, one domain (or combination of domains) of a given relation has values which uniquely identify each element (n-tuple) of that relation. Such a domain (or combination) is called a primary key. In the example above, part number would be a primary key, while part color would not be. A primary key is nonredundant if it is either a simple domain (not a combination) or a combination such that none of the participating simple domains is superfluous in uniquely identifying each element. A relation may possess more than one nonredundant primary key. This would be the case in the example if different parts were always given distinct names. Whenever a relation has two or more nonredundant primary keys, one of them is arbitrarily selected and called the primary key of that relation.

A common requirement is for elements of a relation to cross-reference other elements of the same relation or elements of a different relation. Keys provide a user-oriented means (but not the only means) of expressing such cross-references. We shall call a domain (or domain combination) of relation R a foreign key if it is not the primary key of R but its elements are values of the primary key of some relation S (the possibility that S and R are identical is not excluded). In the relation supply of Figure 1, the combination of supplier, part, project is the primary key, while each of these three domains taken separately is a foreign key.

In previous work there has been a strong tendency to treat the data in a data bank as consisting of two parts, one part consisting of entity descriptions (for example, descriptions of suppliers) and the other part consisting of relations between the various entities or types of entities (for example, the supply relation). This distinction is difficult to maintain when one may have foreign keys in any relation whatsoever. In the user's relational model there appears to be no advantage to making such a distinction (there may be some advantage, however, when one applies relational concepts to machine representations of the user's set of relationships).

So far, we have discussed examples of relations which are defined on simple domains, that is, domains whose elements are atomic (nondecomposable) values. Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on. For example, one of the domains on which the relation employee is defined might be salary history. An element of the salary history domain is a binary relation defined on the domain date and the domain salary. The salary history domain is the set of all such binary relations. At any instant of time there are as many instances of the salary history relation in the data bank as there are employees. In contrast, there is only one instance of the employee relation.

The terms attribute and repeating group in present data base terminology are roughly analogous to simple domain
and nonsimple domain, respectively. Much of the confusion in present terminology is due to failure to distinguish between type and instance (as in "record") and between components of a user model of the data on the one hand and their machine representation counterparts on the other hand (again, we cite "record" as an example).

1.4. NORMAL FORM

A relation whose domains are all simple can be represented in storage by a two-dimensional column-homogeneous array of the kind discussed above. Some more complicated data structure is necessary for a relation with one or more nonsimple domains. For this reason (and others to be cited below) the possibility of eliminating nonsimple domains appears worth investigating.⁴ There is, in fact, a very simple elimination procedure, which we shall call normalization.

    ⁴ M. E. Sanko of IBM, San Jose, independently recognized the desirability of eliminating nonsimple domains.

Consider, for example, the collection of relations exhibited in Figure 3(a). Job history and children are nonsimple domains of the relation employee. Salary history is a nonsimple domain of the relation job history. The tree in Figure 3(a) shows just these interrelationships of the nonsimple domains.

                 employee
                /        \
        jobhistory      children
            |
        salaryhistory

    employee (man#, name, birthdate, jobhistory, children)
    jobhistory (jobdate, title, salaryhistory)
    salaryhistory (salarydate, salary)
    children (childname, birthyear)

    FIG. 3(a). Unnormalized set

    employee' (man#, name, birthdate)
    jobhistory' (man#, jobdate, title)
    salaryhistory' (man#, jobdate, salarydate, salary)
    children' (man#, childname, birthyear)

    FIG. 3(b). Normalized set

Normalization proceeds as follows. Starting with the relation at the top of the tree, take its primary key and expand each of the immediately subordinate relations by inserting this primary key domain or domain combination. The primary key of each expanded relation consists of the primary key before expansion augmented by the primary key copied down from the parent relation. Now, strike out from the parent relation all nonsimple domains, remove the top node of the tree, and repeat the same sequence of operations on each remaining subtree.

The result of normalizing the collection of relations in Figure 3(a) is the collection in Figure 3(b). The primary key of each relation is italicized to show how such keys are expanded by the normalization.

If normalization as described above is to be applicable, the unnormalized collection of relations must satisfy the following conditions:
(1) The graph of interrelationships of the nonsimple domains is a collection of trees.
(2) No primary key has a component domain which is nonsimple.
The writer knows of no application which would require any relaxation of these conditions. Further operations of a normalizing kind are possible. These are not discussed in this paper.

The simplicity of the array representation which becomes feasible when all relations are cast in normal form is not only an advantage for storage purposes but also for communication of bulk data between systems which use widely different representations of the data. The communication form would be a suitably compressed version of the array representation and would have the following advantages:
(1) It would be devoid of pointers (address-valued or displacement-valued).
(2) It would avoid all dependence on hash addressing schemes.
(3) It would contain no indices or ordering lists.

If the user's relational model is set up in normal form, names of items of data in the data bank can take a simpler form than would otherwise be the case. A general name would take a form such as

    R (g).r.d

where R is a relational name; g is a generation identifier (optional); r is a role name (optional); d is a domain name. Since g is needed only when several generations of a given relation exist, or are anticipated to exist, and r is needed only when the relation R has two or more domains named d, the simple form R.d will often be adequate.

1.5. SOME LINGUISTIC ASPECTS

The adoption of a relational model of data, as described above, permits the development of a universal data sublanguage based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in normal form. Such a language would provide a yardstick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command- or problem-oriented). While it is not the purpose of this paper to describe such a language in detail, its salient features would be as follows.

Let us denote the data sublanguage by R and the host language by H. R permits the declaration of relations and their domains. Each declaration of a relation identifies the primary key for that relation. Declared relations are added to the system catalog for use by any members of the user community who have appropriate authorization. H permits supporting declarations which indicate, perhaps less permanently, how these relations are represented in storage. R permits the specification for retrieval of any subset of data from the data bank. Action on such a retrieval request is subject to security constraints.

The universality of the data sublanguage lies in its descriptive ability (not its computing ability). In a large data bank each subset of the data has a very large number of possible (and sensible) descriptions, even when we assume (as we do) that there is only a finite set of function subroutines to which the system has access for use in qualifying data for retrieval. Thus, the class of qualification expressions which can be used in a set specification must have the descriptive power of the class of well-formed formulas of an applied predicate calculus. It is well known that to preserve this descriptive power it is unnecessary to express (in whatever syntax is chosen) every formula of the selected predicate calculus. For example, just those in prenex normal form are adequate [9].

Arithmetic functions may be needed in the qualification or other parts of retrieval statements. Such functions can be defined in H and invoked in R.

A set so specified may be fetched for query purposes only, or it may be held for possible changes. Insertions take the form of adding new elements to declared relations without regard to any ordering that may be present in their machine representation. Deletions which are effective for the community (as opposed to the individual user or subcommunities) take the form of removing elements from declared relations. Some deletions and updates may be triggered by others, if deletion and update dependencies between specified relations are declared in R.

One important effect that the view adopted toward data has on the language used to retrieve it is in the naming of data elements and sets. Some aspects of this have been discussed in the previous section. With the usual network view, users will often be burdened with coining and using more relation names than are absolutely necessary, since names are associated with paths (or path types) rather than with relations.

Once a user is aware that a certain relation is stored, he will expect to be able to exploit⁵ it using any combination of its arguments as "knowns" and the remaining arguments as "unknowns," because the information (like Everest) is there. This is a system feature (missing from many current information systems) which we shall call (logically) symmetric exploitation of relations. Naturally, symmetry in performance is not to be expected.

    ⁵ Exploiting a relation includes query, update, and delete.

To support symmetric exploitation of a single binary relation, two directed paths are needed. For a relation of degree n, the number of paths to be named and controlled is n factorial.

Again, if a relational view is adopted in which every n-ary relation (n > 2) has to be expressed by the user as a nested expression involving only binary relations (see Feldman's LEAP System [10], for example) then 2n - 1 names have to be coined instead of only n + 1 with direct n-ary notation as described in Section 1.2. For example, the 4-ary relation supply of Figure 1, which entails 5 names in n-ary notation, would be represented in the form

    P (supplier, Q (part, R (project, quantity)))

in nested binary notation and, thus, employ 7 names.

A further disadvantage of this kind of expression is its asymmetry. Although this asymmetry does not prohibit symmetric exploitation, it certainly makes some bases of interrogation very awkward for the user to express (consider, for example, a query for those parts and quantities related to certain given projects via Q and R).

1.6. EXPRESSIBLE, NAMED, AND STORED RELATIONS

Associated with a data bank are two collections of relations: the named set and the expressible set. The named set is the collection of all those relations that the community of users can identify by means of a simple name (or identifier). A relation R acquires membership in the named set when a suitably authorized user declares R; it loses membership when a suitably authorized user cancels the declaration of R.

The expressible set is the total collection of relations that can be designated by expressions in the data language. Such expressions are constructed from simple names of relations in the named set; names of generations, roles and domains; logical connectives; the quantifiers of the predicate calculus;⁶ and certain constant relation symbols such as =, >. The named set is a subset of the expressible set, usually a very small subset.

    ⁶ Because each relation in a practical data bank is a finite set at every instant of time, the existential and universal quantifiers can be expressed in terms of a function that counts the number of elements in any finite set.

Since some relations in the named set may be time-independent combinations of others in that set, it is useful to consider associating with the named set a collection of statements that define these time-independent constraints. We shall postpone further discussion of this until we have introduced several operations on relations (see Section 2).

One of the major problems confronting the designer of a data system which is to support a relational model for its users is that of determining the class of stored representations to be supported. Ideally, the variety of permitted data representations should be just adequate to cover the spectrum of performance requirements of the total collection of installations. Too great a variety leads to unnecessary overhead in storage and continual reinterpretation of descriptions for the structures currently in effect.

For any selected class of stored representations the data system must provide a means of translating user requests expressed in the data language of the relational model into corresponding (and efficient) actions on the current stored representation. For a high level data language this presents a challenging design problem. Nevertheless, it is a problem which must be solved: as more users obtain concurrent access to a large data bank, responsibility for providing efficient response and throughput shifts from the individual user to the data system.
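Returning to the normalization procedure of Section 1.4, the steps (copy the parent's primary key down, strike out the nonsimple domains, repeat on each subtree) can be sketched in a few lines. This is an illustration under stated assumptions rather than anything given in the paper: the `SCHEMA` dictionary, the `normalize` function, and the sample employee record are all hypothetical, and the "own" keys assumed for the subordinate relations (jobdate, salarydate, childname) are a guess at the keys before expansion. Relations are held as lists of dictionaries for readability, so duplicate rows are not removed.

```python
# Hypothetical description of Figure 3(a): for each relation, its own primary
# key (before expansion) and its nonsimple domains. The keys are assumptions.
SCHEMA = {
    "employee": {"key": ("man#",), "nonsimple": ("jobhistory", "children")},
    "jobhistory": {"key": ("jobdate",), "nonsimple": ("salaryhistory",)},
    "salaryhistory": {"key": ("salarydate",), "nonsimple": ()},
    "children": {"key": ("childname",), "nonsimple": ()},
}


def normalize(name, rows, inherited=None, out=None):
    """Flatten nested rows of relation `name` into relations on simple domains."""
    inherited = inherited or {}
    out = {} if out is None else out
    spec = SCHEMA[name]
    for row in rows:
        # Insert the key copied down from the parent, then keep only the
        # simple domains (the nonsimple ones are struck out).
        flat = dict(inherited)
        flat.update({d: v for d, v in row.items() if d not in spec["nonsimple"]})
        out.setdefault(name, []).append(flat)
        # Expanded key = key copied from the parent + this relation's own key.
        expanded_key = dict(inherited)
        expanded_key.update({d: row[d] for d in spec["key"]})
        # Repeat the same steps on each immediately subordinate relation.
        for child in spec["nonsimple"]:
            normalize(child, row.get(child, []), expanded_key, out)
    return out


# Invented sample data, purely for illustration.
employees = [{
    "man#": 1, "name": "Jones", "birthdate": "1930-01-01",
    "jobhistory": [{
        "jobdate": "1960", "title": "clerk",
        "salaryhistory": [
            {"salarydate": "1960", "salary": 4000},
            {"salarydate": "1962", "salary": 4500},
        ],
    }],
    "children": [{"childname": "Ann", "birthyear": 1955}],
}]

flat = normalize("employee", employees)
print(sorted(flat))                  # ['children', 'employee', 'jobhistory', 'salaryhistory']
print(flat["salaryhistory"][0])
```

Each resulting relation carries the domains shown in Figure 3(b); for instance, every salaryhistory row now includes man# and jobdate alongside salarydate and salary.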
2. Redundancy and Consistency

2.1. OPERATIONS ON RELATIONS

Since relations are sets, all of the usual set operations are applicable to them. Nevertheless, the result may not be a relation; for example, the union of a binary relation and a ternary relation is not a relation.

The operations discussed below are specifically for relations. These operations are introduced because of their key role in deriving relations from other relations. Their principal application is in noninferential information systems (systems which do not provide logical inference services), although their applicability is not necessarily destroyed when such services are added.

Most users would not be directly concerned with these operations. Information systems designers and people concerned with data bank control should, however, be thoroughly familiar with them.

2.1.1. Permutation. A binary relation has an array representation with two columns. Interchanging these columns yields the converse relation. More generally, if a permutation is applied to the columns of an n-ary relation, the resulting relation is said to be a permutation of the given relation. There are, for example, 4! = 24 permutations of the relation supply in Figure 1, if we include the identity permutation which leaves the ordering of columns unchanged.

Since the user's relational model consists of a collection of relationships (domain-unordered relations), permutation is not relevant to such a model considered in isolation. It is, however, relevant to the consideration of stored representations of the model. In a system which provides symmetric exploitation of relations, the set of queries answerable by a stored relation is identical to the set answerable by any permutation of that relation. Although it is logically unnecessary to store both a relation and some permutation of it, performance considerations could make it advisable.

2.1.2. Projection. Suppose now we select certain columns of a relation (striking out the others) and then remove from the resulting array any duplication in the rows. The final array represents a relation which is said to be a projection of the given relation.

A selection operator $\pi$ is used to obtain any desired permutation, projection, or combination of the two operations. Thus, if L is a list of k indices⁷ $L = i_1, i_2, \ldots, i_k$ and R is an n-ary relation ($n \ge k$), then $\pi_L(R)$ is the k-ary relation whose jth column is column $i_j$ of R ($j = 1, 2, \ldots, k$) except that duplication in resulting rows is removed. Consider the relation supply of Figure 1. A permuted projection of this relation is exhibited in Figure 4. Note that, in this particular case, the projection has fewer n-tuples than the relation from which it is derived.

    ⁷ When dealing with relationships, we use domain names (role-qualified whenever necessary) instead of domain positions.

    $\pi_{31}$(supply)  (project  supplier)
                            5        1
                            5        2
                            1        4
                            7        2

    FIG. 4. A permuted projection of the relation in Figure 1

2.1.3. Join. Suppose we are given two binary relations, which have some domain in common. Under what circumstances can we combine these relations to form a ternary relation which preserves all of the information in the given relations?

The example in Figure 5 shows two relations R, S, which are joinable without loss of information, while Figure 6 shows a join of R with S. A binary relation R is joinable with a binary relation S if there exists a ternary relation U such that $\pi_{12}(U) = R$ and $\pi_{23}(U) = S$. Any such ternary relation is called a join of R with S. If R, S are binary relations such that $\pi_2(R) = \pi_1(S)$, then R is joinable with S. One join that always exists in such a case is the natural join of R with S defined by

    $R * S = \{(a, b, c) : R(a, b) \wedge S(b, c)\}$

where R(a, b) has the value true if (a, b) is a member of R and similarly for S(b, c). It is immediate that

    $\pi_{12}(R * S) = R$

and

    $\pi_{23}(R * S) = S$.

Note that the join shown in Figure 6 is the natural join of R with S from Figure 5. Another join is shown in Figure 7.

    R  (supplier  part)        S  (part  project)
           1        1                 1      1
           2        1                 1      2
           2        2                 2      1

    FIG. 5. Two joinable relations

    R*S  (supplier  part  project)
             1        1      1
             1        1      2
             2        1      1
             2        1      2
             2        2      1

    FIG. 6. The natural join of R with S (from Figure 5)

    U  (supplier  part  project)
           1        1      2
           2        1      1
           2        2      1

    FIG. 7. Another join of R with S (from Figure 5)

Inspection of these relations reveals an element (element 1) of the domain part (the domain on which the join is to be made) with the property that it possesses more than one relative under R and also under S. It is this element which gives rise to the plurality of joins. Such an element in the joining domain is called a point of ambiguity with respect to the joining of R with S.

If either $\pi_{21}(R)$ or S is a function,⁸ no point of ambiguity can occur in joining R with S. In such a case, the natural join of R with S is the only join of R with S. Note that the reiterated qualification "of R with S" is necessary, because S might be joinable with R (as well as R with S), and this join would be an entirely separate consideration. In Figure 5, none of the relations R, $\pi_{21}(R)$, S, $\pi_{21}(S)$ is a function.

    ⁸ A function is a binary relation, which is one-one or many-one, but not one-many.

Ambiguity in the joining of R with S can sometimes be resolved by means of other relations. Suppose we are given, or can derive from sources independent of R and S, a relation T on the domains project and supplier with the following properties:
(1) $\pi_1(T) = \pi_2(S)$,
(2) $\pi_2(T) = \pi_1(R)$,
(3) $T(j, s) \rightarrow \exists p\,(R(s, p) \wedge S(p, j))$,
(4) $R(s, p) \rightarrow \exists j\,(S(p, j) \wedge T(j, s))$,
(5) $S(p, j) \rightarrow \exists s\,(T(j, s) \wedge R(s, p))$,
then we may form a three-way join of R, S, T; that is, a ternary relation U such that

    $\pi_{12}(U) = R$,  $\pi_{23}(U) = S$,  $\pi_{31}(U) = T$.

Such a join will be called a cyclic 3-join to distinguish it from a linear 3-join which would be a quaternary relation V such that

    $\pi_{12}(V) = R$,  $\pi_{23}(V) = S$,  $\pi_{34}(V) = T$.

While it is possible for more than one cyclic 3-join to exist (see Figures 8, 9, for an example), the circumstances under which this can occur entail much more severe constraints than those for a plurality of 2-joins. To be specific, the relations R, S, T must possess points of ambiguity with respect to joining R with S (say point x), S with T (say y), and T with R (say z), and, furthermore, y must be a relative of x under S, z a relative of y under T, and x a relative of z under R. Note that in Figure 8 the points x = a; y = d; z = 2 have this property.

    R  (s  p)        S  (p  j)        T  (j  s)
        1  a             a  d             d  1
        2  a             a  e             d  2
        2  b             b  d             e  2
                         b  e

    FIG. 8. Binary relations with a plurality of cyclic 3-joins

    U  (s  p  j)        U'  (s  p  j)
        1  a  d              1  a  d
        2  a  e              2  a  d
        2  b  d              2  a  e
        2  b  e              2  b  d
                             2  b  e

    FIG. 9. Two cyclic 3-joins of the relations in Figure 8

The natural linear 3-join of three binary relations R, S, T is given by

    $R * S * T = \{(a, b, c, d) : R(a, b) \wedge S(b, c) \wedge T(c, d)\}$

where parentheses are not needed on the left-hand side because the natural 2-join (*) is associative. To obtain the cyclic counterpart, we introduce the operator $\gamma$ which produces a relation of degree n - 1 from a relation of degree n by tying its ends together. Thus, if R is an n-ary relation ($n \ge 2$), the tie of R is defined by the equation

    $\gamma(R) = \{(a_1, a_2, \ldots, a_{n-1}) : R(a_1, a_2, \ldots, a_{n-1}, a_n) \wedge a_1 = a_n\}$.

We may now represent the natural cyclic 3-join of R, S, T by the expression

    $\gamma(R * S * T)$.

Extension of the notions of linear and cyclic 3-join and their natural counterparts to the joining of n binary relations (where $n \ge 3$) is obvious. A few words may be appropriate, however, regarding the joining of relations which are not necessarily binary. Consider the case of two relations R (degree r), S (degree s) which are to be joined on p of their domains (p < r, p < s). For simplicity, suppose these p domains are the last p of the r domains of R, and the first p of the s domains of S. If this were not so, we could always apply appropriate permutations to make it so. Now, take the Cartesian product of the first r - p domains of R, and call this new domain A. Take the Cartesian product of the last p domains of R, and call this B. Take the Cartesian product of the last s - p domains of S and call this C.

We can treat R as if it were a binary relation on the domains A, B. Similarly, we can treat S as if it were a binary relation on the domains B, C. The notions of linear and cyclic 3-join are now directly applicable. A similar approach can be taken with the linear and cyclic n-joins of n relations of assorted degrees.

2.1.4. Composition. The reader is probably familiar with the notion of composition applied to functions. We shall discuss a generalization of that concept and apply it first to binary relations. Our definitions of composition and composability are based very directly on the definitions of join and joinability given above.

Suppose we are given two relations R, S. T is a composition of R with S if there exists a join U of R with S such that $T = \pi_{13}(U)$. Thus, two relations are composable if and only if they are joinable. However, the existence of more than one join of R with S does not imply the existence of more than one composition of R with S.

Corresponding to the natural join of R with S is the
natural composition⁹ of R with S defined by

$$R \cdot S = \pi_{13}(R * S).$$

Taking the relations R, S from Figure 5, their natural composition is exhibited in Figure 10 and another composition is exhibited in Figure 11 (derived from the join exhibited in Figure 7).

⁹ Other writers tend to ignore compositions other than the natural one, and accordingly refer to this particular composition as the composition; see, for example, Kelley's "General Topology."

    R·S  (project  supplier)
             1         1
             1         2
             2         1
             2         2
FIG. 10. The natural composition of R with S (from Figure 5)

    T    (project  supplier)
             1         2
             2         1
FIG. 11. Another composition of R with S (from Figure 5)

When two or more joins exist, the number of distinct compositions may be as few as one or as many as the number of distinct joins. Figure 12 shows an example of two relations which have several joins but only one composition. Note that the ambiguity of point c is lost in composing R with S, because of unambiguous associations made via the points a, b, d, e.

    R  (supplier  part)      S  (part  project)
           1       a               a      g
           1       b               b      f
           1       c               c      f
           2       c               c      g
           2       d               d      g
           2       e               e      f
FIG. 12. Many joins, only one composition

Extension of composition to pairs of relations which are not necessarily binary (and which may be of different degrees) follows the same pattern as extension of pairwise joining to such relations.

A lack of understanding of relational composition has led several systems designers into what may be called the connection trap. This trap may be described in terms of the following example. Suppose each supplier description is linked by pointers to the descriptions of each part supplied by that supplier, and each part description is similarly linked to the descriptions of each project which uses that part. A conclusion is now drawn which is, in general, erroneous: namely that, if all possible paths are followed from a given supplier via the parts he supplies to the projects using those parts, one will obtain a valid set of all projects supplied by that supplier. Such a conclusion is correct only in the very special case that the target relation between projects and suppliers is, in fact, the natural composition of the other two relations, and we must normally add the phrase "for all time," because this is usually implied in claims concerning path-following techniques.

2.1.5. Restriction. A subset of a relation is a relation. One way in which a relation S may act on a relation R to generate a subset of R is through the operation restriction of R by S. This operation is a generalization of the restriction of a function to a subset of its domain, and is defined as follows.

Let L, M be equal-length lists of indices such that $L = i_1, i_2, \ldots, i_k$, $M = j_1, j_2, \ldots, j_k$ where $k \le$ degree of R and $k \le$ degree of S. Then the L, M restriction of R by S, denoted $R_L|_M S$, is the maximal subset R' of R such that

$$\pi_L(R') = \pi_M(S).$$

The operation is defined only if equality is applicable between elements of $\pi_{i_h}(R)$ on the one hand and $\pi_{j_h}(S)$ on the other for all $h = 1, 2, \ldots, k$.

The three relations R, S, R' of Figure 13 satisfy the equation $R' = R_{(2,3)}|_{(1,2)} S$.

    R  (s  p  j)      S  (p  j)      R'  (s  p  j)
        1  a  A           a  A            1  a  A
        2  a  A           c  B            2  a  A
        2  a  B           b  B            2  b  B
        2  b  A
        2  b  B
FIG. 13. Example of restriction

We are now in a position to consider various applications of these operations on relations.

2.2. REDUNDANCY

Redundancy in the named set of relations must be distinguished from redundancy in the stored set of representations. We are primarily concerned here with the former. To begin with, we need a precise notion of derivability for relations.

Suppose $\theta$ is a collection of operations on relations and each operation has the property that from its operands it yields a unique relation (thus natural join is eligible, but join is not). A relation R is $\theta$-derivable from a set S of relations if there exists a sequence of operations from the collection $\theta$ which, for all time, yields R from members of S. The phrase "for all time" is present, because we are dealing with time-varying relations, and our interest is in derivability which holds over a significant period of time. For the named set of relationships in noninferential systems, it appears that an adequate collection $\theta_1$ contains the following operations: projection, natural join, tie, and restriction. Permutation is irrelevant and natural composition need not be included, because it is obtainable by taking a natural join and then a projection. For the stored set of representations, an adequate collection $\theta_2$ of operations would include permutation and additional operations concerned with subsetting and merging relations, and ordering and connecting their elements.

2.2.1. Strong Redundancy. A set of relations is strongly redundant if it contains at least one relation that possesses a projection which is derivable from other projections of relations in the set. The following two examples are intended to explain why strong redundancy is defined this way, and to demonstrate its practical use. In the first example the collection of relations consists of just the following relation:

    employee (serial#, name, manager#, managername)

with serial# as the primary key and manager# as a foreign key. Let us denote the active domain by $\Delta_t$, and suppose that

$$\Delta_t(\text{manager\#}) \subset \Delta_t(\text{serial\#})$$

and

$$\Delta_t(\text{managername}) \subset \Delta_t(\text{name})$$

for all time t. In this case the redundancy is obvious: the domain managername is unnecessary. To see that it is a strong redundancy as defined above, we observe that

$$\pi_{34}(\text{employee}) = \pi_{12}(\text{employee})_1|_1\, \pi_3(\text{employee}).$$

In the second example the collection of relations includes a relation S describing suppliers with primary key s#, a relation D describing departments with primary key d#, a relation J describing projects with primary key j#, and the following relations:

    P(s#, d#, ...),    Q(s#, j#, ...),    R(d#, j#, ...),

where in each case ... denotes domains other than s#, d#, j#. Let us suppose the following condition C is known to hold independent of time: supplier s supplies department d (relation P) if and only if supplier s supplies some project j (relation Q) to which d is assigned (relation R). Then, we can write the equation

$$\pi_{12}(P) = \pi_{12}(Q) \cdot \pi_{21}(R)$$

and thereby exhibit a strong redundancy.

An important reason for the existence of strong redundancies in the named set of relationships is user convenience. A particular case of this is the retention of semi-obsolete relationships in the named set so that old programs that refer to them by name can continue to run correctly. Knowledge of the existence of strong redundancies in the named set enables a system or data base administrator greater freedom in the selection of stored representations to cope more efficiently with current traffic. If the strong redundancies in the named set are directly reflected in strong redundancies in the stored set (or if other strong redundancies are introduced into the stored set), then, generally speaking, extra storage space and update time are consumed with a potential drop in query time for some queries and in load on the central processing units.

2.2.2. Weak Redundancy. A second type of redundancy may exist. In contrast to strong redundancy it is not characterized by an equation. A collection of relations is weakly redundant if it contains a relation that has a projection which is not derivable from other members but is at all times a projection of some join of other projections of relations in the collection.

We can exhibit a weak redundancy by taking the second example (cited above) for a strong redundancy, and assuming now that condition C does not hold at all times. The relations $\pi_{12}(P)$, $\pi_{12}(Q)$, $\pi_{12}(R)$ are complex¹⁰ relations with the possibility of points of ambiguity occurring from time to time in the potential joining of any two. Under these circumstances, none of them is derivable from the other two. However, constraints do exist between them, since each is a projection of some cyclic join of the three of them. One of the weak redundancies can be characterized by the statement: for all time, $\pi_{12}(P)$ is some composition of $\pi_{12}(Q)$ with $\pi_{21}(R)$. The composition in question might be the natural one at some instant and a nonnatural one at another instant.

¹⁰ A binary relation is complex if neither it nor its converse is a function.

Generally speaking, weak redundancies are inherent in the logical needs of the community of users. They are not removable by the system or data base administrator. If they appear at all, they appear in both the named set and the stored set of representations.

2.3. CONSISTENCY

Whenever the named set of relations is redundant in either sense, we shall associate with that set a collection of statements which define all of the redundancies which hold independent of time between the member relations. If the information system lacks (and it most probably will) detailed semantic information about each named relation, it cannot deduce the redundancies applicable to the named set. It might, over a period of time, make attempts to induce the redundancies, but such attempts would be fallible.

Given a collection C of time-varying relations, an associated set Z of constraint statements and an instantaneous value V for C, we shall call the state (C, Z, V) consistent or inconsistent according as V does or does not satisfy Z. For example, given stored relations R, S, T together with the constraint statement "$\pi_{12}(T)$ is a composition of $\pi_{12}(R)$ with $\pi_{12}(S)$", we may check from time to time that the values stored for R, S, T satisfy this constraint. An algorithm for making this check would examine the first two columns of each of R, S, T (in whatever way they are represented in the system) and determine whether

    (1) $\pi_1(T) = \pi_1(R)$,
    (2) $\pi_2(T) = \pi_2(S)$,
    (3) for every element pair (a, c) in the relation $\pi_{12}(T)$ there is an element b such that (a, b) is in $\pi_{12}(R)$ and (b, c) is in $\pi_{12}(S)$.

There are practical problems (which we shall not discuss here) in taking an instantaneous snapshot of a collection of relations, some of which may be very large and highly variable.

It is important to note that consistency as defined above is a property of the instantaneous state of a data bank, and is independent of how that state came about. Thus, in particular, there is no distinction made on the basis of whether a user generated an inconsistency due to an act of omission or an act of commission. Examination of a simple
example will show the reasonableness of this (possibly unconventional) approach to consistency.

Suppose the named set C includes the relations S, J, D, P, Q, R of the example in Section 2.2 and that P, Q, R possess either the strong or weak redundancies described therein (in the particular case now under consideration, it does not matter which kind of redundancy occurs). Further, suppose that at some time t the data bank state is consistent and contains no project j such that supplier 2 supplies project j and j is assigned to department 5. Accordingly, there is no element (2, 5) in $\pi_{12}(P)$. Now, a user introduces the element (2, 5) into $\pi_{12}(P)$ by inserting some appropriate element into P. The data bank state is now inconsistent. The inconsistency could have arisen from an act of omission, if the input (2, 5) is correct, and there does exist a project j such that supplier 2 supplies j and j is assigned to department 5. In this case, it is very likely that the user intends in the near future to insert elements into Q and R which will have the effect of introducing (2, j) into $\pi_{12}(Q)$ and (5, j) into $\pi_{12}(R)$. On the other hand, the input (2, 5) might have been faulty. It could be the case that the user intended to insert some other element into P, an element whose insertion would transform a consistent state into a consistent state. The point is that the system will normally have no way of resolving this question without interrogating its environment (perhaps the user who created the inconsistency).

There are, of course, several possible ways in which a system can detect inconsistencies and respond to them. In one approach the system checks for possible inconsistency whenever an insertion, deletion, or key update occurs. Naturally, such checking will slow these operations down. If an inconsistency has been generated, details are logged internally, and if it is not remedied within some reasonable time interval, either the user or someone responsible for the security and integrity of the data is notified. Another approach is to conduct consistency checking as a batch operation once a day or less frequently. Inputs causing the inconsistencies which remain in the data bank state at checking time can be tracked down if the system maintains a journal of all state-changing transactions. This latter approach would certainly be superior if few nontransitory inconsistencies occurred.

2.4. SUMMARY

In Section 1 a relational model of data is proposed as a basis for protecting users of formatted data systems from the potentially disruptive changes in data representation caused by growth in the data bank and changes in traffic. A normal form for the time-varying collection of relationships is introduced.

In Section 2 operations on relations and two types of redundancy are defined and applied to the problem of maintaining the data in a consistent state. This is bound to become a serious practical problem as more and more different types of data are integrated together into common data banks.

Many questions are raised and left unanswered. For example, only a few of the more important properties of the data sublanguage in Section 1.4 are mentioned. Neither the purely linguistic details of such a language nor the implementation problems are discussed. Nevertheless, the material presented should be adequate for experienced systems programmers to visualize several approaches. It is also hoped that this paper can contribute to greater precision in work on formatted data systems.

Acknowledgment. It was C. T. Davies of IBM Poughkeepsie who convinced the author of the need for data independence in future information systems. The author wishes to thank him and also F. P. Palermo, C. P. Wang, E. B. Altman, and M. E. Senko of the IBM San Jose Research Laboratory for helpful discussions.

RECEIVED SEPTEMBER, 1969; REVISED FEBRUARY, 1970

REFERENCES

1. CHILDS, D. L. Feasibility of a set-theoretical data structure - a general structure based on a reconstituted definition of relation. Proc. IFIP Cong., 1968, North Holland Pub. Co., Amsterdam, p. 162-172.
2. LEVEIN, R. E., AND MARON, M. E. A computer system for inference execution and data retrieval. Comm. ACM 10, 11 (Nov. 1967), 715-721.
3. BACHMAN, C. W. Software for random access processing. Datamation (Apr. 1965), 36-41.
4. MCGEE, W. C. Generalized file processing. In Annual Review in Automatic Programming 6, 13, Pergamon Press, New York, 1969, pp. 77-149.
5. Information Management System/360, Application Description Manual H20-0524-1. IBM Corp., White Plains, N. Y., July 1968.
6. GIS (Generalized Information System), Application Description Manual H20-0574. IBM Corp., White Plains, N. Y., 1965.
7. BLEIER, R. E. Treating hierarchical data structures in the SDC time-shared data management system (TDMS). Proc. ACM 22nd Nat. Conf., 1967, MDI Publications, Wayne, Pa., pp. 41-49.
8. IDS Reference Manual GE 625/635, GE Inform. Sys. Div., Phoenix, Ariz., CPB 1093B, Feb. 1968.
9. CHURCH, A. An Introduction to Mathematical Logic I. Princeton U. Press, Princeton, N.J., 1956.
10. FELDMAN, J. A., AND ROVNER, P. D. An Algol-based associative language. Stanford Artificial Intelligence Rep. AI-66, Aug. 1, 1968.