The naming convention is a key component of any IT project.
The purpose of this article is to suggest a standard for a practical and effective Data Warehouse design in an Oracle environment.
Recipes 8 of Data Warehouse and Business Intelligence - Naming convention techniques (part2)
1. Recipes of Data Warehouse and
Business Intelligence
Naming convention techniques
(part 2)
2. Introduction (1)
• In part 1, the naming convention techniques were applied to tables, which are certainly the basic
entities of an information system. We gave a name to these entities in their simplest form (table),
in their aggregate form (materialized view or summary table) and in their logical form (view).
• It was emphasized that these techniques can be applied to any logical/physical entity of a Data
Warehouse. So I wish to complete these thoughts with three targets in mind:
– Completeness: Tables are basic but, alone, they do not constitute a Data Warehouse. There
must be access rights to view them, programs to load them, indexes to speed up their access,
and constraints to ensure data integrity. Programs, rights, indexes and constraints must also
be created according to the naming convention. Tables are made of attributes, and attributes
have names too. We will discuss them as well.
– Pragmatism: We can recognize the usefulness of the techniques only by seeing them applied to a
real case. We will therefore examine and name all the other entities involved, using the sample
Data Warehouse.
– Knowledge: Some of the entities to be named are specific to Oracle, and this is a good
opportunity to describe them briefly.
• The choices made for the naming convention are only guidelines, not dogma. The
convention can be discussed and changed according to our needs and to our particular view of
the system.
3. Introduction (2)
• The main objective was to draw attention to the importance and usefulness of the Naming
Convention.
• Another point I wish to emphasize is that the convention is "Database Administrator oriented"
and not "Business oriented". This means that the names chosen, for example, for the tables are
physical names that only the DBA sees. The "rest of the world" should not see those names, but
the "logical" names exposed through synonyms and/or views.
4. The users of the Naming Convention
• Based on some useful questions I received, I want to clarify this point. Take for example the
EDW_COM_CDI_CUST_DIT entity. This entity represents the customers (CUST) dimension
table (DIT) of the conformed dimensions section (CDI) of the common area (COM), shared by all
entities of our Data Warehouse (EDW). The content of this entity is clear to any DBA, including
someone who, for example, inherits the management of a Data Warehouse they do not know
(imagine if the table had been named A01DWCST).
• The EDW_COM_CDI_CUST_DIT entity is seen and handled only by the DBA. In my view, the only
other users who may see the entity (and only what is needed, which usually means facts and
dimensions) are the business-area builders, through an administration module that is part of
the front-end tool (e.g. Oracle Business Intelligence).
• These users do not need to see the EDW_COM_CDI_CUST_DIT physical name, but a
view/synonym (logical name) such as, for example, CUSTOMER_DIT. If we had the foresight to make
the last two components of the name unique, the rest of the world will see an entity name that is
much shorter, simpler and closer to its business logic.
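The decomposition of a physical name and its mapping to a logical name can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the parser assumes the five-part layout of the article's example (project, area, section, logical name, type).

```python
def parse_physical_name(name):
    """Split a physical table name into the components used in the example."""
    parts = name.split("_")
    if len(parts) < 5:
        raise ValueError(f"expected at least 5 components, got {name!r}")
    return {
        "project": parts[0],
        "area": parts[1],
        "section": parts[2],
        # Everything between the section and the type is the logical name.
        "logical": "_".join(parts[3:-1]),
        "type": parts[-1],
    }


def synonym_ddl(physical_name, logical_name):
    """Build an Oracle statement that exposes a logical name for a physical one."""
    return f"CREATE OR REPLACE SYNONYM {logical_name} FOR {physical_name}"


components = parse_physical_name("EDW_COM_CDI_CUST_DIT")
print(components)
print(synonym_ddl("EDW_COM_CDI_CUST_DIT", "CUSTOMER_DIT"))
```

The DBA keeps working with the physical name, while the generated synonym is what the other users would query.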
5. The users of the Naming Convention
[Figure: the same table without a Naming Convention (A01DWCST) and with a Naming Convention
(EDW_COM_CDI_CUST_DIT, logical name CUSTOMER_DIT). Roles shown: DBA, Architect.]
6. The Naming Convention of the table attributes
• As we know from relational database theory, the table attributes are the set of specific
characteristics of the various entities that define the logical model. Since we have defined a
Naming Convention for the entities, it is necessary to define one for the attributes as well.
• The paradigm that underlies the Naming Convention of the table attributes can be summarized in
the following formula:
<attribute name> = <logical name>_<type code>
• For attributes, the name is much simpler because their logical context is already structurally
defined by the table to which they belong.
• In the data dictionary tables of an RDBMS such as Oracle, you can locate all the attributes and
their associated tables. An effective Naming Convention is therefore very useful when searching
for all the attributes with certain common characteristics. Here are some examples from my
personal experience.
• In a Data Warehouse for a bank, the need arose to change the size of the currency amount
fields from two to six decimal places. The need was clearly linked to rounding problems. Hundreds
of tables with different columns were involved in the modification. Having adopted the Naming
Convention of identifying all currency amount columns with "*_AMT" was decisive. It
allowed us to generate a script that, by reading the data dictionary tables, dynamically
changed the structure of all and only the affected columns.
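The kind of script described in the bank example might look like the following Python sketch. It is not the original script: the function name, the sample rows and the choice to widen the precision together with the scale are assumptions made for illustration. The query string shows how the "*_AMT" columns could be read from Oracle's USER_TAB_COLUMNS dictionary view.

```python
# Query a real script could run against the Oracle data dictionary.
AMT_COLUMNS_QUERY = (
    "SELECT table_name, column_name, data_precision, data_scale "
    "FROM user_tab_columns "
    "WHERE column_name LIKE '%\\_AMT' ESCAPE '\\' AND data_type = 'NUMBER'"
)

MAX_PRECISION = 38  # Oracle's maximum NUMBER precision


def widen_amount_columns(rows, new_scale=6):
    """Generate ALTER TABLE statements for every *_AMT column below new_scale."""
    statements = []
    for table, column, precision, scale in rows:
        if not column.endswith("_AMT"):
            continue  # the naming convention tells us this is not an amount
        if precision is None or scale is None or scale >= new_scale:
            continue  # unconstrained NUMBER, or already wide enough
        # Assumption: keep the integer digits, so precision grows with the scale.
        new_precision = precision + (new_scale - scale)
        if new_precision > MAX_PRECISION:
            raise ValueError(f"{table}.{column} would exceed NUMBER({MAX_PRECISION})")
        statements.append(
            f"ALTER TABLE {table} MODIFY ({column} NUMBER({new_precision},{new_scale}))"
        )
    return statements


# Hypothetical rows, shaped like the result of AMT_COLUMNS_QUERY.
sample_rows = [
    ("EDW_DM0_SLS_FAT", "SALES_AMT", 15, 2),
    ("EDW_DM0_SLS_FAT", "SALES_QTY", 10, 0),
    ("EDW_DM0_SLS_MONTH_FMV", "SALES_AMT", 18, 6),
]
ddl = widen_amount_columns(sample_rows)
for statement in ddl:
    print(statement)
```

Only the column whose name ends in _AMT and whose scale is below six produces a statement; the quantity column and the already widened column are skipped.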
7. The Naming Convention of the table attributes (2)
• Some ETL and reporting tools can automatically identify all the descriptive columns of
alphanumeric codes that will be displayed in the output: the tool's interface uses a
"like" clause to locate the fields. If your standard names all the code description
columns "*_DSC", you can take advantage of this feature and will
not need to specify the fields one by one. Now let us see some examples of the <type code>.
• COD - Code - Alphanumeric code: This is the classic code that is associated with a description and
a domain. It may be a customer number, an order type, an account status, etc. I suggest treating
all numeric codes as alphanumeric codes.
• DSC - Description - The description associated with the code, used in the reporting tools and
in the front-end. It is a design choice whether a single description is sufficient, or whether to
define a short description (SDS) and a long description (LDS). Users often require
the concatenation of the code with the short description (CSD, which stands for code plus short
description).
• AMT - Amount - Always indicates an amount.
• QTY - Quantity - Indicates a quantity in pieces, weight or some other unit of measure.
• KEY - Always indicates an artificial key. A column of this type must exist in every dimension
table, with a corresponding column in the fact tables.
8. The Naming Convention of the table attributes (3)
• DTS - Date Stamp - Indicates a date in the Oracle format, which includes the time (hours,
minutes, seconds).
• FLG - Flag - Always a binary field, i.e. it can only be 0 or 1.
• TXT - Text - Generic text field.
• YMD - Day in the YYYYMMDD format.
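The type codes listed so far can be collected in a small lookup table and used to check attribute names against the formula <logical name>_<type code>. The following is a minimal Python sketch; the function name and the sample attribute names are hypothetical.

```python
# Type codes taken from the list above.
TYPE_CODES = {
    "COD": "Code",
    "DSC": "Description",
    "SDS": "Short description",
    "LDS": "Long description",
    "CSD": "Code plus short description",
    "AMT": "Amount",
    "QTY": "Quantity",
    "KEY": "Artificial key",
    "DTS": "Date stamp",
    "FLG": "Flag",
    "TXT": "Text",
    "YMD": "Day in YYYYMMDD format",
}


def split_attribute(name):
    """Return (logical name, type code) or raise if the name breaks the convention."""
    logical, separator, code = name.rpartition("_")
    if not separator or not logical or code not in TYPE_CODES:
        raise ValueError(f"{name!r} does not follow <logical name>_<type code>")
    return logical, code


for attribute in ("CUST_COD", "CUST_DSC", "SALES_AMT", "START_DTS"):
    logical, code = split_attribute(attribute)
    print(f"{attribute}: {logical} ({TYPE_CODES[code]})")
```

A check of this kind could be run over the data dictionary to spot columns that do not respect the convention.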
9. The other entities of a Data Warehouse
• Identifying all the main entities or structures of a Data Warehouse is not an easy job, bearing
in mind that in Oracle there are over 30 different types of structures.
• Each RDBMS has its own requirements and peculiarities, and it would be long-winded and useless
to try to give a Naming Convention to all of them. So we will focus on the main entities, almost
always present, leaving the reader to apply the learned techniques to the remaining ones.
Here is the list of entities covered by the next guidelines.
• Index
• Tablespace
• Datafile
• Integrity Constraint
• Role
• Package
10. The Naming Convention of the indexes
• As the Naming Convention is linked to the type of index, I will give a brief overview of the most
common types of indexes. They typically cover 90% of the needs of a Data Warehouse.
• As everyone knows, indexes are data structures created on one or more columns of a
table to optimize data access performance; the goal of an index is therefore to provide
fast physical access to the rows of the table that contain the indexed values.
• In Oracle, as in other RDBMSs, the most used indexes are the classic B-tree indexes,
the local or global bitmap indexes, and the function-based index. The paradigm that underlies the
Naming Convention of the indexes can be summarized in the following formula:
<index name> = <project code>_<area code>_<section code>_<logical name>_<index type>
• The Naming Convention of the indexes therefore has the same syntax as that of the entities; only
the type changes. In practice, the index name is identical to that of the table on which it is
created, except for the suffix. What follows is a list of the indexes applicable to a sales fact
table, where x indicates a progressive number.
EDW_DM0_SLS_LBx: Represents a local bitmap index.
EDW_DM0_SLS_GBx: Represents a global bitmap index.
EDW_DM0_SLS_NUx: Represents a generic non-unique B-tree index.
EDW_DM0_SLS_UIx: Represents a generic unique B-tree index.
EDW_DM0_SLS_FUx: Represents a function-based index.
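As an illustration of the formula, the following Python sketch assembles index names from their components. The function name is hypothetical, and treating the logical name as optional is an assumption based on the examples above, where the sales fact table indexes carry no logical name.

```python
# Index type codes taken from the list above.
INDEX_TYPES = {
    "LB": "local bitmap",
    "GB": "global bitmap",
    "NU": "non-unique B-tree",
    "UI": "unique B-tree",
    "FU": "function-based",
}


def index_name(project, area, section, index_type, sequence, logical=None):
    """Compose an index name; the progressive number follows the type code."""
    if index_type not in INDEX_TYPES:
        raise ValueError(f"unknown index type {index_type!r}")
    if sequence < 1:
        raise ValueError("the progressive number starts at 1")
    parts = [project, area, section]
    if logical:
        parts.append(logical)
    parts.append(f"{index_type}{sequence}")
    return "_".join(parts)


print(index_name("EDW", "DM0", "SLS", "LB", 1))
print(index_name("EDW", "DM0", "SLS", "NU", 2, logical="MONTH"))
```

Generating the names in one place keeps the suffixes consistent across all the index creation scripts.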
11. The Naming Convention of integrity constraints
• Integrity constraints allow us to associate rules with the Data Warehouse tables, in order to
prevent the introduction of outliers or non-compliant values.
• There is no need to emphasize the importance of these rules in the design of the system.
Let us immediately dispel a myth we often hear: that constraints on tables encumber data
manipulation operations. Nothing could be further from the truth. Let's make things clear.
1. It is obvious that an integrity constraint slows down the manipulation of the table, but its
overhead is minimal and, as a percentage, its weight in the loading process is negligible.
Remember, however, that constraints can be disabled before the data load and re-enabled
immediately afterwards.
2. Constraints implemented programmatically in your application will never be as
complete, secure and manageable as those enforced automatically by the RDBMS.
3. Always define the integrity constraints, even if the source systems are themselves RDBMSs with
active constraints. It is better not to trust: imagine what it means to discover
duplicate keys after loading a few months of data while in production.
4. Integrity constraints are needed by query rewrite in Oracle, i.e. the internal
functionality that can rewrite a query on the fact table and redirect it to a
materialized view. Without the integrity constraints between the fact table and its dimension
tables, rewrites that depend on those joins will not work.
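Point 1 above mentions turning the constraints off before a load and back on afterwards. A minimal Python sketch of how the two sets of Oracle statements could be generated follows; the function name and the list of constraints are hypothetical.

```python
def constraint_toggle_ddl(table, constraints):
    """Return (disable, enable) statement lists to wrap around a data load."""
    disable = [f"ALTER TABLE {table} DISABLE CONSTRAINT {c}" for c in constraints]
    # Re-enable in reverse order, mirroring the order in which they were disabled.
    enable = [f"ALTER TABLE {table} ENABLE CONSTRAINT {c}" for c in reversed(constraints)]
    return disable, enable


before_load, after_load = constraint_toggle_ddl(
    "EDW_DM0_SLS_FAT", ["EDW_DM0_SLS_FK1", "EDW_DM0_SLS_FK2"]
)
for statement in before_load:
    print(statement)
print("-- load the fact table here")
for statement in after_load:
    print(statement)
```

With a consistent naming convention, the list of constraints for a table can be derived from its name instead of being maintained by hand.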
12. The Naming Convention of integrity constraints (2)
• The paradigm that underlies the Naming Convention of the integrity constraints can be
summarized in the following formula:
<constr. name> = <project code>_<area code>_<section code>_<logical name>_<constr. type>
• The following is a list of integrity constraints applicable to a sales fact table, where x
indicates a progressive number.
EDW_DM0_SLS_Nxx: Indicates that a field must always have a non-null value. xx is
a sequential number for each column in the table that requires the constraint.
EDW_DM0_SLS_PK1: Indicates the primary key.
EDW_DM0_SLS_UKx: Indicates a unique key.
EDW_DM0_SLS_FKx: Indicates a foreign key. If you think that the number of foreign keys can
be higher than 9, use the convention Fxx.
EDW_DM0_SLS_CKx: Indicates a more complex constraint based on some conditions (for
example, a start date must always be earlier than the end date).
13. The Naming Convention of the tablespace
• Tablespaces are logical storage units that group objects with common logical characteristics.
Each table, materialized view or index always has a tablespace that contains it, either stated
explicitly in the creation script or implied, in which case it is the default tablespace of the
user who created the object.
• In turn, each tablespace is associated with one or more datafiles. The paradigm that underlies
the Naming Convention of the tablespace can be summarized in the following formula:
<tbs name> = <project code>_<area code>_<section code>_<tbs type>
where the section code and the type code are optional. In fact, the technique to be applied in
this case is not unique, but depends on the size of the objects contained in the tablespace.
Referring to our sales example, we have the following:
EDW_COM: Tablespace of the common entities. The area that we have defined as COM certainly
contains tables and indexes that are small compared to the data of other areas, so the project
code plus the area code is sufficient.
EDW_STA: Tablespace for temporary objects. In this case too, the staging tables, which are only
transient and small, can stay in a single tablespace.
14. The Naming Convention of the tablespace (2)
EDW_DM0_SLS: Tablespace of the objects of the sales data mart. If the total space occupied by
these objects is limited, a single tablespace may be sufficient (limited, for me, is under 8 GB).
If the volumes are higher, the type codes DFT, IFT, DMT and IMT can be used, i.e. fact table, fact
table indexes, materialized view and materialized view indexes.
For a VLDW (Very Large Data Warehouse), it is conceivable to have one tablespace for the indexes
and one tablespace for the data of each table.
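The size-based choice described above can be sketched as a small Python function. The function name is hypothetical, the 8 GB threshold is the author's personal limit, and applying all four type codes at once above the threshold is a simplifying assumption.

```python
# Type codes for large data marts: fact table data, fact table indexes,
# materialized view data, materialized view indexes.
SPLIT_TYPES = ("DFT", "IFT", "DMT", "IMT")


def tablespace_names(project, area, section=None, size_gb=0.0, limit_gb=8.0):
    """Return a single tablespace name for small volumes, four typed ones otherwise."""
    base = "_".join(part for part in (project, area, section) if part)
    if size_gb < limit_gb:
        return [base]
    return [f"{base}_{type_code}" for type_code in SPLIT_TYPES]


print(tablespace_names("EDW", "COM"))
print(tablespace_names("EDW", "DM0", "SLS", size_gb=3))
print(tablespace_names("EDW", "DM0", "SLS", size_gb=50))
```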
15. The Naming Convention of the datafiles
• The following considerations are valid if you are not using the Automatic Storage Management
feature of Oracle.
• As stated in the previous paragraph, the tablespace is made up of datafiles. When you create
the tablespace, you must already know the approximate total space occupied by the objects
that will stay in it, because you will be asked to allocate physical space.
• Let's forget about manually steering the location of the datafiles across the disks of the
database server. Virtualization of physical space now lets us see a single disk. My advice is to
divide the space occupied by the objects of the tablespace among several files of moderate size,
so that they are easier to manage.
• The paradigm that underlies the Naming Convention of the datafiles can be summarized in the
following formula:
<datafile name> = <tablespace name>_XX.<file type>
• In this case, XX is a progressive number (01, 02, ...), while the file type is usually fixed to
DBF (Data Base File). Of course, you can use other acronyms instead of DBF; what matters is
that all the datafiles follow the same logic.
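A minimal Python sketch of the datafile formula follows. The function names and the sizes used in the example (30 GB in total, files of at most 8 GB) are hypothetical; only the <tablespace name>_XX.<file type> pattern comes from the text.

```python
import math


def files_needed(total_gb, max_file_gb):
    """Number of datafiles needed so that no file exceeds max_file_gb."""
    return max(1, math.ceil(total_gb / max_file_gb))


def datafile_names(tablespace, count, file_type="DBF"):
    """Build <tablespace name>_XX.<file type> for XX = 01..count."""
    if not 1 <= count <= 99:
        raise ValueError("the two-digit progressive number allows 1 to 99 files")
    return [f"{tablespace}_{number:02d}.{file_type}" for number in range(1, count + 1)]


count = files_needed(total_gb=30, max_file_gb=8)
names = datafile_names("EDW_DM0_SLS", count)
for name in names:
    print(name)
```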
16. The Naming Convention of the roles
• In a Data Warehouse, tables and their related structures must be grouped so that users can
access them for data selection. I spoke at the beginning about the users of the Data Warehouse.
I am aware that reality is often more complicated, and there will always be users who access, or
wish to access, the data directly. This is why I talk about roles.
• Providing access means granting privileges on the entities. Because users generally have access
to one or more data marts, the best way to simplify the management of access rights is to group
all the accesses to a data mart using roles. (By data mart, in the logical sense, I mean the
fact table and its related dimension tables.)
• So the grant does not associate a user with a structure, but a user with a role. It is
immediately clear that the Naming Convention of the roles is closely connected to the data marts,
i.e. to the logical partitioning at the section level. The paradigm that underlies the Naming
Convention of the roles can be summarized in the following formula:
<role name> = <project code>_<area code>_<section code>_<type code>
• The type code may be optional, as users of the Data Warehouse will always access it with "SELECT"
queries (I hope!). This does not mean that we cannot use "_SEL" to indicate the read-only
role and "_UPD" to indicate the insert, update and delete role. The next figure shows a summary
of the techniques applied so far.
17. Sales Data Mart (SLS)
[Figure: summary of the naming techniques applied to the Enterprise Data Warehouse (EDW)]
Sales Data Mart (SLS):
• Sales Fact Table: EDW_DM0_SLS_FAT
– Indexes: local bitmap index EDW_DM0_SLS_LBx
– Constraints: primary key EDW_DM0_SLS_PKx, foreign key EDW_DM0_SLS_FKx
• Monthly Sales: EDW_DM0_SLS_MONTH_FMV
– Indexes: local bitmap index EDW_DM0_SLS_MONTH_LBx
– Constraints: primary key EDW_DM0_SLS_MONTH_PKx, foreign key EDW_DM0_SLS_MONTH_FKx
• Access role to the Sales Data Mart: EDW_DM0_SLS_SEL
• Tablespace EDW_DM0_SLS: datafiles 1 to 4 (EDW_DM0_SLS_01.DBF, EDW_DM0_SLS_02.DBF,
EDW_DM0_SLS_03.DBF, EDW_DM0_SLS_04.DBF)
Common Area (COM):
• Common objects and their indexes
• Tablespace EDW_COM: datafile 1 (EDW_COM_01.DBF)
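The role technique for the sales data mart can be sketched in Python as follows. The function names, the mapping of type codes to privileges and the user name REPORT_USER are assumptions made for illustration; the role, table and type code names follow the convention described in the previous sections.

```python
# Privileges implied by each role type code (assumed mapping).
PRIVILEGES = {"SEL": "SELECT", "UPD": "INSERT, UPDATE, DELETE"}


def role_name(project, area, section, type_code="SEL"):
    """Compose <project code>_<area code>_<section code>_<type code>."""
    if type_code not in PRIVILEGES:
        raise ValueError(f"unknown role type {type_code!r}")
    return "_".join([project, area, section, type_code])


def role_ddl(role, type_code, tables, users):
    """Create the role, grant it on the data mart tables, then grant it to users."""
    statements = [f"CREATE ROLE {role}"]
    statements += [f"GRANT {PRIVILEGES[type_code]} ON {t} TO {role}" for t in tables]
    statements += [f"GRANT {role} TO {u}" for u in users]
    return statements


role = role_name("EDW", "DM0", "SLS")
grants = role_ddl(
    role,
    "SEL",
    tables=["EDW_DM0_SLS_FAT", "EDW_COM_CDI_CUST_DIT"],
    users=["REPORT_USER"],  # hypothetical user
)
for statement in grants:
    print(statement)
```

Users receive only the role, so adding a table to the data mart requires one new grant to the role rather than one per user.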
18. The Naming Convention of the packages
• Packages are libraries of PL/SQL code. In Oracle, PL/SQL (Procedural Language/SQL) is the
internal database language, although you can also write programs in Java, C or other programming
languages and call them from PL/SQL modules.
• These modules may be procedures or functions. In Oracle, using packages is crucial: I highly
recommend that all the modules needed by the loading process be contained in packages.
• The advantages of their use are numerous, and I will mention only two:
1. Modularity: organizing your programs in an orderly manner, according to the context in which
they operate, is essential for anyone who works, or will work, on the project.
2. Performance: when you call a module of a package for the first time, the entire package is
loaded into memory. Subsequent calls to other modules of the package do not require disk access.
• Returning to the Naming Convention, this means that the procedure used to load the sales fact
table, or the monthly aggregate one, must be contained in the package that has the
same name (if possible) as the target table.
• If this procedure uses functions or procedures that are generic to the sales Data Mart, those
modules should be contained in the package that has the same name as the corresponding
section. The same logic continues up to the procedures common to the entire Data
Warehouse (for example, a function that returns the difference between two dates to calculate
a delta).
• The next figure shows an example of such encapsulation.
19. Daily Sales
[Figure: encapsulation of the loading modules in packages]
• Daily Sales table EDW_DM0_SLSD_FAT, loaded by the loading modules of the Daily Sales package
EDW_DM0_SLSD_FAT
• Monthly Sales table EDW_DM0_SLSM_FMV, loaded by the loading modules of the Monthly Sales
package EDW_DM0_SLSM_FMV
• Common package for the Data Mart of Sales: EDW_DM0_SLS (common modules)
• Common package for all Data Marts of level 0: EDW_DM0 (common modules)
• Common package for the whole Data Warehouse: EDW (common modules)
• Arrows labelled Load and Call connect the elements.
20. The Naming Convention of the packages (2)
• The Naming Convention to be adopted for the packages is very flexible and may take its most
extensive form:
<pkg name> = <project code>_<area code>_<section code>_<logical name>_<type code>
as well as its simplest form:
<project code>
• The logical name and the type code can be useful in complex systems, where
the number of packages tends to be very high.
• Do not forget that the type code must add value to the semantics of the name. Adding
_PKG as a type code does not add value, as this information can be obtained from the Oracle
catalog with a simple select statement.
• If you decide that all the modules that call Java procedures in a certain section go into a
dedicated package, then "_PKG" and "_JPK" are definitely effective choices.
• Where, as in Oracle, a package and a table in the same schema cannot have the same name,
the use of "_PKG" will be mandatory.
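The package hierarchy described in this section can be sketched in Python as a chain that goes from the most specific package to the most general one. The function name is hypothetical, and the sketch assumes that the first three components of a table name are always the project, area and section codes; the optional suffix covers the case where the table-level package cannot share the name of its table.

```python
def package_chain(target_table, suffix=""):
    """List the packages a loading procedure may rely on, most specific first."""
    parts = target_table.split("_")
    if len(parts) < 4:
        raise ValueError(f"{target_table!r} has no project/area/section prefix")
    project, area, section = parts[:3]
    return [
        target_table + suffix,           # loading modules for this table
        f"{project}_{area}_{section}",   # modules common to the data mart
        f"{project}_{area}",             # modules common to the area
        project,                         # modules common to the Data Warehouse
    ]


chain = package_chain("EDW_DM0_SLS_MONTH_FMV", suffix="_PKG")
for package in chain:
    print(package)
```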
21. Conclusions
• We have reached the end of this short journey through the Naming Convention techniques.
What has been covered is certainly not exhaustive of the many possible applications of these
techniques.
• Each of us, on the basis of our own experience, can partition and codify according to our needs
and intuition. Indeed, the particular way in which you partition or codify the system is not
what matters; what matters is to follow a standardization method as rigorously as possible.
• An effective Naming Convention certainly provides all the tools needed to quickly bring the
system under control in terms of knowledge, management and maintenance.