MICRO ETL FOUNDATION - Ideas and solutions for the Data Warehouse and Business Intelligence projects in Oracle environment - MASSIMO CENCI
What you know about the ETL process is wrong
The ETL process
• The title you have just read is deliberately provocative. Of course, not everything is false. My intention is
to look at things from another point of view. Don't take anything for granted, and try to read some
axioms, typical of the Data Warehouse world, in a critical way.
• I will try to provide a different view of reality, questioning the individual letters of the ETL paradigm. It is
therefore necessary to investigate the meaning of the ETL process in more detail.
• We can find many definitions of the ETL process. In general, it is an expression that refers to the process of
Extraction, Transformation and Load of input data into a Synthesis System (Data Warehouse, Data Mart ...)
used by the end users.
• This is a very general definition, which does not help us to understand the work that we face. A simple
diagram can help to understand the process.
• The data held in the structures of the Operational Systems (OLTP) are extracted, transformed and
loaded into the Data Warehouse structures.
• In recent years, another definition of the loading process has also become established: ELT. The difference is
the inversion of the "L" and the "T", that is, the transformation phase is performed AFTER the loading phase.
• This trend is related to the need to load increasingly large amounts of data, and to the ability of the ETL
tools to handle this data. Many data transformations, perhaps performed "on the fly", i.e. in memory, or
with the help of temporary tables, can be problematic.
• You have fewer problems if you load the data file as it is into a staging table, and then apply the
transformations to it.
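A minimal sketch of the "load as it is, then transform" idea, written in Python with an in-memory SQLite database only so that it runs anywhere. The table name stg_tmov, the column names and the sample rows are assumptions made for illustration; in an Oracle environment the same steps would be done with the tools the project already uses.

```python
import csv
import io
import sqlite3

# Hypothetical input data file: amounts arrive as text with a comma decimal separator.
DATA_FILE = io.StringIO("op_id;amount\n1;10,50\n2;3,20\n")

conn = sqlite3.connect(":memory:")
# The staging table mirrors the file 1:1: same columns, everything kept as text.
conn.execute("CREATE TABLE stg_tmov (op_id TEXT, amount TEXT)")

# "L" before "T": load the rows exactly as they appear in the file.
reader = csv.DictReader(DATA_FILE, delimiter=";")
rows = [(r["op_id"], r["amount"]) for r in reader]
conn.executemany("INSERT INTO stg_tmov (op_id, amount) VALUES (?, ?)", rows)

# "T" after "L": the transformation runs in the database, on the staged rows.
total = conn.execute(
    "SELECT SUM(CAST(REPLACE(amount, ',', '.') AS REAL)) FROM stg_tmov"
).fetchone()[0]
print(round(total, 2))
```

The staged rows stay untouched, so the conversion can be rerun or corrected without going back to the data file.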
The ambiguity of the ETL and ELT processes
• As part of my considerations on the Micro ETL Foundation, the ELT approach is more in line with its
philosophy. There must be a close relationship between the input data file and the Staging table. This
relationship must be 1:1 and the data file must be loaded as completely as possible.
• Although the ELT approach is better, this does not mean that it is the correct one. Of course, on the
Internet you can find various articles and comments on the pros and cons of the two approaches.
• In my view, however, the reality is different. The problem is not deciding whether to make the changes
before or after loading. The problem is that both processes need to be revised. This is because, if we look
carefully:
1. The extraction step does not exist.
2. The configuration and acquisition steps are missing.
3. The transformation phase is not convenient.
4. It is not clear how to do the loading, and where to do it.
• Thus, although we can continue to speak generically of ETL (or ELT), because the acronym has been
universally known for years, we must be aware that the name is misleading if you want to set a
baseline with the three phases in a project Gantt chart, with the associated estimates.
• Let us now justify the four points above.
1 - The extraction step doesn't exist
• Is there really an extraction activity for which the DWH development team is responsible? I think not. In most
cases, the feeding systems are external systems that reside on mainframes, perhaps with different operating
systems, and different database programming languages.
• The extraction phase of the data, the "E" of the word Extract, is always the responsibility of the feeding system,
which knows how to produce the data file. The Data Warehouse team must instead deal with two activities.
1. The activity of Acquisition or Transfer, namely the placement and storage of the input data files in
well-defined folders on the DWH server, following a pre-established naming convention.
2. The analysis of the contents of the data file, that is, of what the feeding system must produce. This, if
we're lucky. Otherwise, because generating new data files costs money, you will have to reuse or
integrate already existing data files.
• Most Data Warehouse projects relate to the external systems through the transfer of data files.
CDC (Change Data Capture) situations are less frequent and, in any case, do not
cover the whole loading phase.
• There are also rare cases in which the DWH team builds the extraction statements and runs them directly,
using a database link.
• This should not be done, for security reasons, for performance reasons (who knows the indexing structure
of the external systems?) and for reasons of liability (if the data are not loaded, where is the problem?).
• And also for scalability reasons. In times of budget cuts, it is increasingly common for the "IT people" to
change the transactional systems or part of the source systems.
• Having a stable source configuration, to which the external systems must adapt, is
definitely a choice that preserves stability.
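The Acquisition activity described above (storing each incoming data file in a well-defined folder under a pre-established name) could be sketched as follows. The folder layout, the `<FILE_CODE>_<YYYYMMDD>.dat` pattern, the file code DTMOV and the function name `acquire` are assumptions chosen for the example, not a prescribed convention.

```python
import shutil
import tempfile
from datetime import date
from pathlib import Path


def acquire(source_file: Path, dwh_root: Path, file_code: str, ref_day: date) -> Path:
    """Copy an incoming data file into <dwh_root>/<file_code>/ under a fixed name."""
    target_dir = dwh_root / file_code
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming convention: <FILE_CODE>_<YYYYMMDD>.dat
    target = target_dir / f"{file_code}_{ref_day:%Y%m%d}.dat"
    shutil.copy2(source_file, target)
    return target


# Usage with a throwaway directory standing in for the DWH server.
with tempfile.TemporaryDirectory() as tmp:
    incoming = Path(tmp) / "export.txt"
    incoming.write_text("op_id;amount\n1;10,50\n")
    stored = acquire(incoming, Path(tmp) / "dwh", "DTMOV", date(2015, 3, 31))
    stored_name = stored.name
    stored_exists = stored.exists()
print(stored_name)
```

Whatever name the feeding system gives the file, the DWH side always finds it in the same place under the same pattern.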
2 - The configuration and acquisition steps are missing
• The first step to take into account (and it is not simple) is the definition of the data files and
their configuration in the metadata tables. The feeding system will provide us with the definitions, as
Word documents, Excel files, PDFs or other formats.
• We must also give each data file a unique, non-numeric identifier, valid for all feeding systems. It is
important that the name is unique.
• If we have a data file of financial operations, let's call it, for example, TMOV. If we have multiple data files,
such as daily, monthly, quarterly, etc., let's call them DTMOV, MTMOV, QTMOV. If we have two systems
that provide the daily financial operations, let's call them XDTMOV, YDTMOV to distinguish them, but we
must always have a unique name as a reference. On it we will build a primary key.
• In this phase, we will have to configure all the characteristics of the data files, not only their column
structure.
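As an illustration of the configuration step, here is a small sketch of a metadata table keyed by the unique data file name. The table cfg_data_file and its columns are hypothetical, and SQLite is used only to keep the example self-contained; the file codes are the ones used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical metadata table: one row per data file, keyed by its unique name.
conn.execute(
    """
    CREATE TABLE cfg_data_file (
        file_code      TEXT PRIMARY KEY,   -- e.g. XDTMOV
        feeding_system TEXT NOT NULL,
        frequency      TEXT NOT NULL,      -- D = daily, M = monthly, Q = quarterly
        field_sep      TEXT NOT NULL,
        has_header     INTEGER NOT NULL
    )
    """
)
files = [
    ("XDTMOV", "X", "D", ";", 1),
    ("YDTMOV", "Y", "D", ";", 1),
    ("MTMOV", "X", "M", ";", 1),
]
conn.executemany("INSERT INTO cfg_data_file VALUES (?, ?, ?, ?, ?)", files)

# The primary key rejects a second definition under the same name.
try:
    conn.execute("INSERT INTO cfg_data_file VALUES ('XDTMOV', 'Z', 'D', ';', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

The columns beyond the key stand for "all the characteristics of the data files"; a real project would add whatever it needs (encoding, expected arrival time, and so on).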
3 - The transformation phase is not convenient
• Let us now analyze the letter "T", that is, the "Transform" component of the process. My opinion is that we
should not talk about transformation, but about enrichment of the data.
• To transform the data means to make them different from the original: as a consequence, the data
become difficult to check.
• We must always be able to demonstrate that the data we received as input are identical to what
we loaded into the Data Warehouse. Immediately after deployment to production, we will certainly
have to answer several check requests.
• If the original data have been transformed, we will have to spend a lot of time restoring the original data
files (maybe already archived on tape) and redoing the tests. If we preserve the original data and enrich them
with the result of the transformation, we will be able to respond faster and more efficiently. So my
suggestion is:
1. Keep the original data in the Staging Area tables (and, if possible, even beyond).
2. Do not make changes to the existing data, but add columns that contain the result of the
transformation.
3. Enrichment is the right word. I perform the enrichment by transforming or aggregating different
data, as the requirements dictate.
4. Implement the enrichment step not as a staging phase, but as a post-staging phase, i.e. only at the
end of the whole loading of the Staging Area. This is because the enrichment often involves the use
of data from other staging tables. To avoid implementing precedence rules or monitoring
arrivals, it is certainly preferable to wait for the completion of the entire staging process.
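The four suggestions above can be sketched as follows: the received values stay in their columns, and the result of the transformation goes into a new column, computed only after both staging tables are loaded. All table names, column names and values here (stg_tmov, stg_rates, amount_eur, the rates) are hypothetical, and SQLite stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two staging tables, loaded as received (all names here are hypothetical).
conn.execute("CREATE TABLE stg_tmov (op_id TEXT, currency TEXT, amount TEXT)")
conn.execute("CREATE TABLE stg_rates (currency TEXT, eur_rate TEXT)")
conn.executemany("INSERT INTO stg_tmov VALUES (?, ?, ?)",
                 [("1", "USD", "10,50"), ("2", "EUR", "3,20")])
conn.executemany("INSERT INTO stg_rates VALUES (?, ?)",
                 [("USD", "0.5"), ("EUR", "1")])

# Post-staging: runs only after every staging table has been loaded.
# The original columns are never updated; the result goes into a new column.
conn.execute("ALTER TABLE stg_tmov ADD COLUMN amount_eur REAL")
conn.execute(
    """
    UPDATE stg_tmov
       SET amount_eur = CAST(REPLACE(amount, ',', '.') AS REAL)
                        * (SELECT CAST(r.eur_rate AS REAL)
                             FROM stg_rates r
                            WHERE r.currency = stg_tmov.currency)
    """
)
result = conn.execute(
    "SELECT op_id, amount, amount_eur FROM stg_tmov ORDER BY op_id"
).fetchall()
print(result)
```

Because the amount column still holds exactly what arrived, a check request can be answered by comparing it with the data file, without restoring anything.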
4 - How and where to load
• The loading phase is very generic, since it does not say where to load the data. We should decide
where to load right away, because this choice determines which of the two fundamental approaches
in the Data Warehouse field will be adopted.
• Many years have passed, but this choice continues to divide the international community. Inmon
approach or Kimball approach?
• Do we want a comprehensive ODS (Operational Data Store) architecture that retains the more detailed
data, plus a dimensional architecture for summary data, or do we prefer a single dimensional structure
for both? Everyone can decide according to their own experience, timing and budget.
• However, regardless of the method used, the first structure to be loaded is surely the Staging Area, which
is the first to receive the input data files. The Staging Area is a very vast topic. Just a few suggestions.
• The loading of the staging tables should be as simple as possible: a single direct insertion, possibly filtered
by some logical structure, from the data file into the final table. Some small "syntactic" transformations can
be done, but they must concern formatting, not semantics.
• The loading must always be preceded by the cleaning of the staging table. Do not load into a staging table
multiple data files of the same type (several days, for example) that, for some reason, have not been loaded
and have accumulated. If you can, always process them one at a time.
• If necessary, you can aggregate them, by hand or with an automatic mechanism, into a single data file.
Do not forget that we have to perform very accurate checks on these data files.
• Even a trivial check on the congruence between the number of rows loaded and the number present in the
data file becomes much more difficult if the staging table contains the rows of several input files.
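A minimal sketch of the loading rules above: empty the staging table, load one data file, then compare the number of rows in the file with the number in the table. The function name `load_staging`, the table stg_tmov and the sample files are assumptions for the example; SQLite is used only to make it runnable.

```python
import csv
import io
import sqlite3


def load_staging(conn: sqlite3.Connection, data_file) -> tuple[int, int]:
    """Empty the staging table, load one data file, return (file_rows, table_rows)."""
    rows = list(csv.reader(data_file, delimiter=";"))[1:]  # skip the header line
    conn.execute("DELETE FROM stg_tmov")                   # clean before loading
    conn.executemany("INSERT INTO stg_tmov VALUES (?, ?)", rows)
    table_rows = conn.execute("SELECT COUNT(*) FROM stg_tmov").fetchone()[0]
    return len(rows), table_rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stg_tmov (op_id TEXT, amount TEXT)")

# Two accumulated daily files (hypothetical content), processed one at a time.
# In a real run, the downstream steps would execute between the two loads.
day1 = io.StringIO("op_id;amount\n1;10,50\n2;3,20\n")
day2 = io.StringIO("op_id;amount\n3;7,00\n")
checks = [load_staging(conn, f) for f in (day1, day2)]
print(checks)
```

Since the table only ever holds one file, the congruence check is a plain comparison of two counts; with several files mixed in the same table, each count would first have to be attributed to its file.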
Conclusion
• So, in conclusion, keep in mind that, in practice, ETL hides a different acronym, which can be summarized
as CALEL:
1. Configuration
2. Acquisition
3. Load (Staging Area)
4. Enrichment
5. Load (Data Warehouse)
• But, as CALEL is just horrible, we can continue to call it the ETL process. All of this can be represented
graphically in this way:
[Figure: graphical representation of the process]

Data Warehouse and Business Intelligence - Recipe 7 - A messaging system for ...
 
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
Data Warehouse e Business Intelligence in ambiente Oracle - Il sistema di mes...
 
Oracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sqlOracle All-in-One - how to send mail with attach using oracle pl/sql
Oracle All-in-One - how to send mail with attach using oracle pl/sql
 
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
Note di Data Warehouse e Business Intelligence - Le Dimensioni di analisi (pa...
 

Data Warehouse - What you know about the ETL process is wrong

  • 1. MICRO ETL FOUNDATION - Ideas and solutions for Data Warehouse and Business Intelligence projects in an Oracle environment - MASSIMO CENCI. What you know about the ETL process is wrong
  • 2. The ETL process
      • The title you have just read is deliberately provocative. Of course, not everything is wrong. My intention is to look at things from another point of view: take nothing for granted, and read some typical Data Warehouse axioms critically.
      • I will try to offer a different view of reality by questioning the individual letters of the ETL paradigm. To do so, we need to examine the meaning of the ETL process in more detail.
      • There are many definitions of the ETL process. In general, the expression refers to the Extraction, Transformation and Load of input data into a synthesis system (Data Warehouse, Data Mart ...) used by the end users.
      • This definition is very general and does not help us understand the work we face. A simple diagram can help to understand the process.
  • 3. The ETL process
      • The data held in the structures of the operational systems (OLTP) are extracted, transformed and loaded into the Data Warehouse structures.
      • In recent years another definition of the loading process has also become established. It differs by swapping the "L" and the "T": the transformation phase is implemented AFTER the loading phase.
      • This trend is related to the need to load increasingly large amounts of data, and to the ability to process this data with the ETL tools. Many data transformations performed "on the fly", i.e. in memory or with the help of temporary tables, can be problematic.
      • It is less problematic to load the data file as it is into a staging table, and then apply the transformations to it.
  • 4. The ambiguity of the ETL and ELT processes
      • As part of my considerations on the Micro ETL Foundation, the ELT approach is more in line with its philosophy. There must be a close relationship between the input data file and the staging table: the ratio must be 1:1, and the flow must be as complete as possible.
      • Although the ELT approach is better, this does not mean that it is the correct one. Of course, on the Internet you can find various articles and comments on the pros and cons of the two approaches.
      • In my view, however, the reality is different. The problem is not deciding whether to make the changes before or after loading. The problem is that both processes need to be revised, because, if we look carefully:
          1. The extraction step does not exist.
          2. The configuration and acquisition step is missing.
          3. The transformation phase is not convenient.
          4. It is not clear how to do the loading, or where.
      • So, although we can continue to speak generically of ETL (or ELT), since the acronym has been universally known for years, we must be aware that the name is misleading if you want to set a baseline with the three phases in a project Gantt chart, with the associated estimates.
      • Let us now justify the four points above.
  • 5. 1 - The extraction step does not exist
      • Is there a real extraction activity for which the DWH development team is responsible? I think not. In most cases the feeding systems are external systems that reside on mainframes, perhaps with different operating systems and different database programming languages.
      • The extraction phase, the "E" of Extract, is always the responsibility of the feeding system, which knows how to produce the flow. The Data Warehouse team must instead deal with two activities:
          1. Acquisition or transfer: placing and storing the input data files in well-defined folders on the DWH server, following a pre-established naming convention.
          2. The analysis of the contents of the data file, that is, what the feeding system must produce. That is if we are lucky. Otherwise, because generating new data files costs money, you will have to reuse or integrate existing data files.
      • Exchanging data files with the external systems is the approach used by most Data Warehouse projects. CDC (Change Data Capture) situations are less frequent and do not cover the whole loading phase.
      • There are also rare cases in which the DWH team builds the extraction statements and runs them directly, using a database link.
      • This should not be done, for security reasons, for performance reasons (who knows the indexing structure of the external systems?) and for reasons of liability (if the data are not loaded, where is the problem?).
      • There are also scalability reasons. In times of budget cuts, it is increasingly common for the "IT people" to change the transactional systems or part of the source systems.
      • Having a stable source configuration, to which the external systems must adapt, is definitely a choice that preserves stability.
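The acquisition activity described above could be sketched as follows. This is only an illustration: the folder layout (`<root>/<file code>/`), the file name pattern (`<CODE>_<YYYYMMDD>.dat`) and the function names `target_path` and `acquire` are assumptions made for the example, not conventions taken from the slides.

```python
import shutil
from datetime import date
from pathlib import Path


def target_path(root: Path, file_code: str, ref_day: date) -> Path:
    """Build the landing path of a data file.

    Hypothetical convention: one folder per data file code, and a file
    name made of the code plus the reference day.
    """
    return root / file_code / f"{file_code}_{ref_day:%Y%m%d}.dat"


def acquire(incoming: Path, root: Path, file_code: str, ref_day: date) -> Path:
    """Move a file delivered by the feeding system into its well-defined folder."""
    destination = target_path(root, file_code, ref_day)
    destination.parent.mkdir(parents=True, exist_ok=True)
    if destination.exists():
        # Never overwrite silently: the same flow must not be acquired twice.
        raise FileExistsError(f"{destination} was already acquired")
    shutil.move(str(incoming), str(destination))
    return destination
```

With such a convention, every later step can locate a flow from its code and reference day alone, without knowing anything about the feeding system that produced it.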
  • 6. 2 - The configuration and acquisition step is missing
      • The first step to take into account (and it is not simple) is the definition of the data files and their configuration in the metadata tables. The feeding system will provide the definitions as Word, Excel, PDF or other documents.
      • We must also give each data file a unique, non-numeric identifier, valid across all feeding systems. It is important that the name is unique.
      • If we have a data file of financial operations, let's call it, for example, TMOV. If we have multiple data files, such as daily, monthly, quarterly, etc., let's call them DTMOV, MTMOV, QTMOV. If two systems provide the daily financial operations, let's call them XDTMOV and YDTMOV to distinguish them. We must always have a unique name as a reference, and on it we will build a primary key.
      • In this phase we have to configure all the characteristics of the data files, not only their column structure.
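A minimal sketch of such a configuration table, using Python's built-in `sqlite3` module in place of Oracle so that the example is self-contained. The table name `data_file_cfg` and its columns (source system, frequency, separator, header rows) are hypothetical; the slides only require a unique, non-numeric name used as the primary key.

```python
import sqlite3


def create_config_table(conn: sqlite3.Connection) -> None:
    """Create a metadata table for data files (hypothetical structure)."""
    conn.execute(
        """
        CREATE TABLE data_file_cfg (
            file_code     TEXT PRIMARY KEY,  -- unique, non-numeric name, e.g. XDTMOV
            source_system TEXT NOT NULL,     -- feeding system that produces the file
            frequency     TEXT NOT NULL,     -- daily, monthly, quarterly ...
            separator     TEXT NOT NULL,     -- a characteristic beyond the column list
            header_rows   INTEGER NOT NULL
        )
        """
    )


def register_data_file(conn, file_code, source_system, frequency,
                       separator=";", header_rows=1):
    """Register one data file; a duplicate code raises sqlite3.IntegrityError."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "INSERT INTO data_file_cfg VALUES (?, ?, ?, ?, ?)",
            (file_code, source_system, frequency, separator, header_rows),
        )
```

Registering `XDTMOV` and `YDTMOV` succeeds, while a second attempt to register `XDTMOV` is rejected by the primary key, which is the uniqueness guarantee the naming scheme relies on.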
  • 7. 3 - The transformation phase is not convenient
      • Let us now analyze the letter "T", the "Transform" component of the process. My opinion is that we should talk not about transformation but about enrichment of the data.
      • Transforming the data means making them different from the originals, and as a consequence the data become difficult to check.
      • We must always be able to demonstrate that the data we received as input are identical to what we loaded into the Data Warehouse. Immediately after the deployment to production, we will certainly have to answer several check requests.
      • If the original data have been transformed, we will have to spend a lot of time restoring the original data files (perhaps already archived on tape) and redoing the tests. If we preserve the original data and enrich them with the result of the transformation, we will be able to respond faster and more efficiently. So my suggestion is:
          1. Keep the original data in the Staging Area tables (and, if possible, even beyond).
          2. Do not change the existing data; add columns that contain the result of the transformation.
          3. Enrichment is the right word: the enrichment is carried out by transforming or aggregating different data, as the requirements dictate.
          4. Implement the enrichment step not as a staging phase but as a post-staging phase, i.e. only once the whole Staging Area has been loaded. This is because enrichment often uses data from other staging tables. To avoid implementing precedence rules or monitoring arrivals, it is preferable to wait for the entire staging process to complete.
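The four suggestions above can be illustrated with a small sketch, again using `sqlite3` as a stand-in for the Oracle staging schema. The tables `stg_dtmov` and `stg_rates`, their columns and the sample rows are invented for the example. The original columns are never updated; the result of the conversion goes into a new column, and the step runs only after both staging tables are loaded.

```python
import sqlite3


def load_sample_staging(conn: sqlite3.Connection) -> None:
    """Simulate a completed staging load of two hypothetical data files."""
    conn.execute("CREATE TABLE stg_dtmov (op_id TEXT, amount_txt TEXT, currency TEXT)")
    conn.execute("CREATE TABLE stg_rates (currency TEXT PRIMARY KEY, rate_to_eur REAL)")
    conn.executemany(
        "INSERT INTO stg_dtmov VALUES (?, ?, ?)",
        [("A1", "100.00", "USD"), ("A2", "20.00", "EUR")],
    )
    conn.executemany(
        "INSERT INTO stg_rates VALUES (?, ?)",
        [("USD", 0.5), ("EUR", 1.0)],  # made-up rates, chosen for easy arithmetic
    )


def enrich(conn: sqlite3.Connection) -> None:
    """Post-staging enrichment: add a column, leave the received data untouched."""
    conn.execute("ALTER TABLE stg_dtmov ADD COLUMN amount_eur REAL")
    conn.execute(
        """
        UPDATE stg_dtmov
           SET amount_eur = CAST(amount_txt AS REAL) * (
                   SELECT rate_to_eur
                     FROM stg_rates
                    WHERE stg_rates.currency = stg_dtmov.currency)
        """
    )
```

After `enrich`, `amount_txt` still holds exactly what the feeding system sent, so a check request can be answered by comparing the staging rows with the data file, with no need to restore anything from the archive.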
  • 8. 4 - How and where to load
      • The loading phase is very generic, since it does not say where to load the data. We should decide where to load right away, because this choice determines which of the two fundamental Data Warehouse approaches will be adopted.
      • Many years have passed, but this choice continues to divide the international community: the Inmon approach or the Kimball approach?
      • Do we want a comprehensive ODS (Operational Data Store) architecture that retains the more detailed data, plus a dimensional architecture for synthesis data, or do we prefer a single dimensional structure for both? Everyone can decide according to their own experience, timing and budget.
      • However, regardless of the method used, the first structure to be loaded is surely the Staging Area, which first receives the input data files. The Staging Area is a very vast topic; here are just some suggestions.
      • The loading of the staging tables should be as simple as possible: a single direct insertion, possibly filtered by some logical structure, from the data file into the final table. Small "syntactic" transformations are acceptable, but they must concern formatting, not semantics.
      • The loading must always be preceded by the cleaning of the staging table. Do not load into a staging table multiple data files of the same type (several days, for example) that, for some reason, were not loaded and have accumulated. If you can, always process them one at a time.
      • If necessary, you can aggregate them, by hand or with an automatic mechanism, into a single data file. Do not forget that we have to perform very accurate checks on these flows.
      • Even a trivial check on the congruence between the number of rows loaded and the number present in the data file becomes much more difficult if the staging table contains the rows of several input streams.
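The staging suggestions above (clean first, load one data file at a time, check the row count) could look like this in a minimal sketch. It assumes a semicolon-separated file without a header row and an existing three-column staging table called `stg_dtmov`; both are illustrative assumptions, and `sqlite3` again stands in for the Oracle database.

```python
import csv
import sqlite3
from pathlib import Path


def load_staging(conn: sqlite3.Connection, data_file: Path) -> int:
    """Load exactly one data file into the (hypothetical) stg_dtmov table."""
    with data_file.open(newline="") as handle:
        # Rows are inserted as received: no semantic transformation here.
        rows = [tuple(record) for record in csv.reader(handle, delimiter=";")]

    with conn:  # one transaction: cleaning and loading succeed or fail together
        conn.execute("DELETE FROM stg_dtmov")  # always clean before loading
        conn.executemany("INSERT INTO stg_dtmov VALUES (?, ?, ?)", rows)

    # Congruence check: trivial because the table holds a single input stream.
    loaded = conn.execute("SELECT COUNT(*) FROM stg_dtmov").fetchone()[0]
    if loaded != len(rows):
        raise RuntimeError(f"{data_file.name}: {len(rows)} rows read, {loaded} loaded")
    return loaded
```

If two daily files have accumulated, the function is called once per file, with the downstream processing run in between, rather than appending both files to the same table.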
  • 9. Conclusion
      • In conclusion, keep in mind that, in practice, ETL hides a different acronym, which can be summarized as CALEL:
          1. Configuration
          2. Acquisition
          3. Load (Staging Area)
          4. Enrichment
          5. Load (Data Warehouse)
      • But, as CALEL is just horrible, we can continue to call it the ETL process. All of this can be represented graphically in this way: