These slides introduce ETL testing. Anyone who wants to start learning ETL testing can use this deck. It covers the main areas of ETL testing.
2. Agenda
◎Data Warehouse Architecture
◎What is ETL?
◎Why Is ETL a Separate Testing Type?
◎Common ETL Jargon
◎ETL Loading Strategies
◎ETL Testing Types
◎Preparing Test Data for ETL Testing
◎ETL Testing Challenges
◎Best Practices for ETL Testing
◎Demo Example
4. ETL – Extract, Transform and Load
◎ Data is taken (extracted) from a source system,
converted (transformed) into a format that can be
analyzed, and stored (loaded) into a data warehouse or
another system.
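The three steps can be sketched in Python. This is a minimal sketch, assuming an in-memory SQLite database stands in for both the source system and the data warehouse; the table names (`src_orders`, `dw_orders`) and the transformation rules (trim and upper-case the customer name, convert cents to a decimal amount) are hypothetical.

```python
import sqlite3


def extract(conn):
    """Extract: read raw rows from the hypothetical source table."""
    return conn.execute(
        "SELECT order_id, customer, amount_cents FROM src_orders ORDER BY order_id"
    ).fetchall()


def transform(rows):
    """Transform: trim and upper-case names, convert cents to a decimal amount."""
    return [(oid, name.strip().upper(), cents / 100) for oid, name, cents in rows]


def load(conn, rows):
    """Load: write the transformed rows into the hypothetical warehouse table."""
    conn.executemany(
        "INSERT INTO dw_orders (order_id, customer, amount) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def demo_connection():
    """In-memory database standing in for both the source and the warehouse."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE src_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    conn.execute("CREATE TABLE dw_orders (order_id INTEGER, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)", [(1, " alice ", 1250), (2, "bob", 300)]
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    load(conn, transform(extract(conn)))
    print(conn.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall())
```

In a real pipeline each step would talk to a different system; keeping the steps as separate functions also lets a tester check the output of `transform` on its own.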
5. Why Is ETL a Separate Testing Type?
◎Validation of Data Migration (End-to-End)
○ Source to Target record count match
○ Source to Target data match
○ Transformation of Data
○ Loading Techniques – Full, Incremental
◎Comparison – Current (Legacy) vs Future system
○ Reports / Data comparison
○ Loading time
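The record count and data match checks can be sketched as SQL queries run from Python. A minimal sketch, assuming both tables are reachable from one SQLite connection (in practice source and target usually live in different systems); `src_customer` and `tgt_customer` are hypothetical names. Table and column names are interpolated into the SQL, so they must come from trusted configuration, never from user input.

```python
import sqlite3


def count_match(conn, source, target):
    """Source-to-target record count check (table names must be trusted strings)."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return src == tgt, src, tgt


def data_mismatches(conn, source, target, columns):
    """Source-to-target data check: rows missing from the target and unexpected extras.

    EXCEPT compares whole rows as sets, so duplicate rows collapse; the count
    check is what catches dropped or duplicated copies of identical rows.
    """
    cols = ", ".join(columns)
    missing = conn.execute(
        f"SELECT {cols} FROM {source} EXCEPT SELECT {cols} FROM {target}"
    ).fetchall()
    extra = conn.execute(
        f"SELECT {cols} FROM {target} EXCEPT SELECT {cols} FROM {source}"
    ).fetchall()
    return sorted(missing), sorted(extra)


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE tgt_customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
    conn.executemany("INSERT INTO tgt_customer VALUES (?, ?)", [(1, "A"), (2, "X")])
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(count_match(conn, "src_customer", "tgt_customer"))
    print(data_mismatches(conn, "src_customer", "tgt_customer", ["id", "name"]))
```

Running the check in both directions matters: the second query finds rows that appear in the target but have no source counterpart, such as the wrongly transformed row `(2, "X")` in the demo data.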
6. Contd..
◎Validation of Business Use Cases
○ Transformation of data into different formats for downstream
systems
○ File Transfer
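For the downstream format and file transfer cases, one common check is that the file produced for the downstream system has one data row per source row and arrives unchanged. The sketch below assumes a CSV format with a header row and a SHA-256 checksum computed before sending; the `customers` table and the function names are hypothetical.

```python
import csv
import hashlib
import io
import sqlite3


def export_csv(conn, table, columns):
    """Transform table rows into CSV text (header plus rows) for a downstream system."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(conn.execute(f"SELECT {', '.join(columns)} FROM {table} ORDER BY 1"))
    return buf.getvalue()


def checksum(text):
    """SHA-256 of the file content, computed on the sending side."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def validate_received_file(conn, table, received_text, sent_checksum):
    """True if the file arrived unchanged and has one data row per source table row."""
    rows = list(csv.reader(io.StringIO(received_text)))
    expected = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return checksum(received_text) == sent_checksum and len(rows) - 1 == expected


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    return conn
```

The row count alone would not notice a corrupted value, and the checksum alone would not notice a file that was generated incomplete, which is why the sketch checks both.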
7. ETL Jargon
◎Data Formats
○ Structured - clearly defined data types
(CSV files, database tables, tab-separated files, etc.)
○ Unstructured - not as easily searchable
(emails, web pages, videos, etc.)
◎Dimensions
○ Descriptive attributes, usually textual fields
○ Describe entities such as people, products, places and time
8. Contd..
◎Facts
○ Fact tables hold business facts (measures) and foreign keys
that refer to primary keys in the dimension tables; together
they provide the measurements of an enterprise
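Because fact rows refer to dimension rows through foreign keys, a standard check is to look for orphan facts: fact rows whose key has no matching dimension row. A minimal SQLite sketch with hypothetical `dim_product` and `fact_sales` tables. SQLite does not enforce foreign keys unless `PRAGMA foreign_keys` is enabled, which is why the demo can insert an orphan row.

```python
import sqlite3


def orphan_fact_rows(conn):
    """Fact rows whose foreign key has no matching primary key in the dimension."""
    return conn.execute(
        """
        SELECT f.sale_id, f.product_key
        FROM fact_sales AS f
        LEFT JOIN dim_product AS d ON f.product_key = d.product_key
        WHERE d.product_key IS NULL
        ORDER BY f.sale_id
        """
    ).fetchall()


def demo_connection():
    conn = sqlite3.connect(":memory:")
    # Dimension: descriptive attributes identified by a primary key.
    conn.execute(
        "CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT)"
    )
    # Fact: a measure (amount) plus a foreign key into the dimension.
    conn.execute("CREATE TABLE fact_sales (sale_id INTEGER, product_key INTEGER, amount REAL)")
    conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Laptop"), (2, "Phone")])
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(10, 1, 999.0), (11, 2, 499.0), (12, 3, 20.0)],
    )
    return conn
```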
9. Contd..
◎Staging Layer
○ The staging area holds temporary tables on the data
warehouse server
◎Look-up
○ Reference tables – used to fetch the matching values
○ Target tables – used to find the delta records or perform
incremental load
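Both kinds of look-up can be expressed as left joins. A minimal sketch, assuming hypothetical staging (`stg_customer`), reference (`ref_country`) and target (`dw_customer`) tables in SQLite: the reference lookup fetches matching values, and the target lookup returns the delta records (new or changed rows) for an incremental load.

```python
import sqlite3


def enrich_with_lookup(conn):
    """Reference lookup: fetch the matching country name for each staged customer."""
    return conn.execute(
        """
        SELECT s.customer_id, s.name, r.country_name
        FROM stg_customer AS s
        LEFT JOIN ref_country AS r ON s.country_code = r.country_code
        ORDER BY s.customer_id
        """
    ).fetchall()


def delta_records(conn):
    """Target lookup: staged rows that are new, or differ from the existing target row."""
    # IS NOT is SQLite's null-safe inequality, so NULL vs. a value counts as a change.
    return conn.execute(
        """
        SELECT s.customer_id, s.name, s.country_code
        FROM stg_customer AS s
        LEFT JOIN dw_customer AS t ON s.customer_id = t.customer_id
        WHERE t.customer_id IS NULL
           OR s.name IS NOT t.name
           OR s.country_code IS NOT t.country_code
        ORDER BY s.customer_id
        """
    ).fetchall()


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ref_country (country_code TEXT, country_name TEXT)")
    conn.execute("CREATE TABLE stg_customer (customer_id INTEGER, name TEXT, country_code TEXT)")
    conn.execute("CREATE TABLE dw_customer (customer_id INTEGER, name TEXT, country_code TEXT)")
    conn.executemany("INSERT INTO ref_country VALUES (?, ?)", [("IN", "India"), ("US", "United States")])
    conn.executemany(
        "INSERT INTO stg_customer VALUES (?, ?, ?)",
        [(1, "Asha", "IN"), (2, "John", "US"), (3, "Mia", "US")],
    )
    conn.executemany("INSERT INTO dw_customer VALUES (?, ?, ?)", [(1, "Asha", "IN"), (2, "Jon", "US")])
    return conn
```

A `None` in the `country_name` column of the enrichment output would flag a code that has no match in the reference table.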
10. ETL Loading Strategies
◎Full Load – Truncate and Load
○ The target table is truncated before new data is loaded
(common for staging area tables)
◎Incremental Load
○ Data is loaded incrementally: only new and changed data is
loaded to the destination
○ Historical data already in the target is kept
○ Uses timestamps, flags or business keys to identify delta records
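The two strategies can be contrasted in a short sketch. It assumes SQLite, hypothetical `src_orders` and `tgt_orders` tables keyed on `order_id`, and ISO-8601 timestamp strings (which sort correctly as text) used as the watermark for fetching delta records.

```python
import sqlite3


def full_load(conn):
    """Full load: empty the target, then reload everything from the source.

    SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it.
    """
    conn.execute("DELETE FROM tgt_orders")
    conn.execute("INSERT INTO tgt_orders SELECT order_id, status, updated_at FROM src_orders")
    conn.commit()


def incremental_load(conn, watermark):
    """Incremental load: only rows whose updated_at is later than `watermark`.

    INSERT OR REPLACE is SQLite's shortcut for an upsert on the primary key
    (other databases typically use MERGE). Rows outside the delta stay untouched.
    """
    conn.execute(
        """
        INSERT OR REPLACE INTO tgt_orders (order_id, status, updated_at)
        SELECT order_id, status, updated_at FROM src_orders WHERE updated_at > ?
        """,
        (watermark,),
    )
    conn.commit()
    # New watermark for the next run: the latest timestamp now in the target.
    return conn.execute("SELECT MAX(updated_at) FROM tgt_orders").fetchone()[0]


def demo_connection():
    conn = sqlite3.connect(":memory:")
    for table in ("src_orders", "tgt_orders"):
        conn.execute(
            f"CREATE TABLE {table} (order_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
        )
    conn.executemany(
        "INSERT INTO src_orders VALUES (?, ?, ?)",
        [(1, "NEW", "2024-01-01T10:00"), (2, "NEW", "2024-01-01T11:00")],
    )
    return conn
```

A typical test runs a full load once, changes a few source rows, runs the incremental load with the previous watermark, and then checks that only the changed and new rows were touched.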
11. SCD Types
◎A Slowly Changing Dimension (SCD) is a dimension
that stores and manages both current and historical
data over time in a data warehouse.
◎Implementing SCDs is considered one of the most critical
ETL tasks for tracking the history of dimension
records
12. Contd..
◎Type 0 SCDs – Fixed Dimension
○ No changes allowed; the dimension never changes
◎Type 1 SCDs – Overwriting
○ The old value is overwritten and lost, as it is not stored anywhere else
○ The default type when you create a dimension
◎Type 2 SCDs - Creating another dimension record
○ When the value of a chosen attribute changes, the current record is
closed and a new record is created, which becomes the current record
○ Each record contains the effective time and expiration time
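Types 1 and 2 can be sketched against SQLite. The table layout (`dim_customer_t1`, `dim_customer_t2`), the tracked attribute (`city`), the `is_current` flag and the `9999-12-31` open-ended expiry date are assumptions chosen for illustration; real implementations usually also add a surrogate key to each Type 2 row.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed "not expired yet" marker for the current row


def scd_type1(conn, customer_id, city):
    """Type 1: overwrite in place; the previous city is lost."""
    conn.execute("UPDATE dim_customer_t1 SET city = ? WHERE customer_id = ?", (city, customer_id))


def scd_type2(conn, customer_id, city, change_date):
    """Type 2: close the current row and insert a new current row."""
    current = conn.execute(
        "SELECT city FROM dim_customer_t2 WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if current is not None and current[0] == city:
        return  # tracked attribute unchanged: no new version needed
    conn.execute(
        "UPDATE dim_customer_t2 SET expiry_date = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer_t2 (customer_id, city, effective_date, expiry_date, is_current) "
        "VALUES (?, ?, ?, ?, 1)",
        (customer_id, city, change_date, OPEN_END),
    )


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dim_customer_t1 (customer_id INTEGER PRIMARY KEY, city TEXT)")
    conn.execute(
        "CREATE TABLE dim_customer_t2 (customer_id INTEGER, city TEXT, "
        "effective_date TEXT, expiry_date TEXT, is_current INTEGER)"
    )
    conn.execute("INSERT INTO dim_customer_t1 VALUES (101, 'Pune')")
    conn.execute(
        "INSERT INTO dim_customer_t2 VALUES (101, 'Pune', '2023-01-01', ?, 1)", (OPEN_END,)
    )
    return conn
```

A Type 2 test would typically assert that each business key has exactly one row with `is_current = 1` and that the expiry date of each closed row equals the effective date of the row that replaced it.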
13. ETL Testing Types
◎Production Validation Testing
○ Also called table balancing or production reconciliation. It is
performed on data before or while it is moved into the production
system in the correct order.
◎Source To Target Testing
○ Performed to validate the data values after data transformation.
◎Application Upgrade
○ Checks that data extracted from an older application or repository
is exactly the same as the data in the new application or repository.
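Table balancing is often done by comparing aggregates rather than individual rows. A minimal sketch, assuming hypothetical `src_sales` and `prod_sales` tables with a `region` column and amounts stored as integer cents (so totals compare exactly): it lists each region whose row count or total differs, or which is missing from production.

```python
import sqlite3


def unbalanced_regions(conn):
    """Regions whose row count or amount total differs between source and production.

    Only regions present in the source are checked; a mirrored query would
    catch regions that exist only in production.
    """
    return conn.execute(
        """
        SELECT s.region, s.cnt, p.cnt, s.total, p.total
        FROM (SELECT region, COUNT(*) AS cnt, SUM(amount_cents) AS total
              FROM src_sales GROUP BY region) AS s
        LEFT JOIN (SELECT region, COUNT(*) AS cnt, SUM(amount_cents) AS total
                   FROM prod_sales GROUP BY region) AS p
          ON s.region = p.region
        WHERE p.region IS NULL OR s.cnt <> p.cnt OR s.total <> p.total
        ORDER BY s.region
        """
    ).fetchall()


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_sales (region TEXT, amount_cents INTEGER)")
    conn.execute("CREATE TABLE prod_sales (region TEXT, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO src_sales VALUES (?, ?)",
        [("EAST", 100), ("EAST", 250), ("WEST", 400), ("NORTH", 50)],
    )
    conn.executemany(
        "INSERT INTO prod_sales VALUES (?, ?)", [("EAST", 100), ("EAST", 250), ("WEST", 300)]
    )
    return conn
```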
14. Contd..
◎Data Transformation Testing:
○ Multiple SQL queries may need to be run for each row to
verify that the data is transformed according to the standards.
◎Data Completeness Testing:
○ Verifies that all expected data is loaded to the appropriate
destination as per the predefined standards.
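A sketch of both checks, assuming a hypothetical transformation rule (the target `full_name` is the upper-cased first and last name joined by one space) and hypothetical `src_emp` / `tgt_emp` tables in SQLite. Here a single set-based query re-applies the rule to every row at once; more complex rules may still need several queries.

```python
import sqlite3


def transformation_failures(conn):
    """Target rows that do not match the rule UPPER(first_name || ' ' || last_name)."""
    return conn.execute(
        """
        SELECT s.emp_id, t.full_name, UPPER(s.first_name || ' ' || s.last_name)
        FROM src_emp AS s
        JOIN tgt_emp AS t ON s.emp_id = t.emp_id
        WHERE t.full_name IS NOT UPPER(s.first_name || ' ' || s.last_name)
        ORDER BY s.emp_id
        """
    ).fetchall()


def missing_from_target(conn):
    """Completeness: source keys that never reached the target."""
    rows = conn.execute(
        "SELECT emp_id FROM src_emp EXCEPT SELECT emp_id FROM tgt_emp ORDER BY 1"
    )
    return [emp_id for (emp_id,) in rows]


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_emp (emp_id INTEGER, first_name TEXT, last_name TEXT)")
    conn.execute("CREATE TABLE tgt_emp (emp_id INTEGER, full_name TEXT)")
    conn.executemany(
        "INSERT INTO src_emp VALUES (?, ?, ?)",
        [(1, "ravi", "kumar"), (2, "anna", "lee"), (3, "li", "wei")],
    )
    # Row 2 has a double space (a transformation bug); row 3 was never loaded.
    conn.executemany("INSERT INTO tgt_emp VALUES (?, ?)", [(1, "RAVI KUMAR"), (2, "ANNA  LEE")])
    return conn
```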
15. Preparing Test Data
◎How to generate test data
○ Manually
○ Mass copy of data from production to testing environment
○ Mass copy of test data from legacy client systems
○ Automated Test Data Generation Tools
◎How to select data for testing
○ Data profiling
○ Full field length data
○ Null records
○ Lookup values
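The selection criteria above can be turned into a small generator. This is only a sketch: it assumes a target with a `name` column of a given maximum length and a `country_code` column validated against a lookup table; the field names and the `"??"` invalid code are hypothetical.

```python
def boundary_test_rows(name_max_len, lookup_codes):
    """Hypothetical test rows for a target with name VARCHAR(name_max_len) and a country_code lookup."""
    invalid_code = "??"  # assumed not to exist in the lookup table
    rows = [
        {"id": 1, "name": "A" * name_max_len, "country_code": lookup_codes[0]},  # full field length
        {"id": 2, "name": None, "country_code": lookup_codes[0]},  # null record
        {"id": 3, "name": "Null Key", "country_code": None},  # null lookup key
        {"id": 4, "name": "Bad Key", "country_code": invalid_code},  # lookup miss
    ]
    # One row per valid lookup value so every reference entry is exercised.
    for code in lookup_codes:
        rows.append({"id": len(rows) + 1, "name": f"Lookup {code}", "country_code": code})
    return rows


if __name__ == "__main__":
    for row in boundary_test_rows(10, ["IN", "US"]):
        print(row)
```

Data profiling of the real source would normally supply `name_max_len` and the list of lookup codes instead of hard-coding them.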
16. ETL Testing Challenges
◎ Testers have no privileges to execute ETL jobs on their own
◎ Very large data volumes and complex data
◎ Incompatible and duplicate data
◎ Loss of data during the ETL process
◎ Faults in business processes and procedures
◎ Trouble acquiring and building test data
◎ Unstable testing environment
◎ Missing business flow information
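Duplicate data is usually detected by grouping on the business key. A minimal sketch with a hypothetical `dw_customer` table whose business key is (`customer_no`, `source_system`); identifiers are interpolated into the SQL, so they must be trusted.

```python
import sqlite3


def duplicate_business_keys(conn, table, key_columns):
    """Business-key combinations that occur more than once, with their row counts."""
    keys = ", ".join(key_columns)
    return conn.execute(
        f"SELECT {keys}, COUNT(*) FROM {table} "
        f"GROUP BY {keys} HAVING COUNT(*) > 1 ORDER BY {keys}"
    ).fetchall()


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dw_customer (customer_no TEXT, source_system TEXT, name TEXT)")
    conn.executemany(
        "INSERT INTO dw_customer VALUES (?, ?, ?)",
        [
            ("C1", "CRM", "Asha"),
            ("C1", "CRM", "Asha K"),  # same business key loaded twice
            ("C2", "CRM", "John"),
            ("C1", "ERP", "Asha"),  # different source system: not a duplicate
        ],
    )
    return conn
```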
17. Best Practices
◎Make sure data is transformed correctly
◎Load the projected data into the data warehouse without
any data loss or truncation
◎Ensure that the ETL application appropriately rejects invalid
data or replaces it with default values, and reports it
◎Ensure appropriate load occurs at each data layer
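Two of these practices can be checked with queries. A sketch, assuming hypothetical SQLite tables `src_customer`, `tgt_customer`, `ref_country` and `reject_log`, and a hypothetical default code `UNK`: the first query finds values truncated during the load, the second finds invalid rows that were neither rejected nor replaced with the default.

```python
import sqlite3

DEFAULT_CODE = "UNK"  # hypothetical default for unknown country codes


def truncated_rows(conn):
    """Loaded rows whose name is shorter in the target than in the source."""
    return [row[0] for row in conn.execute(
        """
        SELECT s.id FROM src_customer AS s
        JOIN tgt_customer AS t ON s.id = t.id
        WHERE LENGTH(t.name) < LENGTH(s.name)
        ORDER BY s.id
        """
    )]


def unhandled_invalid_rows(conn):
    """Invalid source rows that were neither rejected nor loaded with the default value."""
    return [row[0] for row in conn.execute(
        """
        SELECT s.id FROM src_customer AS s
        LEFT JOIN ref_country AS r ON s.country_code = r.country_code
        WHERE r.country_code IS NULL
          AND NOT EXISTS (SELECT 1 FROM reject_log AS j WHERE j.id = s.id)
          AND NOT EXISTS (SELECT 1 FROM tgt_customer AS t
                          WHERE t.id = s.id AND t.country_code = ?)
        ORDER BY s.id
        """,
        (DEFAULT_CODE,),
    )]


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_customer (id INTEGER, name TEXT, country_code TEXT)")
    conn.execute("CREATE TABLE tgt_customer (id INTEGER, name TEXT, country_code TEXT)")
    conn.execute("CREATE TABLE ref_country (country_code TEXT)")
    conn.execute("CREATE TABLE reject_log (id INTEGER, reason TEXT)")
    conn.executemany(
        "INSERT INTO src_customer VALUES (?, ?, ?)",
        [(1, "Alexandra", "IN"), (2, "Bo", "XX"), (3, "Cy", "YY"), (4, "Dee", "ZZ")],
    )
    conn.execute("INSERT INTO ref_country VALUES ('IN')")
    conn.executemany(
        "INSERT INTO tgt_customer VALUES (?, ?, ?)",
        [(1, "Alexand", "IN"), (2, "Bo", "UNK"), (3, "Cy", "YY")],  # row 1 truncated, row 3 unhandled
    )
    conn.execute("INSERT INTO reject_log VALUES (4, 'unknown country code')")
    return conn
```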
18. Contd..
◎Ensure that data is loaded into the data warehouse within the
prescribed and expected time frames, to confirm scalability
and performance
◎Ensure records are updated based on the appropriate
business key in the target database tables
◎Ensure coding standards are in place while designing
ETL mappings
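A load-time check can be as simple as timing the load against an agreed limit. A minimal sketch, assuming a hypothetical `tgt_events` table and a time limit passed in as `sla_seconds`; a real check would time the actual ETL job on production-like data volumes.

```python
import sqlite3
import time


def timed_load(conn, rows, sla_seconds):
    """Load rows and report the elapsed time and whether it stayed within the limit."""
    start = time.perf_counter()
    conn.executemany("INSERT INTO tgt_events (event_id, payload) VALUES (?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed <= sla_seconds


def demo_connection():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tgt_events (event_id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


if __name__ == "__main__":
    rows = [(i, f"event-{i}") for i in range(10_000)]
    elapsed, ok = timed_load(demo_connection(), rows, sla_seconds=60)
    print(f"loaded {len(rows)} rows in {elapsed:.3f}s, within limit: {ok}")
```

Repeating the measurement with growing row counts gives a rough view of how load time scales with volume.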