Webinar
Mike Calabrese
Team Lead/Senior Engineer
Bill Hayduk
Founder/CEO
Creating a Data Validation
& Testing Strategy
Copyright Real-Time Technology Solutions, Inc. 2019 CONFIDENTIAL – DO NOT distribute
Facts
Founded:
1996 (24th anniversary)
Location:
New York City (HQ)
Customer profile:
• Fortune 500 & mid-size
• 700+ customers
Strategic Partners:
IBM, Microsoft, Oracle,
Teradata, Cloudera,
HortonWorks, MongoDB,
SAP, Micro Focus
Other Software
Supported
QuerySurge, Selenium,
Appium, CitraTest,
Postman, Smart Bear,
JMeter, others
RTTS is the premier pure-play QA & Testing firm
that specializes in Test Automation
Data
Validation
Data Testing
Strategies
Intro
Assessment
Case Study
Data Validation Assessment by RTTS
Handles more than 1 million customer transactions every hour.
• data imported into databases that contain > 2.5 petabytes of data
• the equivalent of 167 times the information contained in all the books in the US Library of
Congress.
Facebook handles 40 billion photos from its user base.
Google processes 1 Terabyte per hour
Twitter processes 85 million tweets per day
eBay processes 80 Terabytes per day
others
Big Impacts of Big Data
Data Warehouse Marketplace
“the worldwide data warehouse management software market is forecast
to generate nearly $17 billion in revenue by 2020” - Forrester
Top vendors: Oracle, Teradata, IBM, Microsoft, SAP, Micro Focus and Amazon
Business Intelligence Marketplace
“The business intelligence (BI) and analytics software market is forecast to grow to
$22.8 billion by the end of 2020” - Gartner
SAP, IBM, SAS, Microsoft, Oracle, Tableau, Qlik, MicroStrategy, Information Builders
DWH, BI, Big Data Marketplaces
Big Data Marketplace
“By the end of 2020, companies will spend > USD $72 billion on Big Data
hardware, software, & professional services” - IDC
Oracle, IBM, Microsoft, Amazon, Micro Focus, HortonWorks, Cloudera, Teradata,
SAP, MongoDB, MapR, DataStax, Snowflake.
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process → Target DWH → ETL Process → Data Mart → Business Intelligence (BI) & Analytics]
Impacts of Bad Data
“On average, poor data quality costs organizations $14.2 million
annually.”
“Dirty data costs the average business 15% to 25% of revenue.”
“Cleaning up data will lead to average cost savings of 33%, while
boosting revenue by an average of 31%.”
What is Data Validation?
Data Validation Testing
The process of verifying your data is completely and accurately moved
through your systems according to the business requirements.
[Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB) → ETL Process (Extract, Transform, Load) → Target DWH]
Data Validation Testing
DATA VALIDATION TEST TYPES
• Data Completeness
Verifying that all data has been loaded from the sources to the target Data Warehouse.
Validate the correct data displays in BI reports.
• Data Transformation
Ensuring that all data has been transformed correctly during the extract-transform-load (ETL) process.
• Data Quality
Ensuring that the ETL process correctly rejects, substitutes default values, corrects or ignores and reports invalid data.
• BI Report Testing
Verify that BI Reports are formatted correctly, calculated fields are validated, and data is verified against the underlying data.
• BI Performance Testing
Ensure your BI Reports can be generated in a reasonable amount of time.
Finding Bad Data
Issue: Missing Data
Description: Data that does not make it into the target database
Possible Causes:
• Invalid or incorrect lookup table in the transformation logic
• Bad data from the source database (needs cleansing)
• Invalid joins

Issue: Truncation of Data
Description: Data being lost by truncation of the data field
Possible Causes:
• Invalid field lengths on target database
• Transformation logic not considering field lengths from source

Issue: Data Type Mismatch
Description: Data types not set up correctly on target database
Possible Causes:
• Source data field not configured correctly

Issue: Null Translation
Description: Null source values not being transformed to correct target values
Possible Causes:
• Development team did not include the null translation in the transformation logic

Issue: Wrong Translation
Description: Opposite of the Null Translation error. Field should be null but is populated with a non-null value, or field should be populated but has the wrong value
Possible Causes:
• Development team incorrectly translated the source field for certain values

Issue: Misplaced Data
Description: Source data fields not being transformed to the correct target data field
Possible Causes:
• Development team inadvertently mapped the source data field to the wrong target data field

Issue: Extra Records
Description: Records which should not be in the ETL are included in the ETL
Possible Causes:
• Development team did not include a filter in their code

Issue: Not Enough Records
Description: Records which should be in the ETL are not included in the ETL
Possible Causes:
• Development team had a filter in their code which should not have been there
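A few of the issues listed above can be checked with plain SQL. The following is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and the rule that a null source state must become 'UNKNOWN' are all hypothetical and would come from your own mapping document.

```python
import sqlite3

# Hypothetical source and target tables, loaded into an in-memory database.
# In the target, row 3 is missing, the name in row 1 is truncated,
# and the NULL state in row 2 was not translated.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, name TEXT, state TEXT);
    CREATE TABLE tgt_customer (id INTEGER, name TEXT, state TEXT);
    INSERT INTO src_customer VALUES
        (1, 'Alexandria Montgomery', 'NY'),
        (2, 'Bo Li', NULL),
        (3, 'Carla Diaz', 'NJ');
    INSERT INTO tgt_customer VALUES
        (1, 'Alexandria', 'NY'),
        (2, 'Bo Li', NULL);
""")

# Missing Data: source keys with no matching target row.
missing = conn.execute("""
    SELECT s.id FROM src_customer s
    LEFT JOIN tgt_customer t ON t.id = s.id
    WHERE t.id IS NULL
""").fetchall()

# Truncation of Data: target value is shorter than the source value.
truncated = conn.execute("""
    SELECT s.id FROM src_customer s
    JOIN tgt_customer t ON t.id = s.id
    WHERE LENGTH(t.name) < LENGTH(s.name)
""").fetchall()

# Null Translation: assumed rule is that a NULL source state becomes 'UNKNOWN'.
null_not_translated = conn.execute("""
    SELECT s.id FROM src_customer s
    JOIN tgt_customer t ON t.id = s.id
    WHERE s.state IS NULL AND (t.state IS NULL OR t.state <> 'UNKNOWN')
""").fetchall()

print(missing, truncated, null_not_translated)
```

Each query returns the keys of the offending rows, so an empty result means that particular check passed.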
Finding Bad Data (cont.)
Issue: Transformation Logic Errors/Holes
Description: Testing sometimes can lead to finding “holes” in the transformation logic or realizing the logic is unclear
Possible Causes:
• Development team did not take into account special cases. For example, international cities that contain language-specific special characters might need to be dealt with in the ETL code

Issue: Simple/Small Errors
Description: Capitalization, spacing and other small errors
Possible Causes:
• Development team did not add an additional space after a comma when populating the target field

Issue: Sequence Generator
Description: Ensuring that the sequence numbers of reports are in the correct order is very important when processing follow-up reports or responding to an audit
Possible Causes:
• Development team did not configure the sequence generator correctly, resulting in records with a duplicate sequence number

Issue: Undocumented Requirements
Description: Find requirements that are “understood” but are not actually documented anywhere
Possible Causes:
• Several members of the development team did not understand the “understood” undocumented requirements

Issue: Duplicate Records
Description: Duplicate records are two or more records that contain the same data
Possible Causes:
• Development team did not add the appropriate code to filter out duplicate records

Issue: Numeric Field Precision
Description: Numbers that are not formatted to the correct decimal point or not rounded per specifications
Possible Causes:
• Development team rounded the numbers to the wrong decimal point

Issue: Rejected Rows
Description: Data rows that get rejected due to data issues
Possible Causes:
• Development team did not take into account data conditions that could break the ETL for a particular row
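Duplicate records, duplicate sequence numbers, and numeric precision problems lend themselves to simple automated checks. The sketch below is illustrative only: the tgt_report table, its columns, and the two-decimal-place specification are assumptions, and amounts are stored as text here purely so the number of decimal places is visible.

```python
import sqlite3
from decimal import Decimal

# Hypothetical target table with a repeated sequence number,
# a fully duplicated row, and one amount with three decimal places.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt_report (seq_no INTEGER, account TEXT, amount TEXT);
    INSERT INTO tgt_report VALUES
        (1, 'A-100', '10.50'),
        (2, 'A-200', '7.125'),
        (2, 'A-300', '3.00'),
        (4, 'A-400', '9.99'),
        (4, 'A-400', '9.99');
""")

# Duplicate Records: identical values in every column.
duplicate_records = conn.execute("""
    SELECT seq_no, account, amount, COUNT(*)
    FROM tgt_report
    GROUP BY seq_no, account, amount
    HAVING COUNT(*) > 1
""").fetchall()

# Sequence Generator: sequence numbers that appear more than once.
duplicate_seq = sorted(conn.execute("""
    SELECT seq_no FROM tgt_report
    GROUP BY seq_no
    HAVING COUNT(*) > 1
"""))

# Numeric Field Precision: assumed specification is exactly two decimal places.
bad_precision = sorted({
    account
    for account, amount in conn.execute("SELECT account, amount FROM tgt_report")
    if Decimal(amount).as_tuple().exponent != -2
})

print(duplicate_records, duplicate_seq, bad_precision)
```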
Challenges
• How much data needs to be validated/tested?
• How do I ensure I am testing the proper data
permutations?
• What are the critical data endpoints that need
to be tested?
• How do I verify that the data from my various
source systems is propagating through the
architecture?
• How do I validate data in cloud environments?
• Is bad data making it into the architecture?
• How much of the data testing can be automated?
[Chart: COST across phases: Data Mapping, Development, Unit Testing, QA Test Cycle, UAT Testing, End User]
Solutions
Finding Bad Data
• Identify testing points
• Review data mappings
• Data Testing Strategies
• comparisons (source vs. target)
• row counts
• minus queries
• automation tools
Solutions
Data Testing Permutations
• Analyze the data mappings
• Develop a test Data Set
o Review Transformation Logic
▪ Case Statements
▪ Field Merges/ Field Splitting
▪ Translations (Lookups)
▪ Derived
• Replication of production data
• Homegrown or Freeware
• Enterprise solutions
o IBM InfoSphere Optim, GenRocket, SAP, Computer Associates
Test Data Generation
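One homegrown way to cover the data permutations identified in the mapping review is to generate every combination of the input values that drive the transformation logic. The sketch below assumes two hypothetical source fields: a status code fed into a CASE statement and a country code fed into a lookup. The field names and values are invented for illustration.

```python
from itertools import product

# Hypothetical input values taken from an imagined mapping document.
# Each list includes a NULL (None) and an unmapped value so that the
# ELSE branch of the CASE statement and lookup misses are exercised.
status_codes = ["A", "I", None, "X"]      # CASE statement inputs
country_codes = ["US", "DE", None, "ZZ"]  # lookup (translation) inputs

# One test row per combination of the two fields.
test_rows = [
    {"id": i, "status_code": status, "country_code": country}
    for i, (status, country) in enumerate(product(status_codes, country_codes), start=1)
]

print(len(test_rows))
for row in test_rows[:3]:
    print(row)
```

The row count grows multiplicatively with each added field, so in practice this approach is usually limited to the fields that actually appear in transformation rules.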
Solutions
How much data to validate?
• Requirements
• Regulatory authorities may require 100% of your data be tested.
• In other cases, 90% or 80% may be the goal.
• Time, resource and scope driven
• Release timeline
• Available resources
• Scope of authoring and executing tests
• Risk Assessment
• Business Acceptance Criteria – End users define their primary data use cases.
• Critical Path – Validate the data that flows through the high-priority data
endpoints within your system.
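\[
\frac{\text{Test authoring time total}}{\#\text{ of resources} \times (\#\text{ of hours per day authoring per resource})} = \#\text{ of days}
\]
\[
\frac{\text{Test execution time total}}{\#\text{ of resources} \times (\#\text{ of hours per day executing per resource})} = \#\text{ of days}
\]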
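As a rough worked example of the two estimates above, the sketch below assumes the totals are expressed in hours. The function name and the sample figures (240 authoring hours, 90 execution hours, 3 resources) are invented for illustration.

```python
def estimate_days(total_hours: float, resources: int, hours_per_day: float) -> float:
    """Total effort divided by daily team capacity, as in the formulas above."""
    if resources <= 0 or hours_per_day <= 0:
        raise ValueError("resources and hours_per_day must be positive")
    return total_hours / (resources * hours_per_day)

# Hypothetical figures: 3 resources, each authoring 4 hours and executing 2 hours per day.
authoring_days = estimate_days(total_hours=240, resources=3, hours_per_day=4)
execution_days = estimate_days(total_hours=90, resources=3, hours_per_day=2)

print(authoring_days, execution_days)  # 20.0 15.0
```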
Solutions
Automation vs Manual
• Recurrence
• Avoid complicated single use test cases
• Focus on repeatable testing paths
• Ensure modularization of test data sets
• Test Data Sets
• Consider the hardware resources and performance assigned to the automation tool,
which must be able to handle the load of the data set under test
• Include the time needed to prepare environments in your testing estimates
• Database Performance
• Set expectations on database hardware & responsiveness.
• SQL query response time will factor into overall test run times
Solutions
How do I test data in my cloud environment?
• On-Prem vs Cloud
o Follow the same testing methodologies but with considerations for cloud
connections and scalability
o If an automated solution is being pursued, confirm the tools involved
allow for connectivity to your cloud environment
• Hybrid-Cloud Mapping
o Interface documentation
o Define entry & exit points if applicable
• Digital Transformation
o Clearly defined conversion
requirements and mappings
• Environment Scalability
• Define limitations on testing environment resources
Data Validation Assessment
What are the goals of a
Data Validation assessment?
• Receive an expert evaluation of your
current data validation process
• Provide recommendations on how to
improve your process
• Proposal for successful implementation
of your goals
Data Validation Assessment
Components of the Assessment
• Business analysis
• Data architecture analysis
• ETL testing process evaluation
• DataOps & DevOps evaluation
• Resource evaluation (optional)
• Metrics evaluation
• Risk assessment
Data Validation Assessment
Interview with Key Players
• Business/Data Analysts create requirements
• QA Testers develop and execute test plans and
test cases
• Architects set up environments
• Developers create ETL code, perform unit tests
• DBAs test for performance and stress
• Business Users perform functional User
Acceptance Tests
Data Validation Assessment
Process Review
• Review Requirements & Mapping documentation
• Testing Process Design
• Analysis of tools and DevOps/DataOps
• Reporting metrics evaluations
Data Validation Assessment
Deliverables
• Detailed analysis report with recommendations
for improvement
• Presentation to your team on our findings
• Proposal for successful implementation of your
goals
Data Testing - Developer & Tester
ETL Developer: Codes data movement based on Mapping Requirements
Data Tester: Tests data movement based on Mapping Requirements
BI Analyst: Extracts data for reports
Tester: Tests BI Reports
[Diagram: Source Data, Big Data lake, Data Warehouse, Data Mart, and BI & Analytics linked by ETL steps, with Testing Points #1 through #4 marked along the flow]
Source-to-Target Map
It’s the critical element required to
efficiently plan the target Data
Stores. It also defines the Extract,
Transform, Load (ETL) process.
Intention:
✓ capture business rules
✓ data flow mapping and
✓ data movement requirements.
Mapping Doc specifies:
▪ Source input definition
▪ Target/output details
▪ Business & data transformation rules
▪ Absolute data quality requirements
▪ Optional data quality requirements.
Data Requirements = Mapping Document
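To make the contents of a mapping document concrete, the sketch below models one row of a source-to-target map as a small data structure. This is only an illustration: the class, field names, tables, and rules are hypothetical, and real mapping documents are often kept in spreadsheets or modeling tools.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldMapping:
    """One row of a hypothetical source-to-target mapping document."""
    source_table: str          # source input definition
    source_column: str
    target_table: str          # target/output details
    target_column: str
    transformation_rule: str   # business or data transformation rule
    required: bool             # True = absolute data quality requirement, False = optional

# Invented example rows.
mapping = [
    FieldMapping("crm.customer", "cust_nm", "dwh.dim_customer", "customer_name",
                 "Trim whitespace and convert to upper case", True),
    FieldMapping("crm.customer", "fax_no", "dwh.dim_customer", "fax_number",
                 "Copy as-is", False),
]

def required_target_columns(rows):
    """Target columns whose data quality rules are absolute, i.e. must always be tested."""
    return [m.target_column for m in rows if m.required]

print(required_target_columns(mapping))
```

Keeping the mapping in a structured form like this makes it possible to derive test cases from it rather than re-reading a document by hand.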
Data Testing Strategies
Testing Methods
Minus Queries – Create a SQL source query and a SQL Target query. Utilizing SQL, subtract
source query results from target query results and subtract target query results from
source query results
Visual Compare – View source data and target
data and manually compare
Record Counts – Creating a SQL source query and
a SQL target query that each return a record count, and
comparing the values
Automation – Utilizing an automation tool to compare SQL source and target query results
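The record count and minus query methods can be sketched with Python's built-in sqlite3 module. SQLite spells the set-difference operator EXCEPT, while Oracle uses MINUS. The tables and rows here are hypothetical.

```python
import sqlite3

# Hypothetical source and target tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (order_id INTEGER, amount REAL);
    CREATE TABLE tgt_orders (order_id INTEGER, amount REAL);
    INSERT INTO src_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO tgt_orders VALUES (1, 10.0), (2, 25.0), (4, 40.0);
""")

source_query = "SELECT order_id, amount FROM src_orders"
target_query = "SELECT order_id, amount FROM tgt_orders"

# Record Counts: compare the number of rows returned by each query.
source_count = conn.execute(f"SELECT COUNT(*) FROM ({source_query})").fetchone()[0]
target_count = conn.execute(f"SELECT COUNT(*) FROM ({target_query})").fetchone()[0]

# Minus Queries: subtract in both directions (EXCEPT is SQLite's MINUS).
in_source_not_target = sorted(conn.execute(f"{source_query} EXCEPT {target_query}"))
in_target_not_source = sorted(conn.execute(f"{target_query} EXCEPT {source_query}"))

print(source_count, target_count)
print(in_source_not_target)
print(in_target_not_source)
```

In this sample the record counts match (3 and 3) even though the data differs, which is why a count comparison alone is a weak test and the minus queries are run in both directions.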
Data Maturity Model - Test Execution
Level 1: Sampling
Sampling a % of data by visually comparing data sets. Not repeatable.
Level 2: Excel, Ad Hoc Reporting
Using Excel or other homegrown method. Ad hoc reporting.
Level 3: Minus Queries
Utilizing SQL editor & minus queries to test data. More detailed reporting.
Level 4: Data Test Automation
Repeatable test automation, agreed-upon process, centralized reporting.
Level 5: Data Quality Optimizing
Full automation, tracking of ROI, predictive data issues, auditable results. Business value is fully understood/supported by management.
On which Level should your process be?
A company in the financial industry had a development and QA team assigned to
their ETL process. But there were still issues:
Case Study
• They were still suffering from incorrect data
fields populating their Business Intelligence
(BI) reports
• Development cycles were frequently delayed
• Management was losing confidence in the BI
reporting data
CASE STUDY
OVERVIEW
Senior RTTS resources were brought in to assess the process
• Interview key players
• Review process documentation and tools
Problem areas identified:
• Minimal requirements
• Ticketing system was not being implemented for traceability
• Testing process of low-level maturity
o Table row counts
o Sampling
o Excel comparisons
Case Study
Resource needs:
Case Study
Recommendations for Improvement
• Centralized mapping documentation
o Linking requirements to work item tickets to test cases
• Improve communications between team members;
we recommended a new Data Analyst role
• Narrowed focus of the stand-up meetings
• Implemented automated solutions to
expand coverage for larger data sets
DEMO:
Automating your data validation & testing
Any questions?
Creating a Data Validation & Testing Strategy

Contenu connexe

Tendances

Etl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsEtl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsWayne Yaddow
 
Data Warehouse (ETL) testing process
Data Warehouse (ETL) testing processData Warehouse (ETL) testing process
Data Warehouse (ETL) testing processRakesh Hansalia
 
Etl - Extract Transform Load
Etl - Extract Transform LoadEtl - Extract Transform Load
Etl - Extract Transform LoadABDUL KHALIQ
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Edureka!
 
Etl overview training
Etl overview trainingEtl overview training
Etl overview trainingMondy Holten
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringDurga Gadiraju
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineeringThang Bui (Bob)
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Rajesh Kumar
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionRTTS
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Modern data warehouse presentation
Modern data warehouse presentationModern data warehouse presentation
Modern data warehouse presentationDavid Rice
 

Tendances (20)

Etl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large ApplicationsEtl And Data Test Guidelines For Large Applications
Etl And Data Test Guidelines For Large Applications
 
Data Warehouse (ETL) testing process
Data Warehouse (ETL) testing processData Warehouse (ETL) testing process
Data Warehouse (ETL) testing process
 
Etl - Extract Transform Load
Etl - Extract Transform LoadEtl - Extract Transform Load
Etl - Extract Transform Load
 
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
Talend ETL Tutorial | Talend Tutorial For Beginners | Talend Online Training ...
 
Etl testing
Etl testingEtl testing
Etl testing
 
Etl overview training
Etl overview trainingEtl overview training
Etl overview training
 
ETL QA
ETL QAETL QA
ETL QA
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Informatica Cloud Overview
Informatica Cloud OverviewInformatica Cloud Overview
Informatica Cloud Overview
 
ETL
ETLETL
ETL
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Demystifying data engineering
Demystifying data engineeringDemystifying data engineering
Demystifying data engineering
 
Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture Azure data analytics platform - A reference architecture
Azure data analytics platform - A reference architecture
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
QuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solutionQuerySurge - the automated Data Testing solution
QuerySurge - the automated Data Testing solution
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Modern data warehouse presentation
Modern data warehouse presentationModern data warehouse presentation
Modern data warehouse presentation
 

Similaire à Validate Data & Improve Testing

Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentRTTS
 
Data Verification In QA Department Final
Data Verification In QA Department FinalData Verification In QA Department Final
Data Verification In QA Department FinalWayne Yaddow
 
ETL Testing Services - Safeguard Your Data
ETL Testing Services - Safeguard Your DataETL Testing Services - Safeguard Your Data
ETL Testing Services - Safeguard Your DataBugRaptors
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessRTTS
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaRTTS
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...RTTS
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryRTTS
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP TestingRTTS
 
TestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingTestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingRTTS
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingRTTS
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayTorana, Inc.
 
DMM9 - Data Migration Testing
DMM9 - Data Migration TestingDMM9 - Data Migration Testing
DMM9 - Data Migration TestingNick van Beest
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bijeffd00
 
Varsha_CV_ETLTester5.8Years
Varsha_CV_ETLTester5.8YearsVarsha_CV_ETLTester5.8Years
Varsha_CV_ETLTester5.8YearsVarsha Hiremath
 
What is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data WharehouseWhat is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data WharehouseBugRaptors
 
Testing in the New World of Off-the-Shelf Software
Testing in the New World of Off-the-Shelf SoftwareTesting in the New World of Off-the-Shelf Software
Testing in the New World of Off-the-Shelf SoftwareJosiah Renaudin
 

Similaire à Validate Data & Improve Testing (20)

Creating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing AssignmentCreating a Project Plan for a Data Warehouse Testing Assignment
Creating a Project Plan for a Data Warehouse Testing Assignment
 
Test Automation for Data Warehouses
Test Automation for Data Warehouses Test Automation for Data Warehouses
Test Automation for Data Warehouses
 
Data Verification In QA Department Final
Data Verification In QA Department FinalData Verification In QA Department Final
Data Verification In QA Department Final
 
ETL Testing Services - Safeguard Your Data
ETL Testing Services - Safeguard Your DataETL Testing Services - Safeguard Your Data
ETL Testing Services - Safeguard Your Data
 
Completing the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = SuccessCompleting the Data Equation: Test Data + Data Validation = Success
Completing the Data Equation: Test Data + Data Validation = Success
 
Leveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE VerticaLeveraging HPE ALM & QuerySurge to test HPE Vertica
Leveraging HPE ALM & QuerySurge to test HPE Vertica
 
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
Big Data Testing : Automate theTesting of Hadoop, NoSQL & DWH without Writing...
 
Resume sailaja
Resume sailajaResume sailaja
Resume sailaja
 
Data Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical IndustryData Warehouse Testing in the Pharmaceutical Industry
Data Warehouse Testing in the Pharmaceutical Industry
 
How to Automate your Enterprise Application / ERP Testing
How to Automate your  Enterprise Application / ERP TestingHow to Automate your  Enterprise Application / ERP Testing
How to Automate your Enterprise Application / ERP Testing
 
TestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data TestingTestGuild and QuerySurge Presentation -DevOps for Data Testing
TestGuild and QuerySurge Presentation -DevOps for Data Testing
 
Query Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programmingQuery Wizards - data testing made easy - no programming
Query Wizards - data testing made easy - no programming
 
Automate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile wayAutomate data warehouse etl testing and migration testing the agile way
Automate data warehouse etl testing and migration testing the agile way
 
DMM9 - Data Migration Testing
DMM9 - Data Migration TestingDMM9 - Data Migration Testing
DMM9 - Data Migration Testing
 
Pradeep_resume_ETL Testing
Pradeep_resume_ETL TestingPradeep_resume_ETL Testing
Pradeep_resume_ETL Testing
 
Data quality and bi
Data quality and biData quality and bi
Data quality and bi
 
Varsha_CV_ETLTester5.8Years
Varsha_CV_ETLTester5.8YearsVarsha_CV_ETLTester5.8Years
Varsha_CV_ETLTester5.8Years
 
What is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data WharehouseWhat is ETL testing & how to enforce it in Data Wharehouse
What is ETL testing & how to enforce it in Data Wharehouse
 
Testing in the New World of Off-the-Shelf Software
Testing in the New World of Off-the-Shelf SoftwareTesting in the New World of Off-the-Shelf Software
Testing in the New World of Off-the-Shelf Software
 
Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
 

Plus de RTTS

Automated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsRTTS
 
QuerySurge AI webinar
QuerySurge AI webinarQuerySurge AI webinar
QuerySurge AI webinarRTTS
 
State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023State of the Market - Data Quality in 2023
State of the Market - Data Quality in 2023RTTS
 
RTTS Postman and API Testing Webinar Slides.pdf
RTTS Postman and API Testing Webinar  Slides.pdfRTTS Postman and API Testing Webinar  Slides.pdf
RTTS Postman and API Testing Webinar Slides.pdfRTTS
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
Webinar - QuerySurge and Azure DevOps in the Azure Cloud
 Webinar - QuerySurge and Azure DevOps in the Azure Cloud Webinar - QuerySurge and Azure DevOps in the Azure Cloud
Webinar - QuerySurge and Azure DevOps in the Azure CloudRTTS
 
Implementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectImplementing Azure DevOps with your Testing Project
Implementing Azure DevOps with your Testing ProjectRTTS
 
An introduction to QuerySurge webinar
An introduction to QuerySurge webinarAn introduction to QuerySurge webinar
An introduction to QuerySurge webinarRTTS
 
the Data World Distilled
the Data World Distilledthe Data World Distilled
the Data World DistilledRTTS
 
QuerySurge for DevOps
QuerySurge for DevOpsQuerySurge for DevOps
QuerySurge for DevOpsRTTS
 
Whitepaper: Volume Testing Thick Clients and Databases
Whitepaper:  Volume Testing Thick Clients and DatabasesWhitepaper:  Volume Testing Thick Clients and Databases
Whitepaper: Volume Testing Thick Clients and DatabasesRTTS
 
Case study: Open Source Automation Framework using Selenium WebDriver
Case study: Open Source Automation Framework using Selenium WebDriverCase study: Open Source Automation Framework using Selenium WebDriver
Case study: Open Source Automation Framework using Selenium WebDriverRTTS
 
Enterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality ConundrumEnterprise Business Intelligence & Data Warehousing: The Data Quality Conundrum
Enterprise Business Intelligence & Data Warehousing: The Data Quality ConundrumRTTS
 
Improve the Health of Your Data
Improve the Health of Your DataImprove the Health of Your Data
Improve the Health of Your DataRTTS
 
Big Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data QualityBig Data Testing: Ensuring MongoDB Data Quality
Big Data Testing: Ensuring MongoDB Data QualityRTTS
 
RTTS - the Software Quality Experts
RTTS - the Software Quality ExpertsRTTS - the Software Quality Experts
RTTS - the Software Quality ExpertsRTTS
 
Testing Big Data: Automated Testing of Hadoop with QuerySurge
Testing Big Data: Automated  Testing of Hadoop with QuerySurgeTesting Big Data: Automated  Testing of Hadoop with QuerySurge
Testing Big Data: Automated Testing of Hadoop with QuerySurgeRTTS
 
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...
Data Warehousing in Pharma: How to Find Bad Data while Meeting Regulatory Req...RTTS
 

Validate Data & Improve Testing

  • 1. Webinar Mike Calabrese Team Lead/Senior Engineer Bill Hayduk Founder/CEO Creating a Data Validation & Testing Strategy
  • 2. Copyright Real-Time Technology Solutions, Inc. 2019 CONFIDENTIAL – DO NOT distribute
  • 3. Facts Founded: 1996 (24th anniversary) Location: New York City (HQ) Customer profile: • Fortune 500 & mid-size • 700+ customers Strategic Partners: IBM, Microsoft, Oracle, Teradata, Cloudera, HortonWorks, MongoDB, SAP, Micro Focus Other Software Supported QuerySurge, Selenium, Appium, CitraTest, Postman, Smart Bear, JMeter, others RTTS is the premier pure-play QA & Testing firm that specializes in Test Automation
  • 6. Big Impacts of Big Data
      o Handles more than 1 million customer transactions every hour.
          ▪ Data imported into databases that contain > 2.5 petabytes of data
          ▪ The equivalent of 167 times the information contained in all the books in the US Library of Congress.
      o Facebook handles 40 billion photos from its user base.
      o Google processes 1 terabyte per hour.
      o Twitter processes 85 million tweets per day.
      o eBay processes 80 terabytes per day.
      o Others
  • 7. DWH, BI, Big Data Marketplaces
      o Data Warehouse Marketplace: “the worldwide data warehouse management software market is forecast to generate nearly $17 billion in revenue by 2020” - Forrester. Top vendors: Oracle, Teradata, IBM, Microsoft, SAP, Micro Focus and Amazon.
      o Business Intelligence Marketplace: “The business intelligence (BI) and analytics software market is forecast to grow to $22.8 billion by the end of 2020” - Gartner. Top vendors: SAP, IBM, SAS, Microsoft, Oracle, Tableau, Qlik, MicroStrategy, Information Builders.
      o Big Data Marketplace: “By the end of 2020, companies will spend > USD $72 billion on Big Data hardware, software, & professional services” - IDC. Top vendors: Oracle, IBM, Microsoft, Amazon, Micro Focus, HortonWorks, Cloudera, Teradata, SAP, MongoDB, MapR, DataStax, Snowflake.
  • 8. [Diagram of the data architecture: Source Data (Legacy DB, CRM/ERP DB, Finance DB), ETL Process, Target DWH, ETL Process, Data Mart, Business Intelligence (BI) & Analytics]
  • 9. Impacts of Bad Data
      o “On average, poor data quality costs organizations $14.2 million annually.”
      o “Dirty data costs the average business 15% to 25% of revenue.”
      o “Cleaning up data will lead to average cost savings of 33%, while boosting revenue by an average of 31%.”
  • 10. Intro · Data Validation · Data Testing Strategies · Assessment · Case Study (Data Validation Assessment by RTTS)
  • 11. What is Data Validation? Data Validation Testing: the process of verifying that your data is completely and accurately moved through your systems according to the business requirements. [Diagram: Source Data (Legacy DB, CRM/ERP DB, Finance DB), ETL Process (Extract, Transform, Load), Target DWH]
  • 12. Data Validation Testing: Data Validation Test Types. Validate that the correct data displays in BI reports.
      o Data Completeness: verifying that all data has been loaded from the sources to the target Data Warehouse.
      o Data Transformation: ensuring that all data has been transformed correctly during the extract-transform-load (ETL) process.
      o Data Quality: ensuring that the ETL process correctly rejects, substitutes default values, corrects or ignores and reports invalid data.
      o BI Report Testing: verifying that BI reports are formatted correctly, calculated fields are validated, and data is verified against the underlying data.
      o BI Performance Testing: ensuring your BI reports can be generated in a reasonable amount of time.
  • 13. Finding Bad Data (Issue: Description. Possible causes.)
      o Missing Data: data that does not make it into the target database. Possible causes: invalid or incorrect lookup table in the transformation logic; bad data from the source database (needs cleansing); invalid joins.
      o Truncation of Data: data being lost by truncation of the data field. Possible causes: invalid field lengths on the target database; transformation logic not considering field lengths from the source.
      o Data Type Mismatch: data types not set up correctly on the target database. Possible cause: source data field not configured correctly.
      o Null Translation: null source values not being transformed to the correct target values. Possible cause: development team did not include the null translation in the transformation logic.
      o Wrong Translation: opposite of the Null Translation error. The field should be null but is populated with a non-null value, or the field should be populated but holds the wrong value. Possible cause: development team incorrectly translated the source field for certain values.
      o Misplaced Data: source data fields not being transformed to the correct target data field. Possible cause: development team inadvertently mapped the source data field to the wrong target data field.
      o Extra Records: records which should not be in the ETL are included in the ETL. Possible cause: development team did not include a filter in their code.
      o Not Enough Records: records which should be in the ETL are not included in the ETL. Possible cause: development team had a filter in their code which should not have been there.
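Several of the issues listed on slide 13 can be detected with a source-to-target comparison query. The following is a minimal sketch, not the presenters' method: it uses Python's built-in `sqlite3` module, and the tables `src` and `tgt`, their columns, and the rule that a null `status` should be translated to `'Unknown'` are all hypothetical assumptions made for illustration.

```python
import sqlite3

# Hypothetical source and target tables; in practice these would live in
# separate systems and be populated by the ETL process under test.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (id INTEGER PRIMARY KEY, city TEXT, status TEXT);
CREATE TABLE tgt (id INTEGER PRIMARY KEY, city TEXT, status TEXT);
INSERT INTO src VALUES (1, 'San Francisco', 'A'), (2, 'Boston', NULL), (3, 'Chicago', 'I');
INSERT INTO tgt VALUES (1, 'San Franci', 'A'), (2, 'Boston', NULL);
""")

CHECKS = {
    # Missing Data: source rows with no matching target row.
    "missing_data": """
        SELECT s.id FROM src s LEFT JOIN tgt t ON t.id = s.id
        WHERE t.id IS NULL ORDER BY s.id""",
    # Truncation of Data: target value is a shorter prefix of the source value.
    "truncation": """
        SELECT s.id FROM src s JOIN tgt t ON t.id = s.id
        WHERE LENGTH(t.city) < LENGTH(s.city)
          AND SUBSTR(s.city, 1, LENGTH(t.city)) = t.city ORDER BY s.id""",
    # Null Translation: assumed rule is NULL status -> 'Unknown' in the target.
    "null_translation": """
        SELECT s.id FROM src s JOIN tgt t ON t.id = s.id
        WHERE s.status IS NULL
          AND (t.status IS NULL OR t.status <> 'Unknown') ORDER BY s.id""",
}

# Map each check name to the list of offending source ids.
findings = {name: [row[0] for row in conn.execute(sql)]
            for name, sql in CHECKS.items()}
print(findings)
```

Each check returns the keys of the offending rows, so an empty list means the check passed for that issue type.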
  • 14. Finding Bad Data (cont.) (Issue: Description. Possible causes.)
      o Transformation Logic Errors/Holes: testing can sometimes lead to finding “holes” in the transformation logic or realizing the logic is unclear. Possible cause: development team did not take special cases into account. For example, international cities that contain language-specific characters might need to be dealt with in the ETL code.
      o Simple/Small Errors: capitalization, spacing and other small errors. Possible cause: development team did not add an additional space after a comma when populating the target field.
      o Sequence Generator: ensuring that the sequence numbers of reports are in the correct order is very important when processing follow-up reports or answering to an audit. Possible cause: development team did not configure the sequence generator correctly, resulting in records with a duplicate sequence number.
      o Undocumented Requirements: requirements that are “understood” but are not actually documented anywhere. Possible cause: several members of the development team did not understand the “understood” undocumented requirements.
      o Duplicate Records: two or more records that contain the same data. Possible cause: development team did not add the appropriate code to filter out duplicate records.
      o Numeric Field Precision: numbers that are not formatted to the correct decimal point or not rounded per specifications. Possible cause: development team rounded the numbers to the wrong decimal point.
      o Rejected Rows: data rows that get rejected due to data issues. Possible cause: development team did not take into account data conditions that could break the ETL for a particular row.
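Three of the issues on slide 14 (duplicate records, duplicate sequence numbers, and numeric field precision) lend themselves to simple automated checks. The sketch below is illustrative only: the row layout `(sequence number, name, amount)`, the sample values, and the two-decimal rule are assumptions, and amounts are held as strings so that the stored number of decimals can be inspected exactly.

```python
from collections import Counter
from decimal import Decimal

def duplicate_records(rows):
    """Return each row that appears more than once, with its count."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}

def duplicate_sequence_numbers(rows, seq_index=0):
    """Return sequence numbers that are used by more than one row."""
    counts = Counter(row[seq_index] for row in rows)
    return sorted(seq for seq, n in counts.items() if n > 1)

def precision_violations(rows, amount_index, max_decimals):
    """Return rows whose amount carries more decimals than allowed."""
    bad = []
    for row in rows:
        exponent = Decimal(row[amount_index]).as_tuple().exponent
        if -exponent > max_decimals:
            bad.append(row)
    return bad

# Hypothetical target rows: (sequence number, name, amount as text).
target_rows = [
    (101, "ACME", "10.50"),
    (102, "Globex", "19.999"),
    (102, "Initech", "7.25"),
    (103, "Umbrella", "3.10"),
    (103, "Umbrella", "3.10"),
]

dupes = duplicate_records(target_rows)
dupe_seqs = duplicate_sequence_numbers(target_rows)
bad_precision = precision_violations(target_rows, amount_index=2, max_decimals=2)
print(dupes, dupe_seqs, bad_precision)
```

In a real warehouse the same checks would more likely be written as `GROUP BY ... HAVING COUNT(*) > 1` queries; the in-memory version is used here only to keep the example self-contained.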
  • 15. Challenges • How much data needs to be validated/tested? • How do I ensure I am testing the proper data permutations? • What are the critical data endpoints that need to be tested? • How do I verify that the data from my various source systems is propagating through the architecture? • How do I validate data in the cloud environments? • Is bad data making it into the architecture? • How much of the data testing can be automated?
  • 16. Solutions: Finding Bad Data • Identify testing points • Review data mappings • Data Testing Strategies • comparisons (source vs. target) • row counts • minus queries • automation tools [Chart: COST plotted against the stages Data Mapping, Development, Unit Testing, QA Test Cycle, UAT Testing, End User]
  • 17. Solutions
      o Data Testing Permutations
          ▪ Analyze the data mappings
          ▪ Develop a test data set
          ▪ Review transformation logic: case statements, field merges/field splitting, translations (lookups), derived fields
      o Test Data Generation
          ▪ Replication of production data
          ▪ Homegrown or freeware
          ▪ Enterprise solutions: IBM InfoSphere Optim, GenRocket, SAP, Computer Associates
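One homegrown way to build a test data set that covers the permutations named on slide 17 is to enumerate every combination of representative input values. This is a rough sketch under stated assumptions: the field names and values are hypothetical and would normally be derived from the case statements and lookups in the mapping document.

```python
from itertools import product

# Hypothetical input values per source field, chosen so that every CASE
# branch, lookup outcome and NULL path in the mapping is hit at least once.
field_values = {
    "country_code": ["US", "FR", None],  # two lookup hits and a NULL
    "status": ["A", "I", "X"],           # two mapped values, one unmapped
    "amount": ["0.00", "19.99"],         # boundary value and typical value
}

def permutations(values_by_field):
    """Return one test row (as a dict) per combination of field values."""
    names = list(values_by_field)
    return [dict(zip(names, combo))
            for combo in product(*values_by_field.values())]

test_rows = permutations(field_values)
print(len(test_rows), test_rows[0])
```

The row count grows multiplicatively with each field, so for wide tables it is usually necessary to restrict the full cross product to the fields that interact in the transformation logic.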
  • 18. Solutions: How much data to validate?
      o Requirements
          ▪ Regulatory authorities may require 100% of your data be tested.
          ▪ In other cases, 90% or 80% may be the goal.
      o Time, resource and scope driven
          ▪ Release timeline
          ▪ Available resources
          ▪ Scope of authoring and executing tests
      o Risk Assessment
          ▪ Business Acceptance Criteria: end users define their primary data use cases.
          ▪ Critical Path: validate the data that flows through the high-priority data endpoints within your system.
      o Estimating effort:
        $$\frac{\text{test authoring time total}}{\text{\# of resources} \times (\text{\# of hours per day authoring per resource})} = \text{\# of days}$$
        $$\frac{\text{test execution time total}}{\text{\# of resources} \times (\text{\# of hours per day executing per resource})} = \text{\# of days}$$
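The two estimation formulas on slide 18 can be worked through with a small helper. The numbers below are made up for illustration, and the sketch assumes the total authoring and execution times are expressed in hours, which the slide does not state.

```python
def days_needed(total_hours, resources, hours_per_day_per_resource):
    """Days = total time / (# of resources * hours per day per resource)."""
    if resources <= 0 or hours_per_day_per_resource <= 0:
        raise ValueError("resources and hours per day must be positive")
    return total_hours / (resources * hours_per_day_per_resource)

# Hypothetical project: 240 hours of test authoring and 60 hours of test
# execution, shared by 3 testers who author 4 h/day and execute 2 h/day.
authoring_days = days_needed(240, resources=3, hours_per_day_per_resource=4)
execution_days = days_needed(60, resources=3, hours_per_day_per_resource=2)
print(authoring_days, execution_days)  # 20.0 10.0
```

Comparing the resulting day counts with the release timeline gives a quick indication of whether the planned validation scope is achievable with the available resources.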
  • 19. Solutions: Automation vs Manual
      o Recurrence
          ▪ Avoid complicated single-use test cases
          ▪ Focus on repeatable testing paths
          ▪ Ensure modularization of test data sets
      o Test Data Sets
          ▪ Check that the hardware resources and performance assigned to the automation tool can handle the load of the data set under test
          ▪ Include the time needed to prepare environments in your testing estimates
      o Database Performance
          ▪ Set expectations on database hardware & responsiveness
          ▪ SQL query response time will factor into overall test run times
  • 20. Solutions: How do I test data in my cloud environment?
      o On-Prem vs Cloud
          ▪ Follow the same testing methodologies but with considerations for cloud connections and scalability
          ▪ If an automated solution is being pursued, confirm that the tools involved allow for connectivity to your cloud environment
      o Hybrid-Cloud Mapping
          ▪ Interface documentation
          ▪ Define entry & exit points if applicable
      o Digital Transformation
          ▪ Clearly defined conversion requirements and mappings
      o Environment Scalability
          ▪ Define limitations on testing environment resources
  • 22. Data Validation Assessment What are the goals of a Data Validation assessment? • Receive an expert evaluation of your current data validation process • Provide recommendations on how to improve your process • Proposal for successful implementation of your goals
  • 23. Data Validation Assessment Components of the Assessment • Business analysis • Data architecture analysis • ETL testing process evaluation • DataOps & DevOps evaluation • Resource evaluation (optional) • Metrics evaluation • Risk assessment
  • 24. Data Validation Assessment Interview with Key Players • Business/Data Analysts create requirements • QA Testers develop and execute test plans and test cases • Architects set up environments • Developers create ETL code, perform unit tests • DBAs test for performance and stress • Business Users perform functional User Acceptance Tests
  • 25. Data Validation Assessment Process Review • Review Requirements & Mapping documentation • Testing Process Design • Analysis of tools and DevOps/DataOps • Reporting metrics evaluations
  • 26. Data Validation Assessment Deliverables • Detailed analysis report with recommendations for improvement • Presentation to your team on our findings • Proposal for successful implementation of your goals
  • 28. Data Testing - Developer & Tester
      o ETL Developer: codes data movement based on Mapping Requirements
      o Data Tester: tests data movement based on Mapping Requirements
      o BI Analyst: extracts data for reports
      o Tester: tests BI Reports
      o [Diagram: Source Data, Big Data lake, Data Warehouse, Data Mart and BI & Analytics connected by ETL steps, with Testing Points #1, #2, #3 and #4 marked along the flow]
  • 29. Data Requirements = Mapping Document
      o Source-to-Target Map: the critical element required to efficiently plan the target Data Stores. It also defines the Extract, Transform, Load (ETL) process.
      o Intention: capture business rules, data flow mapping and data movement requirements.
      o The Mapping Doc specifies:
          ▪ Source input definition
          ▪ Target/output details
          ▪ Business & data transformation rules
          ▪ Absolute data quality requirements
          ▪ Optional data quality requirements
  • 30. Data Testing Strategies: Testing Methods
      o Minus Queries: create a SQL source query and a SQL target query. Using SQL, subtract the source query results from the target query results, then subtract the target query results from the source query results.
      o Visual Compare: view source data and target data and compare them manually.
      o Record Counts: create a SQL source query and a SQL target query that each return a record count, then compare the values.
      o Automation: use an automation tool to compare SQL source and target query results.
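The minus-query and record-count methods from slide 30 can be sketched as follows. This is a minimal illustration rather than the presenters' implementation: it uses Python's built-in `sqlite3` module, where the set-difference operator is spelled `EXCEPT` (Oracle spells it `MINUS`), and the tables `src_customer` and `tgt_customer` are hypothetical.

```python
import sqlite3

def minus_both_ways(conn, source_sql, target_sql):
    """Return (rows only in source, rows only in target) via EXCEPT."""
    only_in_source = conn.execute(f"{source_sql} EXCEPT {target_sql}").fetchall()
    only_in_target = conn.execute(f"{target_sql} EXCEPT {source_sql}").fetchall()
    return sorted(only_in_source), sorted(only_in_target)

def record_counts(conn, source_table, target_table):
    """Return (source row count, target row count)."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src, tgt

# Hypothetical source and target tables holding three rows each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customer (id INTEGER, name TEXT);
CREATE TABLE tgt_customer (id INTEGER, name TEXT);
INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO tgt_customer VALUES (1, 'Ann'), (2, 'Bobby'), (4, 'Dee');
""")

counts = record_counts(conn, "src_customer", "tgt_customer")
missing, unexpected = minus_both_ways(
    conn,
    "SELECT id, name FROM src_customer",
    "SELECT id, name FROM tgt_customer",
)
print(counts, missing, unexpected)
```

In this sample the record counts match (3 and 3) even though two rows differ, which shows why a count comparison alone is a weak test. Note also that `EXCEPT` compares distinct rows, so it will not reveal duplicate records on its own.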
  • 31. Data Maturity Model - Test Execution. On which level should your process be?
      o Level 1, Sampling: sampling a % of data by visually comparing data sets. Not repeatable.
      o Level 2, Excel, Ad Hoc Reporting: using Excel or other homegrown method. Ad hoc reporting.
      o Level 3, Minus Queries: utilizing SQL editor & minus queries to test data. More detailed reporting.
      o Level 4, Data Test Automation: repeatable test automation, agreed-upon process, centralized reporting.
      o Level 5, Data Quality Optimizing: full automation, tracking of ROI, predictive data issues, auditable results. Business value is fully understood/supported by management.
  • 33. Case Study Overview. A company in the financial industry had a development and QA team assigned to their ETL process. But there were still issues:
      o They were still suffering from incorrect data fields populating their Business Intelligence (BI) reports
      o Development cycles were frequently delayed
      o Management was losing confidence in the BI reporting data
  • 34. Case Study
      o Resource needs: senior RTTS resources were brought in to assess the process
          ▪ Interview key players
          ▪ Review process documentation and tools
      o Problem areas identified:
          ▪ Minimal requirements
          ▪ Ticketing system was not being implemented for traceability
          ▪ Testing process of low-level maturity: table row counts, sampling, Excel comparisons
  • 35. Case Study: Recommendations for Improvement
      o Centralized mapping documentation, linking requirements to work item tickets to test cases
      o Improve communications between team members: we recommended a new Data Analyst role
      o Narrowed focus of the stand-up meetings
      o Implemented automated solutions to expand coverage for larger data sets
  • 36. DEMO: Automating your data validation & testing
  • 37. Any questions? Creating a Data Validation & Testing Strategy