James Barrett, Data Quality Service Manager in Enterprise Data, Operations & Technology at Fannie Mae, shares how Fannie Mae leverages data quality to improve the business at the 2015 Informatica Government Summit.
4. James Barrett is the Data Quality Manager in Enterprise Data, Operations &
Technology at Federal National Mortgage Association (Fannie Mae). His
background includes architecture (enterprise and solutions), database
administration, project management, and custom software development
specializing in enterprise data stores, and of course, data quality.
Note: The views expressed in this presentation are the speaker’s and do not
necessarily represent those of Fannie Mae.
SPEAKER
5. How Fannie Mae Leverages Data Quality to
Improve the Business
1. Overview
2. Data Quality …
Who cares?
Why care?
What is it?
When to deploy?
Where to deploy?
3. Expectations & Experiences
Centralized vs. federated vs. self‐service models for DQ build‐out
Effective self‐service DQ
DQ integration with enterprise architecture
Cost reduction
DQ ownership
4. Data Quality – Next Steps
6. Who cares about Data Quality?
• Regulators
– Enterprise vs business “silos”
• Data Governance & Chief Data Officer
– Responsible for DQ to Senior Management
• Data Owners
– Need to be aware of DQ and fix it if necessary
• Data Managers
– Governance and Owners look to EDM for solutions for DQ
• Users of Data
– Provide data used by decision‐makers – tactical and strategic
• People affected by decisions made by users of data
– Customers, policy‐makers, planners
The Enterprise Data Quality Manager has many viewpoints and opinions to consider!
9. When to deploy Data Quality?
• Re‐active
– After somebody notices
– After somebody asks (with or without funding)
• Pro‐active
– Before anybody notices
– Before it spreads downstream
– Use a pre‐defined list of data attributes and
standard rules
• Exceptions: accept, replace, or reject
Being pro-active can be expensive; being re-active is risky
Consider your consumers when defining exception rules
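The pro-active approach above (a pre-defined list of attributes, standard rules, and an accept/replace/reject decision per exception) can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the talk: the `Rule` class, the `apply_rules` function, and the attribute names (`loan_amount`, `state`, `zip`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class Rule:
    """Hypothetical standard rule bound to one data attribute."""
    attribute: str
    check: Callable[[Any], bool]   # returns True when the value passes
    on_fail: str                   # "accept", "replace", or "reject"
    default: Any = None            # used only when on_fail == "replace"


def apply_rules(record: dict, rules: List[Rule]) -> Tuple[Optional[dict], list]:
    """Return (cleaned record or None if rejected, list of exceptions)."""
    cleaned = dict(record)
    exceptions = []
    for rule in rules:
        value = cleaned.get(rule.attribute)
        if rule.check(value):
            continue
        exceptions.append((rule.attribute, value, rule.on_fail))
        if rule.on_fail == "reject":
            return None, exceptions          # stop it before it spreads downstream
        if rule.on_fail == "replace":
            cleaned[rule.attribute] = rule.default
        # "accept": keep the value, but the exception is still recorded
    return cleaned, exceptions


# Hypothetical pre-defined attribute list with standard rules.
RULES = [
    Rule("loan_amount", lambda v: isinstance(v, (int, float)) and v > 0, "reject"),
    Rule("state", lambda v: v in {"VA", "MD", "DC"}, "replace", default="UNKNOWN"),
    Rule("zip", lambda v: v is not None, "accept"),
]

cleaned, exceptions = apply_rules(
    {"loan_amount": 250000, "state": "XX", "zip": None}, RULES
)
```

Which policy each rule gets is the consumer-facing decision: a downstream report may tolerate a replaced value, while a regulatory feed may require rejection.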
10. Where to deploy Data Quality?
• Application Build‐Out ‐ Centralized vs. Federated
– While loading (“in‐flight”) OR after loading data (“at rest”)
• Self‐Service
– Can be fast and cheap
– Can’t handle all DQ rules and requirements
• Areas of risk
– How to identify?
– How to quantify?
• At the source if possible
– Need 20/20 hindsight OR green‐field projects
Hybrid strategies seem the most robust
11. DQ application build‐out
• Centralized
– Rules built/run by 1 application for other applications’ data – “at rest”
– Single application implies single owner ‐> enterprise data governance
– Initial DQ build‐out: DQ standards, design patterns, CoE resources
• Federated
– Rules built/run by application for its own data – “in flight” or “at rest”
– Rules and stored metrics/exceptions owned by each application
• Self‐Service
– Rules built/run by data analyst/team – “at rest”
– Rules and stored metrics/exceptions owned by team/analyst
– Tool‐based (e.g., IDQ Analyst) rather than custom development
Federated scales better than the centralized model; Self-Service has the lowest cost per
rule, but setup and support require a DQ CoE
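The difference between running a rule “in flight” and “at rest” can be illustrated with a small sketch. It assumes rows are plain dictionaries and that a rule is a function returning True when a row passes; the names `not_null`, `load_in_flight`, and `scan_at_rest` are hypothetical and do not come from any particular DQ tool.

```python
def not_null(attribute):
    """Build a hypothetical rule: the attribute must be present and not None."""
    return lambda row: row.get(attribute) is not None


def load_in_flight(rows, rule, target, exceptions):
    """Check each row while loading; failing rows never reach the target."""
    for row in rows:
        if rule(row):
            target.append(row)
        else:
            exceptions.append(row)


def scan_at_rest(table, rule):
    """Check rows that are already loaded; the table itself is left unchanged."""
    return [row for row in table if not rule(row)]


incoming = [{"id": 1, "rate": 3.5}, {"id": 2, "rate": None}]
rule = not_null("rate")

# In flight: the owning application filters its own data during the load.
target, in_flight_exceptions = [], []
load_in_flight(incoming, rule, target, in_flight_exceptions)

# At rest: the same rule runs later against data that was loaded unchecked.
table = list(incoming)
at_rest_exceptions = scan_at_rest(table, rule)
```

The rule is the same in both cases; what changes is who runs it and whether the failing row has already reached the store that consumers read.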
13. DQ <‐> Enterprise Architecture
• Centralized DQ rule repository
• Data quality rule lineage
• Technical vs. business DQ rules
• Patterns for DQ rules in data flow from/to:
– Transaction Data Store
– Operational Data Store
– Master Data Store
– Data Warehouse
– Data Mart
• BPM and BAM
– Data exceptions and corrections imply:
• Alerts
• Replay corrections for downstream
• Re‐calculation of derived attributes
Architecting DQ <-> EA: don’t let the perfect be the enemy of the good.
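The point that a data correction implies alerts and re-calculation of derived attributes can be sketched as below. This is an illustration under stated assumptions only: the `DERIVED` dependency map, the `apply_correction` function, and the attribute names (`loan_amount`, `property_value`, `ltv`) are hypothetical, and a real BPM/BAM integration would route the alerts to a workflow or monitoring system rather than a list.

```python
# Hypothetical map: derived attribute -> (input attributes, calculation).
DERIVED = {
    "ltv": (
        ("loan_amount", "property_value"),
        lambda r: r["loan_amount"] / r["property_value"],
    ),
}


def apply_correction(record, attribute, new_value, alerts):
    """Apply one correction, raise alerts, and recompute dependent attributes."""
    corrected = dict(record)
    old_value = corrected.get(attribute)
    corrected[attribute] = new_value
    alerts.append(f"{attribute} corrected: {old_value!r} -> {new_value!r}")
    for name, (inputs, calculate) in DERIVED.items():
        if attribute in inputs:
            corrected[name] = calculate(corrected)
            alerts.append(f"{name} recalculated")
    return corrected


alerts = []
original = {"loan_amount": 100000.0, "property_value": 200000.0, "ltv": 0.5}
corrected = apply_correction(original, "loan_amount", 150000.0, alerts)
```

Replaying the correction downstream would then mean sending `corrected` (or the correction event itself) to every store that received the original record.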
15. DQ ownership
• Data Owners – defining DQ rules, by
ELDM entity/attribute
• Application Owners ‐ remediation of DQ
exceptions
• Data Governance ‐ DQ policies and
standards
• Data Management ‐ best practices for
implementation of DQ standards
• Data Users – identifying and raising DQ
issues to all of the above
DQ management requires good negotiation and persuasion skills to build teams
16. Data quality ‐ next steps
• Define KPIs to manage 3 DQ build‐out models
• Integrate Self‐Service and Federated DQ
• Quantify DQ risk at rule‐level, and apply to DQ warranty value‐chain
• Integrate BAM and BPM with data corrections
It ain’t what you don’t know that hurts you, it’s what you know for certain that ain’t so.
- Mark Twain