2. DEFINITION OF DATA MANAGEMENT
Data Management:
The business function of planning for, controlling, and delivering data and information assets. This function includes:
The disciplines of development, execution, and supervision of plans, policies, programs, projects, processes, and practices that control, protect, deliver, and enhance the value of data and information assets.
--- DMBOK
DATA MANAGEMENT & INTEGRATION: BUSINESS
DILEMMAS 2
3. THE STORY OF TWO LIFECYCLES
SYSTEM DEVELOPMENT LIFECYCLE (SDLC)
Plan → Analyze → Design → Build → Test → Deploy → Maintain
DATA LIFECYCLE
Plan → Specify → Enable → Create & Acquire → Maintain & Use → Archive & Retrieve → Purge
Data is created or acquired, stored and maintained, used, and eventually purged.
As I’m sure many businesses, SMB and enterprise alike, will agree, this is where it gets interesting, because data is dynamic: before it is purged, it may be extracted, imported, exported, validated, cleansed, transformed, aggregated, analyzed, reported, updated, archived, and backed up, to name a few operations.
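Several of the operations above (validate, cleanse, transform, aggregate) can be sketched as a small pipeline that also records simple lineage. This is a minimal illustration, assuming a hypothetical source extract whose records carry `customer` and `amount` text fields; the function names and validation rules are assumptions, not part of any specific tool.

```python
from collections import defaultdict


def validate(rows):
    """Keep rows that have a customer code and a numeric amount."""
    valid = []
    for row in rows:
        try:
            float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue  # reject: missing or non-numeric amount
        if row.get("customer"):
            valid.append(row)
    return valid


def cleanse(rows):
    """Normalize customer codes: trim whitespace and upper-case them."""
    return [{**row, "customer": row["customer"].strip().upper()} for row in rows]


def transform(rows):
    """Convert the amount text to a number."""
    return [{**row, "amount": float(row["amount"])} for row in rows]


def aggregate(rows):
    """Total the amounts per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def run(rows):
    """Apply each operation in order, recording (operation, rows out) as lineage."""
    lineage = []
    for step in (validate, cleanse, transform):
        rows = step(rows)
        lineage.append((step.__name__, len(rows)))
    totals = aggregate(rows)
    lineage.append(("aggregate", len(totals)))
    return totals, lineage


raw = [  # hypothetical extract
    {"customer": " acme ", "amount": "100.50"},
    {"customer": "ACME", "amount": "20"},
    {"customer": "globex", "amount": "n/a"},  # fails validation
    {"customer": "", "amount": "5"},          # fails validation
]
totals, lineage = run(raw)
print(totals)   # {'ACME': 120.5}
print(lineage)  # [('validate', 2), ('cleanse', 2), ('transform', 2), ('aggregate', 1)]
```

The lineage list shows how many rows survive each operation, which is the kind of trace that makes data "dynamics" visible before anything reaches the archive or purge stages.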
4. HOW DO WE TRANSFORM THE TRADITIONAL LIFE CYCLE TO
HANDLE TODAY’S DATA INTEGRATION DEMANDS?
WATERFALL METHODOLOGY → AGILE METHODOLOGY
5. COMPONENTS OF AGILE
Story Writing
Estimation
Release Planning
Sprint Planning
Metrics
APPLY TO DATA INTEGRATION LIFE CYCLE
KEYS ARE:
1. CADENCE
2. COLLABORATION
3. COMMUNICATION
4. RISK MITIGATION
5. MINIMIZE DATA TIME TO USE FOR THE BUSINESS
6. HOW DOES AGILE APPLY TO DATA
INTEGRATION?
For the purpose of this presentation, I will provide examples related to an enterprise data warehouse (EDW). In this case, the data sets are large and unstructured; that is, the data does not fit well into relational database management systems (RDBMS).
7. EXAMPLE: ADDING COMPLEX DATA FROM A NEW SOURCE
INTO THE ENTERPRISE DATA WAREHOUSE (EDW)
Below are the process steps within an iteration, which integrate the Agile components with the macro Data Integration Life Cycle:
DATA GOVERNANCE (Meta Data and Document Control)
Requirements → Data Profiling → Coding & Data Transformation Rules & Mappings → Development / Coding → QA & System Testing / Validation → Deployment
Rework loops feed back from later steps to earlier ones.
COMMUNICATION & RISK MITIGATION
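The rework loops in the iteration can be modeled as stage gates that either pass work forward or send it back. The sketch below is a hypothetical illustration: `STEPS` copies the step names above, while `run_iteration`, the `gate` callback, and the `max_rework` escalation limit are assumptions introduced here to show the idea, not a prescribed process.

```python
STEPS = [
    "Requirements",
    "Data Profiling",
    "Coding & Data Transformation Rules & Mappings",
    "Development / Coding",
    "QA & System Testing / Validation",
    "Deployment",
]


def run_iteration(gate, max_rework=5):
    """Walk the steps in order.

    gate(step, visit) returns None when the step passes, or the name of the
    step (the same one or an earlier one) that must be reworked.
    Returns the list of (failed_step, reworked_step) pairs.
    """
    rework_log = []
    visits = {step: 0 for step in STEPS}
    i = 0
    while i < len(STEPS):
        step = STEPS[i]
        visits[step] += 1
        back_to = gate(step, visits[step])
        if back_to is None:
            i += 1
            continue
        target = STEPS.index(back_to)
        if target > i:
            raise ValueError("rework can only loop back to the same or an earlier step")
        rework_log.append((step, back_to))
        if len(rework_log) > max_rework:
            # Assumed escalation rule: excessive rework is raised as a project risk.
            raise RuntimeError("rework limit exceeded; escalate as a risk")
        i = target
    return rework_log


def example_gate(step, visit):
    # Hypothetical scenario: the first QA pass finds a mapping defect.
    if step == "QA & System Testing / Validation" and visit == 1:
        return "Coding & Data Transformation Rules & Mappings"
    return None


log = run_iteration(example_gate)
print(log)
```

Logging each loop back, rather than silently retrying, is what lets the team communicate rework and treat it as a risk signal.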
8. HOW DO WE USE THE AGILE COMPONENTS WITH THE DATA INTEGRATION LIFE CYCLE?
• Requirements: Story Writing, Estimation
• Data Profiling: Estimation, Release Planning, Sprint Planning
• Coding & Data Transformation Rules & Mappings: Estimation, Release Planning, Sprint Planning
• Development / Coding: Release Planning, Sprint Planning, Estimation
• QA & System Testing / Validation: Release Planning, Sprint Planning, Metrics
• Deployment: Retrospective / Lessons Learned, Continuous Improvement
The five Agile components (Story Writing, Estimation, Release Planning, Sprint Planning, Metrics) link the steps in sequence, and COMMUNICATION runs through the entire cycle.
9. STORY WRITING
How does a team determine requirements?
Understand the business case / problem statement
Draw on the team’s expertise to determine the tables affected by the new data source
Data profiling can help determine which database tables are affected
Define all affected areas of the business, broken down as Epic vs. Function vs. Task
Tools that can be used:
User Stories, Stakeholder Matrix, Card, User Conversations, Confirmation (Consensus), Acceptance Criteria, System-as-a-Whole Mentality within Scope, What/Why/How Personas, Questionnaires, Observations, SMEs, SIPOC Diagrams, Ishikawa Diagrams, RACI Matrix, to name a few
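Data profiling, noted above as an aid to finding affected tables, can start with simple per-column statistics. A minimal sketch, assuming the new source arrives as a list of Python dictionaries (the field names are hypothetical): columns with mixed types or many nulls flag where mapping and transformation rules will need the most work.

```python
from collections import Counter
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Summarize each column: null/blank count, distinct values, observed types."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # missing keys count as nulls
        present = [v for v in values if v is not None and v != ""]
        report[col] = {
            "nulls": len(values) - len(present),
            # repr() lets unhashable values (e.g. nested dicts) be counted too
            "distinct": len({repr(v) for v in present}),
            "types": dict(Counter(type(v).__name__ for v in present)),
        }
    return report


source = [  # hypothetical records from the new source
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": "3", "email": None, "phone": "555-0100"},
]
for column, stats in profile(source).items():
    print(column, stats)
```

Here `id` mixes integers and strings and `phone` is mostly empty, both hints that the target tables and mapping rules need attention before coding starts.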
10. EPIC STORY WRITING EXAMPLE (SIPOC) => STORIES FOR LARGE DATA SETS
Define the Process
■ Suppliers (Who provides the input?): Vendors; Source Input Customer / Organization; Government; Internal Functions Affected by Data / SMEs
■ Inputs (What is provided to start the process?): Software / Hardware; Regulations; Data Transportation & Security; Staff Training & Availability (Resources); IDS, EDW, Data Mart / Tables Affected; Database Environment / Platform(s) (Transactional / Analytical); Methodology & Standards; Process Project / Program Management Plans
■ Process (What steps are included in the process today, at a high level?): Requirements; Data Profiling; Coding & Data Transformation Rules and Mappings; Development / Coding; QA & System Testing / Validation; Deployment
■ Outputs (What does the customer receive? Think of their CTQs): Cycle Time for Data to Use; Report Generation / External Extracts; Valid / Invalid Data to the Warehouse; Metric Evaluation; Data Analytics; Risk Analysis; Testing Results and Evaluations
■ Customers (Who are your primary customers?): Third Party Extract Recipients; Stakeholders (Internal / External); Regulators; Vendors; Mobile Device / Web Customers
11. ESTIMATION
Understand the assumptions and constraints
Make sure requirements are understood
Understand potential and known areas of rework
Use historical throughputs of similar projects
Estimates are not contracts, so keep a culture of flexibility within the team
Break requirement stories down into tasks
Monitor backlogs throughout the iteration; this helps determine sprints
Tools That Can Be Used:
Planning Poker, Historical Estimates, Sprint Velocities, Short-Term Forecasting as a Range/Percentage for Sprints and Project Durations, Project Cost Estimates from Velocity Forecasting, Process Mapping, Hypothesis Statements
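Planning Poker, listed above, has everyone reveal an estimate card at once; when the cards diverge, the lowest and highest estimators explain their reasoning and the team votes again. A minimal sketch of one round: the modified-Fibonacci `DECK` and the rule that cards within one step of each other count as consensus are assumptions (teams set their own rules), and `poker_round` is a hypothetical helper.

```python
DECK = [0, 1, 2, 3, 5, 8, 13, 20, 40, 100]  # assumed modified-Fibonacci card deck


def poker_round(votes: dict[str, int]) -> tuple[bool, list[str]]:
    """Evaluate one simultaneous reveal.

    Returns (consensus, speakers). Consensus is assumed to mean every card is
    within one step of the others on the deck; otherwise the lowest and highest
    estimators explain their reasoning before the team re-votes.
    """
    for name, card in votes.items():
        if card not in DECK:
            raise ValueError(f"{name} played {card}, which is not in the deck")
    position = {name: DECK.index(card) for name, card in votes.items()}
    low, high = min(position.values()), max(position.values())
    if high - low <= 1:
        return True, []
    speakers = sorted(name for name, p in position.items() if p in (low, high))
    return False, speakers


print(poker_round({"ana": 3, "ben": 5, "cy": 13}))  # (False, ['ana', 'cy'])
print(poker_round({"ana": 5, "ben": 8, "cy": 8}))   # (True, [])
```

Comparing positions on the deck rather than raw numbers keeps the rule consistent at the high end, where card values grow quickly.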
12. ESTIMATION EXAMPLE
Three Components:
■ Estimate Size of Stories = Defines Sprint
■ Measure Velocity for Each Iteration = Total Sprint Throughput
■ Forecast Duration = Predict Using a Range and a % Using the Project Backlog
- Derive Low Velocity
- Derive High Velocity
- Derive Average Velocity
- Forecast project duration by # of sprints, then convert to $/sprint, then $/iteration
Iteration 1 Forecast:
[Figure: estimation in story points (0 to 5) per task across Sprints 1 to 4; example tasks: "Define fields to be mapped (100)" and "Profile source to target data for mapping / coding complexity"]
13. RELEASE PLANNING
A paradigm shift from traditional, plan-driven delivery to agility driven by vision and values.
Agile Levels: DI Vision, DI Roadmap, Go Live Plan, Iteration Plan, Daily Commitment
Set iterations to fit the DI Roadmap (usually a 1 to 4 week timeframe) to decrease the cycle time from data to business use
Benefits: connects the strategic vision to the delivery approach (source to target), eliminates waste (rework) in the Lean sense, eliminates variation, enables better decision making, improves communication, and improves morale
Release Planning / DI Planning leads to a Roadmap, a Plan, and a Backlog
Key Elements: Schedule, Estimates on Epics / Stories, Prioritized Backlogs, Velocity of Team
Bottom Line: Complexity is Estimated, Velocity is Measured, Duration is Derived
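The bottom line above (complexity estimated, velocity measured, duration derived) can be illustrated by slotting a prioritized backlog into iterations. A minimal sketch under stated assumptions: `plan_release` is a hypothetical helper, story names and point values are invented for illustration, velocity is treated as a fixed number of points per iteration, and stories stay in strict priority order.

```python
def plan_release(backlog: list[tuple[str, int]], velocity: int) -> list[list[str]]:
    """Fill iterations with stories in priority order without exceeding velocity.

    Stories are never reordered, so a large story can leave spare capacity
    in the previous iteration.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    iterations: list[list[str]] = []
    current: list[str] = []
    used = 0
    for story, points in backlog:
        if points > velocity:
            raise ValueError(f"{story!r} ({points} pts) exceeds velocity; split it")
        if used + points > velocity:
            iterations.append(current)  # iteration is full; start the next one
            current, used = [], 0
        current.append(story)
        used += points
    if current:
        iterations.append(current)
    return iterations


backlog = [  # hypothetical prioritized backlog: (story, estimated points)
    ("Profile source data", 5),
    ("Define mappings", 8),
    ("Build load jobs", 13),
    ("QA and validation", 8),
    ("Deploy to EDW", 3),
]
plan = plan_release(backlog, velocity=20)
print(len(plan), plan)
```

The number of iterations (here 3) is the derived duration; re-running the plan with each new velocity measurement keeps the roadmap current.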
14. RELEASE PLANNING PICTORIAL
RELEASE / DATA INTEGRATION PHASE 1
Iteration 1 Iteration 2 Iteration 3
RELEASE / DATA INTEGRATION PHASE 2
Iteration 4 Iteration 5 Iteration 6 Iteration 7 Iteration 8
15. SPRINT PLANNING
● Determine and agree on the sprint and next sprint goals
● Determine required attendees, inputs and outputs
● Prioritize logs/backlogs and validate them against estimates
● Review and seek clarification of stories & tasks
● Define and estimate the work plan by breaking user stories into tasks
● Daily Standups
● Sprint Review and Demo Integration
● Retrospective / Lessons Learned
16. EXPANDING ON SPRINT PLANNING ELEMENTS
● Participation
● Prioritized Backlog
● Presentation of Candidate Stories
● Agreeing On Sprint Goal
● Validation of Sprint Backlog Based on Team
Estimation of Stories
● Capacity Planning
● Defining and Estimating the Work Plan
● Daily Stand Up Meetings
● Sprint Review and Closeout
● Retrospective / Lessons Learned
17. METRICS
● Derive measurements (Quantitative/Qualitative)
● Leading / Lagging measurements
● Metrics must be motivational and informative
● Treat a task as done only when it is 100% complete; otherwise it is not complete
● Some agile metrics (going beyond common metrics):
■ Velocity: sum of points delivered across iterations / # of iterations
■ Burndown: rate at which requirements are being delivered
■ Burnup: story points completed over time against the total project scope
■ Cumulative Flow Diagrams: how requirements are distributed across lifecycle states over time (e.g., Not Started, In Progress, Pending Acceptance, Completed)
Together, these metrics lead to more accurate OLAP and/or OLTP data for BI and analytics results, aligned with the company’s business model and its data management strategic planning.
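The velocity, burndown, burnup, and cumulative flow definitions above can be computed from per-iteration data. A minimal sketch, assuming hypothetical inputs: the points fully completed in each iteration, a fixed total scope, and a snapshot of each story's state at the end of each iteration using the four states listed above. The function names are illustrative.

```python
from collections import Counter
from itertools import accumulate

STATES = ("Not Started", "In Progress", "Pending Acceptance", "Completed")


def velocity(completed: list[int]) -> float:
    """Average points delivered per iteration (only fully completed stories count)."""
    return sum(completed) / len(completed)


def burnup(completed: list[int]) -> list[int]:
    """Cumulative points completed after each iteration."""
    return list(accumulate(completed))


def burndown(completed: list[int], scope: int) -> list[int]:
    """Points remaining against a fixed total scope after each iteration."""
    return [scope - done for done in burnup(completed)]


def cumulative_flow(snapshots: list[dict[str, str]]) -> list[dict[str, int]]:
    """Count stories per lifecycle state, one row per end-of-iteration snapshot."""
    rows = []
    for snapshot in snapshots:
        counts = Counter(snapshot.values())
        rows.append({state: counts[state] for state in STATES})
    return rows


done_per_iteration = [8, 12, 15, 15]  # hypothetical points completed
snapshots = [  # hypothetical story states at the end of iterations 1 and 2
    {"S1": "In Progress", "S2": "Not Started"},
    {"S1": "Completed", "S2": "Pending Acceptance"},
]
print(velocity(done_per_iteration))            # 12.5
print(burnup(done_per_iteration))              # [8, 20, 35, 50]
print(burndown(done_per_iteration, scope=60))  # [52, 40, 25, 10]
print(cumulative_flow(snapshots))
```

If scope changes mid-project, a burnup chart would also plot the scope line per iteration; this sketch keeps scope fixed for simplicity.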
18. EXAMPLES OF AGILE METRICS - BURNDOWN
[Figure: Burndown chart, % complete (0 to 90) by iteration (1 to 5), comparing Ideal and Actual progress]
19. QA TESTING DEFECTS PARETO CHART
[Figure: Pareto chart of QA testing defects, with bars for defect frequency and a line for cumulative % (0% to 120% axis). Defect categories: Mapping Unclear, Coding Standards, Target Domains, Meta Data, Data Joins, Data Type, Foreign Key, Grouping, Wrong Value, SK Lookup]
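A Pareto chart orders defect categories by frequency and overlays the cumulative percentage, so the team can aim rework at the "vital few" categories. A minimal sketch: the defect counts below are hypothetical (the chart's underlying values are not available), a subset of the chart's category names is reused for illustration, and the 80% cutoff in `vital_few` is the conventional 80/20 assumption.

```python
def pareto(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """Sort defect categories by frequency and add the cumulative percentage."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        rows.append((category, n, round(100 * running / total, 1)))
    return rows


def vital_few(rows: list[tuple[str, int, float]], threshold: float = 80.0) -> list[str]:
    """Categories up to and including the first whose cumulative % reaches the threshold."""
    selected = []
    for category, _, cumulative in rows:
        selected.append(category)
        if cumulative >= threshold:
            break
    return selected


defects = {"Data Type": 3, "Mapping": 12, "SK Lookup": 2, "Data Joins": 5, "Meta Data": 8}  # hypothetical
rows = pareto(defects)
for row in rows:
    print(row)
print(vital_few(rows))  # ['Mapping', 'Meta Data', 'Data Joins']
```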
20. EXAMPLES OF AGILE METRICS - BURNUP
21. EXAMPLES OF AGILE METRICS - ITERATION
[Figure: Iteration duration estimate, story points (0 to 30) per sprint for Sprints 1 to 4]
The team’s velocity ranged from 10 to 30 story points:
■ Low Velocity: 10 story points
■ High Velocity: 30 story points
■ Average Velocity: 20.5 story points
If the backlog is sized at 60 story points, then using this velocity trend the projected duration range is:
■ 60 / 10 = 6 sprints
■ 60 / 30 = 2 sprints
The backlog will release in between 2 and 6 sprints.
Notice that Sprints 1 and 2 have a high degree of story point variability, as the team is likely in the Forming/Storming stages of team development. Sprints 3 and 4 are closer in story points as the team begins to reach the Norming/Performing stages.
COST USING VELOCITY
If the cost per sprint is $10,000, then the iteration range prediction is:
■ Low Estimate: (2 sprints)($10,000) = $20,000
■ High Estimate: (6 sprints)($10,000) = $60,000
■ Average Estimate: (2.9 sprints)($10,000) = $29,000, where 60 / 20.5 ≈ 2.9 sprints
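The duration and cost arithmetic in this example can be reproduced in a few lines. A minimal sketch using the example's figures (60-point backlog, velocities of 10, 30, and 20.5 story points, $10,000 per sprint); `forecast` is a hypothetical helper, and the average case is rounded to one decimal place to match the 2.9 sprints above.

```python
def forecast(backlog_points: float, low_velocity: float, high_velocity: float,
             avg_velocity: float, cost_per_sprint: float) -> dict[str, tuple[float, float]]:
    """Return (sprints, cost) for the low, high, and average estimates."""
    cases = {
        "low estimate": backlog_points / high_velocity,   # fastest: highest velocity
        "high estimate": backlog_points / low_velocity,   # slowest: lowest velocity
        "average estimate": round(backlog_points / avg_velocity, 1),
    }
    return {name: (sprints, sprints * cost_per_sprint) for name, sprints in cases.items()}


result = forecast(60, low_velocity=10, high_velocity=30, avg_velocity=20.5, cost_per_sprint=10_000)
for name, (sprints, cost) in result.items():
    print(f"{name}: {sprints:g} sprints, ${cost:,.0f}")
```

Because a team cannot run a fraction of a sprint, rounding each case up with `math.ceil` would give a more conservative plan (3 sprints and $30,000 for the average case).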