For over 10 years, we have been doing agile software development, yet people still struggle to apply agile to data, BI, and analytics. After a quick review of the Agile Manifesto and its principles, this talk looks at which agile practices have worked for data and which are still hard. Then, with analyst requirements in mind, it reveals the 5 shocking steps to actually do agile with data.
3. Gil Benghiat – decades working with data
• Network Management Data
• Database Management
• Clinical Trial Data
• Pharmaceutical Sales Data
• Data Liberation
• Data Preparation
gil@datakitchen.io
@benghiat
6/2/2015 3
Solid Oak Consulting
4. Data Analysts And Their Teams Are Spending 60-80% Of Their Time On Data Preparation And Production
5. This creates an expectation gap
Figure: Business Customer (Expectation) vs. Analyst (Reality); segments labeled Analyze, Prepare Data, Communicate.
The business does not think that Analysts are preparing data
Analysts don’t want to prepare data
6. DataKitchen is on a mission to integrate and organize data to make analysts super-powered.
• Offering
• Set-up service
• Software subscription
• UI to integrate data
• Benefits
• Data warehouse
• Eliminate drudgery of repeated integrations
8. agilemanifesto.org
and excel files
s/software/analytics/
The switch works for the 12 principles too.
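The `s/software/analytics/` substitution can be tried out directly. The sketch below applies it with Python's `re.sub` to one value and one principle quoted from agilemanifesto.org. The helper name `to_analytics` is illustrative only and does not come from the talk.

```python
import re

# Two lines quoted from agilemanifesto.org: one value and one principle.
MANIFESTO_LINES = [
    "Working software over comprehensive documentation",
    "Working software is the primary measure of progress.",
]


def to_analytics(text: str) -> str:
    """Apply s/software/analytics/ (first match only, like sed without the g flag)."""
    return re.sub(r"software", "analytics", text, count=1)


swapped = [to_analytics(line) for line in MANIFESTO_LINES]
for line in swapped:
    print(line)
```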
Iterate to improve the analytics.
Iterate to improve the process.
9. Agile methodologies contain a number of practices that can apply to data
Sprints
Stories
Prioritization
Daily Meeting
Defined roles
Retrospectives
Pair Programming
Burn down charts
etc.
The Data Analyst has the central role as the bridge between business and data
10. What do analysts and data scientists want?
Flexibility & Speed
You need to be fast and produce trustworthy data
11. Some practices have been difficult to apply to data
Test Driven Development
Branching and merging
Refactoring
Small Releases
Frequent or Continuous Integration
Experimentation for learning
13. ❶ Add tests
Types
1. Error – stop the line
2. Warning – investigate later
3. Info – list of changes
Examples
1. Input file row count way below a critical threshold
2. Input file row count a little below a threshold
3. These customers changed territories
And keep adding them with each feature developed!
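The three test types and their examples could be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: the threshold values, the `CheckResult` class, and every function name are assumptions made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    ERROR = "error"      # stop the line
    WARNING = "warning"  # investigate later
    INFO = "info"        # list of changes


@dataclass
class CheckResult:
    severity: Severity
    message: str


# Hypothetical thresholds; real values depend on the input file.
CRITICAL_MIN_ROWS = 1_000
EXPECTED_MIN_ROWS = 9_000


def check_row_count(row_count: int) -> list[CheckResult]:
    """Error when far below the critical threshold, warning when a little below."""
    if row_count < CRITICAL_MIN_ROWS:
        return [CheckResult(Severity.ERROR, f"Only {row_count} rows in input file")]
    if row_count < EXPECTED_MIN_ROWS:
        return [CheckResult(Severity.WARNING, f"Row count {row_count} is below expected")]
    return []


def check_territory_changes(previous: dict, current: dict) -> list[CheckResult]:
    """Info-level result listing customers whose territory changed."""
    changed = sorted(
        c for c in current if c in previous and previous[c] != current[c]
    )
    if changed:
        names = ", ".join(changed)
        return [CheckResult(Severity.INFO, f"Customers that changed territories: {names}")]
    return []


def should_stop_the_line(results: list[CheckResult]) -> bool:
    """Only error-level results halt processing."""
    return any(r.severity is Severity.ERROR for r in results)
```

A pipeline run would collect the results of every check, halt if `should_stop_the_line` returns `True`, and otherwise report the warnings and info messages for later review.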
14. ❷ Manage your transforms like code
Use a source code control system (like Git) to enable:
• Branching
• Merging
• Diff
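Keeping transforms as plain text is what makes diffing possible. In practice a tool such as `git diff` does this; the sketch below only illustrates the idea with Python's standard `difflib`. The SQL, the file labels, and the `diff_transforms` helper are hypothetical.

```python
import difflib

# Two hypothetical versions of the same SQL transform.
OLD_TRANSFORM = """\
SELECT customer_id, SUM(amount) AS total
FROM sales
GROUP BY customer_id
"""

NEW_TRANSFORM = """\
SELECT customer_id, SUM(amount) AS total
FROM sales
WHERE amount > 0
GROUP BY customer_id
"""


def diff_transforms(old: str, new: str) -> list[str]:
    """Return a unified diff between two versions of a transform."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="transform.sql (production)",
            tofile="transform.sql (feature branch)",
            lineterm="",
        )
    )


diff_lines = diff_transforms(OLD_TRANSFORM, NEW_TRANSFORM)
print("\n".join(diff_lines))
```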
15. ❸ Provide a data environment for each branch
The underlying data is needed to develop and test the code/transformations
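One way to picture a per-branch data environment is a private copy of the data that a branch can change freely. The sketch below uses an in-memory SQLite database and the standard `sqlite3.Connection.backup` method purely as a stand-in for a real warehouse; the table, the rows, and `create_branch_environment` are assumptions for illustration.

```python
import sqlite3


def create_branch_environment(production: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the production database into a separate in-memory database for one branch."""
    branch = sqlite3.connect(":memory:")
    production.backup(branch)
    return branch


# Stand-in "production" data (hypothetical table and rows).
production = sqlite3.connect(":memory:")
production.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
production.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("acme", 100.0), ("zed", -5.0)]
)
production.commit()

# Develop and test a transform against the branch copy only.
feature_env = create_branch_environment(production)
feature_env.execute("DELETE FROM sales WHERE amount <= 0")
feature_env.commit()

prod_rows = production.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
branch_rows = feature_env.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(prod_rows, branch_rows)
```

The transform under development changes only the branch copy, so production data stays untouched until the branch is merged.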
16. ❹ Support three types of workflows
Small Team
Promote directly to production
Feature Branch
Merge back to production branch
Data Governance
3rd party verification before production merge
Review
Test
Approve
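The three workflows differ mainly in which gates a change must pass before it reaches production. A minimal sketch of that idea follows. The function and parameter names are hypothetical, and requiring passing tests in every workflow is an assumption carried over from the "stop the line" rule in step ❶, not something the slide states.

```python
from enum import Enum


class Workflow(Enum):
    SMALL_TEAM = "small team"            # promote directly to production
    FEATURE_BRANCH = "feature branch"    # merge back to production branch
    DATA_GOVERNANCE = "data governance"  # 3rd party verification before merge


def can_reach_production(
    workflow: Workflow,
    tests_passed: bool,
    merged_to_production_branch: bool = False,
    third_party_approved: bool = False,
) -> bool:
    """Decide whether a change may go live under the given workflow."""
    # Assumption: failing tests block promotion in every workflow.
    if not tests_passed:
        return False
    if workflow is Workflow.SMALL_TEAM:
        return True
    if workflow is Workflow.FEATURE_BRANCH:
        return merged_to_production_branch
    # Data governance: the merge additionally needs third-party approval.
    return merged_to_production_branch and third_party_approved
```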
17. ❺ Give your analysts and data scientists the ability to edit the DW safely
Best-in-class companies take 12 days to integrate new data sources into their analytical systems; industry average companies take 60 days; and, laggards average 143 days
Source: Aberdeen Group: Data Management for BI: Fueling the analytical engine with high-octane information
Figure out how to do this in minutes
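One possible reading of "edit the DW safely" in step ❺ is that a change is applied inside a transaction and rolled back when a test fails. The sketch below shows that pattern with Python's standard `sqlite3` module as a stand-in for a data warehouse. The `sales` table, the row-count test, and `apply_change_safely` are assumptions for illustration, not the talk's method.

```python
import sqlite3


def apply_change_safely(conn: sqlite3.Connection, change_sql: str, min_rows: int) -> bool:
    """Run a DML change in a transaction; keep it only if the row-count test passes."""
    try:
        # sqlite3 opens a transaction implicitly before DML statements.
        conn.execute(change_sql)
        remaining = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
        if remaining < min_rows:
            conn.rollback()  # test failed: undo the edit
            return False
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()
        raise


# Stand-in warehouse with a hypothetical table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("acme", 100.0), ("zed", -5.0), ("initech", 40.0)],
)
warehouse.commit()

# An overly broad edit is rejected and rolled back.
rejected = apply_change_safely(warehouse, "DELETE FROM sales", min_rows=1)
rows_after_rejected = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# A targeted edit passes the test and is committed.
accepted = apply_change_safely(warehouse, "DELETE FROM sales WHERE amount <= 0", min_rows=1)
rows_after_accepted = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```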