Tech companies place a premium on user experience. However, this laser focus on users’ needs is too often missing from the design and development of internal analytical tools. This talk will explore what can be learned from open source development and the open science movement about building sustainable, accessible tools to fuel a vibrant “innersource” community.
Based on experience developing internal R packages at Capital One, this talk proposes the analyst-driven development paradigm for tool development. By reframing work from generating analyses to building reproducible analytical pipelines, analysts can efficiently deliver effective prototypes and finished tools as a simple byproduct of business-as-usual work.
More broadly, we will examine why empathy, empowerment, and engagement are the keys to successful open source and innersource projects, and how analyst-driven development deliberately yet seamlessly embeds these concepts in every step of the development process, from toolset curation to community building.
We will share best practices and lessons learned at Capital One, ranging from broad design philosophy to specific R-based workflows, to motivate analysts to productionalize their analyses, develop better tools, and drive innovation within their own organizations.
Designing Empathetic, Empowering, and Engaging Internal Tools for Analytics
1. Designing Empathetic, Empowering, and Engaging Internal Tools
Emily Riederer
Sr. Analyst, Capital One
@EmilyRiederer / emily.riederer@capitalone.com
2. Emily Riederer, Capital One (@EmilyRiederer) References on GitHub: emilyriederer/references/deee.md
Revolutions in science and technology have inspired step changes in how businesses operate and catalyzed the need for building good internal tools
Scientific Observation | Experimental Science | Reproducible Research
Data Analysis at Scale | Open-Source Communities
3. Revolutions in science and technology have inspired step changes in how businesses operate and catalyzed the need for building good internal tools
Scientific Observation | Experimental Science | Reproducible Research
Data Analysis at Scale | Open-Source Communities
Hypothesis-Driven Business Analysis
4. Revolutions in science and technology have inspired step changes in how businesses operate and catalyzed the need for building good internal tools
Scientific Observation | Experimental Science | Reproducible Research
Data Analysis at Scale | Open-Source Communities
Data-Driven Business Analysis
5. Revolutions in science and technology have inspired step changes in how businesses operate and catalyzed the need for building good internal tools
Scientific Observation | Experimental Science | Reproducible Research
Data Analysis at Scale | Open-Source Communities
Reproducible Business Analysis with Innersourced Tools
6. Businesses are taking novel approaches to filling this need, with analyst and developer roles converging to the “analyst developer”
Analysis & Insight Generation
• Analytical Frameworks
• Business Knowledge
• Scripts
• Data Sources
• Presentation Materials
Packaging as Reproducible Tools
• Repositories
• R/Python Packages
• Templates
• Demos/Tutorials
7. Businesses are taking novel approaches to filling this need, with analyst and developer roles converging to the “analyst developer”
Analysis & Insight Generation
Packaging as Reproducible Tools
8. As in open-source projects, empathy, empowerment, and engagement are key traits of successful innersource development initiatives
Empathy: design to meet users’ needs
Empowerment: design to teach and facilitate
Engagement: design for extension with invitation to contribute
9. Analyst-driven development creates natural empathy instead of relying on heuristics
Empathy: design to meet users’ needs
Empowerment: design to teach and facilitate
Engagement: design for extension with invitation to contribute
10. User stories for data products can overfit to one stakeholder’s needs
User Story: As a <customer>, I want to <do this> in order to <achieve that>
11. User stories for data products can overfit to one stakeholder’s needs
User Story: As a <customer>, I want to <do this> in order to <achieve that>
• VP/Director: I want to standardize reporting metrics in order to aggregate and compare across lines of business
12. User stories for data products can overfit to one stakeholder’s needs
User Story: As a <customer>, I want to <do this> in order to <achieve that>
• VP/Director: I want to standardize reporting metrics in order to aggregate and compare across lines of business
• Work Manager: I want to ensure correct calculations and thorough review in order to have confidence in the rigor & quality of my team’s results
13. User stories for data products can overfit to one stakeholder’s needs
User Story: As a <customer>, I want to <do this> in order to <achieve that>
• VP/Director: I want to standardize reporting metrics in order to aggregate and compare across lines of business
• Work Manager: I want to ensure correct calculations and thorough review in order to have confidence in the rigor & quality of my team’s results
• Data Analyst: I want to rapidly complete manual, mechanical data computations in order to invest time in analysis and insight generation
14. End users themselves may articulate their needs as discrete tasks rather than as a complete workflow
User Story: As a <customer>, I want to <do this> in order to <achieve that>
As a Data Analyst, I want to:
• Find data
• Query data
• Clean data
• Calculate metrics
• Analyze results
• Debug & sanity check
• Seek help when needed
• Iterate on analysis
• Share with manager
• Communicate findings
• Document process
• Be prepared for follow-ups
In order to:
• Get the information I need
• Uncover insights
• Communicate findings
• Leave paper trail
15. At Capital One, cashflow analysis is integral to many interrelated pieces of business analytics
• Decision Making
• Validation & Monitoring
• Modeling
• Scenario Analysis
• Documentation & Governance
16. Patchwork processes lead to inefficiency, poor documentation, and limited reproducibility
Decision Making | Validation & Monitoring | Modeling | Scenario Analysis | Documentation & Governance
Tools (several recur across the diagram): Database System | BI Visualization Tool | Legacy Statistical Computing Platform | FTP Client | Spreadsheet Software | Word Processor | Presentation Software
• Black box
• Limited capability
• Manual documentation
• Highly manual process
• System-specific knowledge
• Slow iteration
17. Building the end-to-end tidycf R package enabled an efficient and reproducible workflow
Decision Making | Validation & Monitoring | Modeling | Scenario Analysis
• Accessible code
• Extensible code
• Real-time documentation
• Automated & reproducible
• General versus system-specific knowledge
• Rapid iteration
18. By treating analysis as a product, analyst-developers improve quality on the immediate ask while justifying the business value of investing in rigorous development
Notebook Function Discovery | Function Modularization | Process Discovery | Template, Vignette | Clean-Up
Analysis & Insight Generation
Packaging as Reproducible Tools
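The move from notebook code to a modular function can be illustrated with a small sketch. It is written in Python purely for illustration: tidycf itself is an R package and none of its functions appear in these slides, so `calc_present_value`, its parameters, and the sample numbers are all hypothetical. The `calc_` prefix only echoes the "calc functions" grouping named elsewhere in the deck.

```python
from typing import Sequence

# Notebook-style, one-off calculation: the logic is tied to one dataset.
cashflows = [100.0, 100.0, 100.0]  # made-up end-of-period cashflows
rate = 0.05                        # made-up per-period discount rate
pv_adhoc = 0.0
for t, cf in enumerate(cashflows, start=1):
    pv_adhoc += cf / (1 + rate) ** t


def calc_present_value(cashflows: Sequence[float], rate: float) -> float:
    """Discount end-of-period cashflows back to today at a constant rate.

    Hypothetical helper, not a real tidycf function.
    """
    if rate <= -1:
        raise ValueError("rate must be greater than -1")
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows, start=1))


# The modular version reproduces the ad-hoc result and can be reused,
# tested, and documented independently of any single analysis.
pv_modular = calc_present_value(cashflows, rate)
```

Once the calculation lives in a named function, the same analysis that produced it also serves as its first usage example, which is roughly the "byproduct" idea the talk describes.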
19. As with open source, internal tool development lends itself well to truly taking a user’s perspective, without empathy interviews or A/B tests
Fake data is provided for illustrative purposes only and does not represent Capital One performance
20. Organically evolving the tidycf package while addressing business problems led to efficient and empathetic development
Task: Valuations
Business Problems | R Markdown Templates
• Process 1: Data Validation | Framework 1: Data Validation
• Process 2: Data Exploration | Framework 2: Data Exploration
• Process 3: Model Building | Framework 3: Model Building
• …
• Process n-1: Model Validation | Framework n-1: Model Validation
• Process n: Model Intuition | Framework n: Model Intuition
• Process n+1 … z: Analysis with Model | Framework n+1 … z: Analysis with Model
R functions: calc functions | viz functions | tbl functions | …
21. Analyst developers know their own strengths and weaknesses and can build products that empower users instead of patronizing them
Empathy: design to meet users’ needs
Empowerment: design to teach and facilitate
Engagement: design for extension with invitation to contribute
22. Empathy alone cannot serve every need, so internal analytical tools must empower users to extend analysis beyond cookie-cutter frameworks and functionalities
• Respect users’ intelligence, but don’t assume prescience
• Avoid black-boxishness (e.g. GUIs) and tool-specific knowledge
• Teach transferable skills by building off existing frameworks
23. Empowerment can take many different forms, such as lending a helping hand, being transparent, and being flexible
RStudio IDE’s data importer and database connector generate code for any GUI features, for user edification and future reproducibility
24. tidycf embeds R Markdown templates to empower users through package discoverability, R immersion, and enterprise knowledge transfer
• Code comments explain syntax and suggest new functions to try
• Text commentary facilitates knowledge transfer of business context and intuition
25. Flexible internal tools integrate with broader ecosystems, like R’s tidyverse, to provide both structure and flexibility
26. Instead of prescribing approaches, opinionated internal tools can help establish norms and best practices while allowing for boundless creativity and generalizability
Diagram columns: Input Data | R Markdown Templates | Output Data
R Markdown Templates: Data Validation | Exploratory Data Analysis | Model Validation | Model Analytics | (Multiple Modeling Steps) … | Model Monitoring
Data artifacts: raw | out1 | out2 | out_t | out_t+1 | model
Directory: ./data/ | ./analysis/ | ./output/ | external (External Source)
In tidycf, R Markdown templates read and save artifacts to the appropriate relative paths so all users end up with a standardized repository
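The idea of templates reading from and writing to fixed relative paths can be sketched as follows. This is a Python illustration under stated assumptions, not tidycf's R implementation: only the directory names (data, analysis, output) and the artifact labels "raw" and "out1" come from the slide, while `init_project`, `run_validation_step`, the CSV format, and the column names are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# Directory names follow the slide; everything else here is hypothetical.
DATA_DIR, ANALYSIS_DIR, OUTPUT_DIR = "data", "analysis", "output"


def init_project(root: Path) -> None:
    """Create the standardized folder layout under a project root."""
    for name in (DATA_DIR, ANALYSIS_DIR, OUTPUT_DIR):
        (root / name).mkdir(parents=True, exist_ok=True)


def run_validation_step(root: Path) -> Path:
    """Read data/raw.csv, keep rows with a non-empty amount, write output/out1.csv."""
    in_path = root / DATA_DIR / "raw.csv"
    out_path = root / OUTPUT_DIR / "out1.csv"
    with in_path.open(newline="") as src:
        rows = [r for r in csv.DictReader(src) if r["amount"].strip()]
    with out_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["id", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Usage: build a throwaway project, drop in made-up raw data, run one step.
project = Path(tempfile.mkdtemp())
init_project(project)
(project / DATA_DIR / "raw.csv").write_text("id,amount\n1,10\n2,\n3,30\n")
result = run_validation_step(project)
```

Because each step resolves its inputs and outputs relative to the project root, any analyst who runs the same steps ends up with the same folder layout, which is the standardization the slide describes.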
27. Analyst-driven development keeps tools relevant as empowered users help them to evolve
Empathy: design to meet users’ needs
Empowerment: design to teach and facilitate
Engagement: design for extension with invitation to contribute
28. Empowered users with the right incentives engage in a virtuous cycle: evolving tools informed by business needs and constraints
Business Needs | Ad-Hoc Analysis | Productionalization
29. Engagement is the lifeblood of many open source projects and analytical tools
“The purpose of this site is to help other R users easily find ggplot2 extensions that are coming in ‘fast and furious’ from the R community….
When Hadley announced the release of ggplot2 2.0.0, perhaps the most exciting news was the addition of an official extension mechanism…
This means that even when less development occurs in the ggplot2 package itself, the community will continue to release packages for graphical analysis that extend/solve different requirements.”
30. Champion opportunities and celebrate success to engage user contribution and capture their creations
Opportunities
• Well-defined style guide, CONTRIBUTING.md, and process
• Issues with ideas, tags
• Vignettes/Examples
Appreciation
• Recognize & reward
• Bug reports, questions, confusions, and misunderstandings are valuable feedback, too!
31. As in open-source projects, empathy, empowerment, and engagement are key traits of successful innersource development initiatives
Empathy: design to meet users’ needs
Empowerment: design to teach and facilitate
Engagement: design for extension with invitation to contribute
32. Community building and incentive alignment are essential to effective analyst-driven development
Open Source
• Pull system
• Motivated by:
  • Contributing to community
  • Building reputation, credibility, presence
Open Science
• Push system
• Motivated by:
  • Requirements for publication
• Concerned by:
  • Time investment
  • Losing ownership
Innersource
• Pull with recognition, acknowledgement as valuable investment
• Push with norms and requirements
33. Designing Empathetic, Empowering, and Engaging Internal Tools
Emily Riederer
Sr. Analyst, Capital One
@EmilyRiederer / emily.riederer@capitalone.com