Having data doesn't solve any business problem. Finding actionable insights and stories and implementing them to optimize business processes does.
This presentation was created by Sundeep Reddy Mallu for a virtual session with people at the Indian School of Business (ISB) Institute of Data Science.
The slides cover how to create data stories and which parameters to keep in mind while creating one. With real-world case studies and use cases of data storytelling, this presentation shows how business leaders can identify Big, Useful, and Surprising insights from big data sets.
3. My struggles with new tasks during the Pandemic
[Chart: average time (in minutes) to sweep and mop the floor, by week, W1 to W19 (X-axis: Time in minutes; Y-axis: Week Count).]
This chart shows the average time it took me to sweep and mop the house floor each day. The X-axis has the time the task took daily (averaged at week level). The Y-axis has the week count. Week 1 starts on 25th March 2020.
I had hoped (wrongly so) to beat the expert time. I couldn’t translate my body agility onto this task. I struggled with the nausea of what needed to be done and a treacherous learning curve.
What’s unusual
• More than 70% of the time, I took longer than an expert (20 minutes).
• For the rest, I gave up or did only part of the work.
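The “more than 70%” claim is a simple share: weeks slower than the expert divided by all weeks. The sketch below checks it against the 19 weekly data labels printed on the original chart. The extraction does not preserve which value belongs to which week, so treat the list as an unordered set of readings; a share does not depend on order.

```python
# Weekly average times (minutes) as labelled on the chart.
# The text extraction does not preserve which value belongs to which week.
weekly_minutes = [51, 46, 42, 40, 38, 35, 32, 30, 30, 29,
                  18, 13, 28, 21, 29, 28, 10, 24, 16]
EXPERT_MINUTES = 20


def share_slower_than(times, threshold):
    """Fraction of weeks whose average time is strictly above the threshold."""
    slower = sum(1 for t in times if t > threshold)
    return slower / len(times)


share = share_slower_than(weekly_minutes, EXPERT_MINUTES)
print(f"{share:.0%} of weeks were slower than the expert")  # 79% of weeks were slower than the expert
```

Fifteen of the nineteen weeks are above 20 minutes, which is consistent with the “more than 70%” annotation.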
4. Data storytelling is a critical skill for data scientists, analysts & managers
Stories are memorable. They spread virally
People remember stories. They’ll act on them.
People share stories. That enables collective action.
For people to act on analysis, data stories are critical.
But analysts present analysis, not stories
We present what we did. Not what you need.
You need to know what happened, why, & what to do.
Narrated in an engaging way. As a story.
We’ll learn how to do that in this session.
Storytelling has a 30X Return on Investment
Rob Walker and Joshua Glenn auctioned common
items like mugs, golf balls, toys, etc. The item
descriptions were stories purpose-written by 200+
contributing writers.
Items that were bought for $250 sold for over $8,000 –
a return of over 3,000% for storytelling!
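The slide states the same result two ways: as a multiple (“30X”) and as a return (“over 3,000%”). The sketch below shows how both follow from the $250 and $8,000 figures, and applies the same formula to Mario’s $0.50 to $62.00. Since the slide says “over $8,000”, these are lower bounds.

```python
def roi_percent(cost, revenue):
    """Return on investment as a percentage: (revenue - cost) / cost * 100."""
    return (revenue - cost) / cost * 100


def multiple(cost, revenue):
    """How many times its original cost an item sold for."""
    return revenue / cost


# Figures from the slide: items bought for $250 sold for about $8,000.
print(f"{roi_percent(250, 8_000):,.0f}%")   # 3,100%
print(f"{multiple(250, 8_000):.0f}x")       # 32x

# Mario the Baker Mouse: bought for $0.50, sold for $62.00.
print(f"{roi_percent(0.50, 62.00):,.0f}%")  # 12,300%
```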
Mario – The Baker Mouse
Original price: $0.50.
Final price: $62.00.
Story by Megan O’Rourke
When I was a child, my mother used to
keep Mario on a shelf near the oven.
Sometimes I would play with him. She
told me that Mario was magic; in the
night, he made muffins light as manna
and delicate as silver. If you happened to
sleepwalk into the kitchen, you could eat
the muffins, but they disappeared by
morning.
…
5. Solving a Business Problem
Stage 1 - Identify Business Problem
Define the problem statement by understanding:
• What is the business need and desired outcome?
• Who will benefit?
• What is the impact?
• What are the success criteria?
Stage 2 - Translate to Data Problem
• Break down the problem statement into multiple use cases
• Connect each use case with a data set
• Understand any limitations of the data sources, internal and external
Stage 3 - Arrive at a Data Answer
Target each use case with data through:
• EDA and transformation
• Modelling
• Generating insights
Stage 4 - Translate to Business Answer
• Stitch insights from individual use cases to create a story
• Connect the data story to better decision making
• Measure success
Data Storytelling
7. DO IT: Who is the audience for your analysis?
Role: _____________
Be specific. “Head of sales”, not “executive”
Example name: ______________
Name a real person. “Jim Fry”, not “any sales head”.
Different people want
different things from the
same data.
Given sales data:
• The Board: “Predict next quarter’s sales”
• Product head: “Which product grew the most?”
• Sales head: “Did we meet our target?”
They are not interested in each other’s questions.
Who is your audience? They determine the story
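To make “different people want different things from the same data” concrete, here is a sketch that answers each of the three questions from one small sales table. The products, figures and target are hypothetical, and the board’s forecast is a naive linear extrapolation standing in for a real model.

```python
# Hypothetical quarterly sales by product, in $ thousands (illustrative only, not from the slides).
sales = {
    "Q1": {"Alpha": 120, "Beta": 80, "Gamma": 50},
    "Q2": {"Alpha": 130, "Beta": 110, "Gamma": 45},
}
Q2_TARGET = 300  # hypothetical sales target for Q2


def naive_next_quarter(prev_total, curr_total):
    """Board: 'Predict next quarter's sales' (naive extrapolation, placeholder for a real model)."""
    return curr_total + (curr_total - prev_total)


def fastest_growing(prev, curr):
    """Product head: 'Which product grew the most?' (relative growth)."""
    return max(curr, key=lambda p: (curr[p] - prev[p]) / prev[p])


def met_target(curr, target):
    """Sales head: 'Did we meet our target?'"""
    return sum(curr.values()) >= target


q1, q2 = sales["Q1"], sales["Q2"]
print(naive_next_quarter(sum(q1.values()), sum(q2.values())))  # 320
print(fastest_growing(q1, q2))                                  # Beta
print(met_target(q2, Q2_TARGET))                                # False
```

Each function reads the same table but returns a different kind of answer, which is why the audience has to be chosen before the analysis.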
8. DO IT: Write it in this structure
“[Person, Role] is in [situation], and faces this
[problem]. By taking [action], she can drive
[impact].”
Example
“Stacy, the Marketing head [person, role], must create a region-wise budget [situation], and doesn’t know the region-wise ROI [problem]. By prioritizing regions [action], she can maximize ROI [impact].”
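The structure behaves like a form with five required slots. A minimal sketch, assuming each slot holds a verb phrase so the sentence reads naturally; the `StoryFrame` class and its `pronoun` field are hypothetical, not part of the slides. It refuses to build a sentence until every slot is filled, which mirrors the four questions that follow.

```python
from dataclasses import dataclass, fields


@dataclass
class StoryFrame:
    """One slot per bracket in the story structure (hypothetical helper)."""
    person_role: str
    situation: str
    problem: str
    action: str
    impact: str
    pronoun: str = "she"

    def __post_init__(self):
        # Every slot must be answered; an empty slot means the story is not ready.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"missing {f.name}")

    def sentence(self) -> str:
        return (f"{self.person_role}, {self.situation}, and {self.problem}. "
                f"By {self.action}, {self.pronoun} can {self.impact}.")


stacy = StoryFrame(
    person_role="Stacy, the Marketing head",
    situation="must create a region-wise budget",
    problem="doesn't know the region-wise ROI",
    action="prioritizing regions",
    impact="maximize ROI",
)
print(stacy.sentence())
```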
For each person, answer the following questions:
1. What’s their situation?
2. What problems do they face?
3. What action can they take?
4. What is the impact of this action?
What is their problem? That defines your analysis
9. Here are three examples in real life
Purchasing Commodities
• Person, role: Adam, the purchasing head of a leading European brewery
• Situation: His plants purchased commodities from several vendors. Discounts were low, and the number of weekly orders was high.
• Problem: But he didn’t know which plants and commodities were the problem. Every plant denied it.
• Action: By consolidating vendors and reducing order frequency,
• Impact: they could increase their discounts and reduce logistics cost.
Cargo Delay
• Person, role: Cris, the operations head of a leading US airline
• Situation: The airline had an SLA to deliver cargo from the flight to the warehouse in under 1.5 hours, 15% lower than its current best performance.
• Problem: But she didn’t know what the biggest drivers of this delay were: people, assets, or type of cargo.
• Action: By adding resources only to the largest levers of delay,
• Impact: she could reduce turnaround time with the lowest spend.
Customer Churn
• Person, role: Ravi, the marketing manager of an Asian telecom company
• Situation: He found that the cost of replacing customers was thrice the cost of retention.
• Problem: But he didn’t know which customers to make offers to in order to retain them.
• Action: By predicting which customers were likely to churn,
• Impact: they could tailor a retention offer and reduce re-acquisition cost.
10. Filter for big, useful, surprising insights
DO IT: Rate each analysis against B.U.S.
Filter the analyses using this checklist:
• Is the insight BIG? We want a result that substantially changes the outcome.
• Is the insight USEFUL? Can they take an action that improves their objective? What should they do next?
• Is the insight SURPRISING? Is it non-obvious? Does it overturn an existing belief, or bring consensus?
Examples (rate each on B, U, S):
• There are twice as many restaurants in NYC as in any other city.
• Sales increased in every region except our largest branch, which dipped by 0.1%.
• An increase in rainfall increases the sale of umbrellas, and is the biggest driver of our sales.
11. Here are the analyses & filters for the problems we saw earlier
Purchasing Commodities
• [none] The most common commodity was ordered 10 times a week across 2.4 vendors.
• [U] The number of orders is correlated with the number of vendors. Reducing one will reduce the other.
• [B U] Plant P126 was the plant with the most violations, especially on the largest commodity.
Cargo Delay
• [B S] Fragile cargo is a big factor in the delay, with a 20% impact.
• [none] Fridays are when cargo is delayed the most.
• [B U S] Trained staff and forklifts impact delay the most.
Customer Churn
• [S] The number of inbound calls does not impact churn.
• [B] Customers who haven’t made any calls in the last 15 days are the most likely to churn.
• [B U S] Customers making infrequent calls and recharging small amounts infrequently are the most at risk.
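Applying the B.U.S. filter is a set test: keep an insight only if its ratings contain all three letters. The sketch below encodes the ratings as they appear on this slide (the extracted slide may have lost some marks) and shows two policies: a strict filter, and a softer “best per problem” ranking by the number of letters earned. The `Insight` class and both helpers are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    problem: str
    text: str
    ratings: frozenset  # subset of {"B", "U", "S"}


def passes_bus(insight, required=frozenset("BUS")):
    """True if the insight was rated Big, Useful and Surprising."""
    return required <= insight.ratings


insights = [
    Insight("Purchasing", "Most common commodity ordered 10 times a week across 2.4 vendors", frozenset()),
    Insight("Purchasing", "Number of orders is correlated with number of vendors", frozenset("U")),
    Insight("Purchasing", "Plant P126 had the most violations, especially on the largest commodity", frozenset("BU")),
    Insight("Cargo", "Fragile cargo is a big factor in the delay (20% impact)", frozenset("BS")),
    Insight("Cargo", "Fridays are when cargo is delayed the most", frozenset()),
    Insight("Cargo", "Trained staff and forklifts impact delay the most", frozenset("BUS")),
    Insight("Churn", "Number of inbound calls does not impact churn", frozenset("S")),
    Insight("Churn", "No calls in the last 15 days means most likely to churn", frozenset("B")),
    Insight("Churn", "Infrequent callers recharging small amounts infrequently are most at risk", frozenset("BUS")),
]

# Strict filter: only insights rated on all three criteria survive.
for i in insights:
    if passes_bus(i):
        print(f"{i.problem}: {i.text}")

# Softer policy: keep the highest-rated insight for each problem.
best = {}
for i in insights:
    if i.problem not in best or len(i.ratings) > len(best[i.problem].ratings):
        best[i.problem] = i
```

The strict filter keeps only the cargo and churn findings; the per-problem ranking also keeps the P126 finding, which is the one the purchasing storyline builds on.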
14. DO IT: Write your takeaway as one sentence
What’s the one thing you want the audience to
remember from your story?
What’s the one message that the audience
should take away?
CHECK IT: Verify these yourself
Is it a single, complete sentence?
Does it deliver what you want the audience to
remember?
Will your audience care a lot about this?
Close your eyes. Think of a childhood tale.
Summarize the moral of the story in one line.
We easily remember these stories and their moral several years later.
Close your eyes. Think of a business
presentation from last week. Can you easily
summarize the message in one line?
Stories are designed around a moral. A single takeaway. An “elevator pitch”.
It’s a one-sentence summary of the most important message for the audience.
Start with the takeaway. Summarize your entire story
15. Structure supporting analyses as a tree
Example of a business tree
Launch sales were 30% less than target due to
high competition
• Launch sales were projected at $20 mn in the
first month, but achieved only $14 mn
o Sales in every region were 20-50% lower.
o Only Philippines & Korea were on target
• Competitors discounted price by 35% - which
is unsustainable for them
o Discounts at 80 stores increased from 15% to 35%
o The maximum sustainable discount is 20%
• Stores that offered higher discounts saw less than 20% of our target sales
Construct a pyramid or tree-like outline
• Start with the takeaway at the root of the tree
• Add a message that supports the takeaway
• Add further details or supporting messages
• Messages must prove the first message, and
only the first message
• Strike off any message that isn’t required to
prove or support the takeaway
• Add next message that supports takeaway
• Add details to prove the second message
• Remaining messages for the takeaway
• Add details as required
Arrange messages hierarchically to prove & support the parent message
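The pyramid outline is a tree: the takeaway is the root, and each child is a message that proves its parent. A minimal sketch using the launch-sales example above (messages abbreviated); the `Message` class and `outline` function are hypothetical helpers that render the tree as the indented outline the slide describes.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """A node in the storyline tree: a message plus the messages that prove it."""
    text: str
    children: list = field(default_factory=list)

    def add(self, text):
        child = Message(text)
        self.children.append(child)
        return child


def outline(node, depth=0):
    """Render the tree as an indented outline, takeaway at the root."""
    prefix = "" if depth == 0 else "  " * depth + "- "
    lines = [prefix + node.text]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


root = Message("Launch sales were 30% less than target due to high competition")
launch = root.add("Launch sales were projected at $20 mn in the first month, but achieved only $14 mn")
launch.add("Sales in every region were 20-50% lower")
launch.add("Only Philippines & Korea were on target")
competition = root.add("Competitors discounted price by 35%, which is unsustainable for them")
competition.add("Discounts at 80 stores increased from 15% to 35%")
competition.add("The maximum sustainable discount is 20%")
root.add("Stores that offered higher discounts saw less than 20% of our target sales")

print("\n".join(outline(root)))
```

Striking off a message that does not support its parent is then just removing that node (and its subtree) from `children`.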
16. Here is the storyline for the analyses we saw earlier
Purchasing Commodities
• Takeaway: Focus on reducing the number of vendors for products ICG (in P126), FRS (in P121) and SWB (in P074) for a potential 40% reduction in logistics & vendor cost.
• Supporting points:
o ICG spend is among the highest, at €6.9m. P126 typically orders 40 times a week, often from 15-20 vendors.
o FRS spend is €3.2m. P121 orders from 3 vendors 8-14 times a week.
Cargo Delay
• Takeaway: To reduce the TAT to 1.5 hours at Airport XYZ, increase the number of forklifts from 1 to 2, and the number of trained staff from 4 to 6.
• Supporting points:
o The number of forklifts is the biggest driver of TAT. Each forklift typically reduces TAT by 15-30%.
o Total staff count does not impact TAT. Increasing trained staff has a more tangible impact of ~5-10% per person.
Customer Churn
• Takeaway: If a customer has not called in the last 5-14 days, and they have made only 1 recharge under $20 last quarter, make them an offer to retain them.
• Supporting points:
o The biggest driver of retention is when the customer last made an outgoing call. The 5-14 days bucket has the highest variation.
o Customers who make at most 1 recharge under $20 are 280% more likely to churn than others.
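The churn takeaway is already an executable targeting rule. A minimal sketch, assuming “has not called in the last 5-14 days” means the last outgoing call was 5 to 14 days ago (inclusive, matching the tree’s 5-14 bucket), and “only 1 recharge under $20” means exactly one recharge last quarter, below $20. Both readings are interpretations; the function and parameter names are hypothetical.

```python
def should_make_offer(days_since_last_outgoing_call, recharges_last_quarter):
    """Interpretation of the storyline rule: last outgoing call 5-14 days ago,
    and exactly one recharge last quarter, under $20."""
    in_risk_window = 5 <= days_since_last_outgoing_call <= 14
    one_small_recharge = (len(recharges_last_quarter) == 1
                          and recharges_last_quarter[0] < 20)
    return in_risk_window and one_small_recharge


print(should_make_offer(7, [15.0]))   # True: in the window, one small recharge
print(should_make_offer(3, [15.0]))   # False: called too recently
print(should_make_offer(7, [25.0]))   # False: recharge is $20 or more
```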
18. European brewery identified €15 m cost savings after consolidating vendors
A leading European brewery’s plants purchased
commodity raw materials from several vendors
each – and had low volume discounts.
Plants also placed multiple orders every week, leading to higher logistics cost.
When plant managers were shown the data, they objected, saying “That’s not always the case.” Or, “That’s the only way; no one else does better.”
Gramener built a custom analytics solution that
sourced their SAP order data, automatically
identified which plants ordered which commodities
the most from multiple vendors – and when.
It showed how each plant performed compared to
peers – shaming those with poor performance.
With this, they identified savings of €15 m, which the plant managers couldn’t refute.
€15 m: annual savings potential identified
40%: vendor-based reduction identified
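At its core, the analysis described above is a group-by over order lines: for each plant and commodity, count distinct vendors and orders per week, then rank. A minimal sketch with hypothetical `Order` records standing in for SAP order data; the field names and figures are assumptions, and the peer comparison across plants would be a further step on top of these counts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    plant: str
    commodity: str
    vendor: str
    week: int


def fragmentation(orders):
    """Per (plant, commodity): number of distinct vendors and average orders per active week."""
    vendors = defaultdict(set)
    weeks = defaultdict(set)
    counts = defaultdict(int)
    for o in orders:
        key = (o.plant, o.commodity)
        vendors[key].add(o.vendor)
        weeks[key].add(o.week)
        counts[key] += 1
    return {key: (len(vendors[key]), counts[key] / len(weeks[key])) for key in counts}


# Hypothetical order lines, not the brewery's data.
orders = [Order("P126", "ICG", f"V{i % 5}", 1) for i in range(10)]
orders += [Order("P121", "ICG", "V0", 1), Order("P121", "ICG", "V0", 2)]

stats = fragmentation(orders)
# Most fragmented first: many vendors and frequent orders are consolidation candidates.
for (plant, commodity), (n_vendors, per_week) in sorted(
        stats.items(), key=lambda kv: kv[1], reverse=True):
    print(plant, commodity, n_vendors, per_week)
```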
19. Global airline reduced cargo turnaround time by 15% with scenario modeling
A global airline company committed to a service level agreement to deliver cargo from the flight to the warehouse in under 1.5 hours. This target was 15% lower than their current best.
Several factors affect cargo delay across airports: availability of forklifts, staff size, cargo type, part shipment, and many others. Altering any of these is expensive and takes a long time.
Gramener built a visual analytics solution that
showed where cargo was delayed. We built an ML
model that identified the drivers of delay (forklifts,
trained staff), and the impact of these on
turnaround time. What-if scenario modelling helped
pick the optimal combination that reduced TAT.
This allowed the airline to reduce the turnaround
time by 15% from 1.76 hrs to 1.5 hrs. The worst-
case turnaround time also reduced by 34% from
2.9 hrs to 1.92 hrs.
15%: reduction in cargo turnaround time (from 1.76 to 1.5 hrs)
34%: reduction in worst-case turnaround time
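The two headline percentages are relative reductions, (before − after) / before. The sketch below recomputes them from the hour figures on this slide to show where the rounded 15% and 34% come from.

```python
def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


print(f"Average TAT:    {pct_reduction(1.76, 1.5):.1f}%")   # 14.8%
print(f"Worst-case TAT: {pct_reduction(2.9, 1.92):.1f}%")   # 33.8%
```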
[Chart: cargo recovery time by shift (Evening, Morning, Night), weekday (Mon to Sun), product category (FAH, N70, RPP, TDS, ZDH) and part shipment (<20%, 20-40%, 40-60%, 60-80%, Full).]
Recovery times are neutral during the evening and morning shifts (mornings are slightly worse); night times are the best.
Recovery times are worst on Fridays, and best on Saturdays & Wednesdays.
Specifically, Friday mornings are particularly bad. So are Thursday mornings.
The FAH product category has the best recovery time, while ZDH is much worse.
However, RPP on Sundays is unusually slow.
Part-shipped products tend to perform worse than full shipments, specifically the <20% and 40-60% part shipments.
This is especially problematic for ZDH.
This slide is best viewed in slideshow mode. The animations tell a story that isn’t obvious on the static version.
20. Telecom company saved 66% of customer re-acquisition cost by predicting churn
A national telecom provider had a churn rate of over 10% a month. Because switching costs were low, it lost the equivalent of its entire customer base within a year.
The cost of replacing each customer was thrice the cost of retention, provided the customers could be identified with some confidence.
Gramener used the customer profile, transaction
data, payment data, service log data, and other
related information to create a series of
classification models. These predict whether a
customer will churn one month in advance.
The simplest model, the decision tree (shown alongside), reduced the cost of attrition by 39%. A second, more robust machine learning model increased this to 66%. This model missed only 0.6% of customers and incorrectly flagged only 2.5% of customers.
66%: reduction in customer re-acquisition cost
99.4%: potential churn customers correctly identified
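Two pieces of arithmetic sit behind this slide. At 10% monthly churn, the customers lost in a year add up to more than the whole base (12 × 10% = 120%), even though, compounding month by month, roughly 28% of any starting cohort would still remain. And the 99.4% figure is simply 1 − 0.6% missed. The sketch assumes a constant 10% monthly rate as a lower bound for “over 10%”.

```python
MONTHLY_CHURN = 0.10   # "over 10% a month"; 10% used as a lower bound
MONTHS = 12

# Share of a starting cohort still active after a year (constant monthly rate assumed).
cohort_remaining = (1 - MONTHLY_CHURN) ** MONTHS
# Customers lost in a year relative to a base kept steady by replacement.
annual_loss_vs_base = MONTHLY_CHURN * MONTHS

MISSED = 0.006  # share of customers the stronger model missed
identified = 1 - MISSED

print(f"{cohort_remaining:.0%} of the starting cohort remains")     # 28%
print(f"{annual_loss_vs_base:.0%} of the base is lost in a year")  # 120%
print(f"{identified:.1%} of churners identified")                   # 99.4%
```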
[Figure: Decision tree model splitting customers on days since last outgoing call (0-4, 5-14, 15+), recharge amount > $20, and more than one recharge. Missed: 3.2%; wasted: 3.6%; cost per customer: 4.0; 39% improvement.]
21. Pick a format based on how your audience will consume the story
22. Pick a visual design based on the takeaway
• Deviation
• Change-over-Time
• Spatial
• Ranking
• Correlation
• Part-to-Whole
• Flow
• Magnitude
• Distribution
23. Annotate to explain & engage. Use four types of narratives
Remember “SEAR”: Summarize, Explain, Annotate, Recommend
[Chart: number of students (Y-axis, 0 to 20,000) by marks scored (X-axis, 0 to 100).]
Teachers add marks to stop some students from failing
This chart shows Class 10 students’ English
marks in Tamil Nadu, India, in 2011. The X-axis
has the mark a student has scored. The Y-axis
has the # of students who scored that mark.
Annotation: A large number of students score exactly 35 marks.
Annotation: Few (but not 0) students fail at 31-34 marks.
What’s unusual
A large number of students score 35 marks.
Few (but not 0) students score just below 35.
Only some students get this benefit.
Identify a fair policy that will be applied consistently.
Summarize the visual in its title
Don’t describe the chart.
Don’t write the user’s question.
Write the answer itself. Like a headline.
Explain & interpret the visual
How should the user read it?
What do you say when you talk through it?
Explain what the visual is. Then the axes.
Then its contents. Then the inference.
Recommend an action
How should I act on this?
You need to change the audience.
(Otherwise, you made no difference.)
Annotate essential elements
What should the user focus their eyes on?
Point it out, or highlight it with colors
Interpret what they’re seeing – in words.
This is a bell curve. But the spike at 35 (the mark
at which students pass) is unusual. Teachers
must be adding marks to some of the students
who are likely to fail by a small margin.
Annotation: No one scores 0-4 marks.
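Spotting the spike at 35 can be automated by flagging any mark whose count is far above the average of its two neighbours. A minimal sketch, assuming a simple neighbour-ratio rule with an arbitrary threshold of 3, run on synthetic bell-curve counts in which most students at 31-34 are moved up to 35. It does not use the actual Tamil Nadu data; `synthetic_marks` and `find_spikes` are hypothetical helpers.

```python
import math


def synthetic_marks(mean=55, sd=15, peak=10_000):
    """Hypothetical bell curve of counts for marks 0-100, with 'grace marks' applied."""
    counts = [round(peak * math.exp(-((m - mean) ** 2) / (2 * sd ** 2))) for m in range(101)]
    for m in range(31, 35):           # marks 31-34: most of these students are moved up to 35
        kept = counts[m] // 10
        counts[35] += counts[m] - kept
        counts[m] = kept
    return counts


def find_spikes(counts, ratio=3.0):
    """Marks whose count is at least `ratio` times the mean of their two neighbours."""
    spikes = []
    for mark in range(1, len(counts) - 1):
        neighbours = (counts[mark - 1] + counts[mark + 1]) / 2
        if neighbours > 0 and counts[mark] >= ratio * neighbours:
            spikes.append(mark)
    return spikes


print(find_spikes(synthetic_marks()))  # [35]
```

A smooth bell curve changes only gradually from one mark to the next, so a neighbour ratio stays near 1 everywhere except where marks have been added by hand.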
24. In summary, here are the 9 steps to go from data to a data story
Who is your audience? They determine the story
What is their problem? That defines your analysis
Find the right analysis to solve the problem
Filter for big, useful, surprising insights
Start with the takeaway. Summarize your entire story
Add supporting analyses as a tree
Pick a format based on how your audience will consume the story
Pick a visual design based on the takeaway
Annotate to explain & engage. Use four types of narratives
Instructors: Give the audience 1 minute to write down a one-sentence takeaway. Ask 2 people to read it out. Apply the checklist. If they don’t meet the checklist, prompt them to revise it. Allow them to struggle through it before taking help.
Instructors: Ask 1-2 people from the audience to add supporting points to their takeaway or any message. Ask others to debate whether these points are necessary and sufficient to prove the parent message. Ask the audience if some of them are sub-bullets to a supporting point.