Presented at Solstice 2011 (http://www.isb.edu/solstice/) at ISB on 16 December 2011 as part of Prof. Galit Shmueli's workshop on Visual Analytics (http://www.isb.edu/VisualAnalytics/)
2. Gramener
A data analytics and visualisation company
We handle terabyte-size data via non-traditional analytics and visualise it in real-time.
Gramener visualises your data
Gramener transforms your data into concise dashboards that make your business problem & solution visually obvious.
We help you find insights quickly, based on cognitive research, and our visualisations guide you towards actionable decisions.
3. WHY VISUALISE?
Consider an organisational sales report for 2010, shown below. It shows the performance of 4 branches, with average price and sales in each of 4 cities.
Each branch changes its price every month, with a corresponding change in the sales value.
Basic analytics of these numbers reveals consistent performance across the 4 branches: the averages and variances are identical.
Further, these sales figures have a consistent correlation and linear regression across all cities.

2010       Bangalore       Delhi           Hyderabad       Mumbai
Month      Price   Sales   Price   Sales   Price   Sales   Price   Sales
Jan        10.0    8.04    10.0    9.14    10.0    7.46    8.0     6.58
Feb        8.0     6.95    8.0     8.14    8.0     6.77    8.0     5.76
Mar        13.0    7.58    13.0    8.74    13.0    12.74   8.0     7.71
Apr        9.0     8.81    9.0     8.77    9.0     7.11    8.0     8.84
May        11.0    8.33    11.0    9.26    11.0    7.81    8.0     8.47
Jun        14.0    9.96    14.0    8.10    14.0    8.84    8.0     7.04
Jul        6.0     7.24    6.0     6.13    6.0     6.08    8.0     5.25
Aug        4.0     4.26    4.0     3.10    4.0     5.39    19.0    12.50
Sep        12.0    10.84   12.0    9.13    12.0    8.15    8.0     5.56
Oct        7.0     4.82    7.0     7.26    7.0     6.42    8.0     7.91
Nov        5.0     5.68    5.0     4.74    5.0     5.73    8.0     6.89
Average    9.0     7.50    9.0     7.50    9.0     7.50    9.0     7.50
Variance   10.0    3.75    10.0    3.75    10.0    3.75    10.0    3.75
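The summary statistics on this slide can be checked with a short script. The sketch below is not part of the original talk: it recomputes the mean, population variance, correlation and least-squares line for each city from the monthly figures (which match the well-known Anscombe's quartet). The names `data` and `summarise` are made up for illustration, and the variance is assumed to be the population variance, since that is what reproduces the 10.0 and 3.75 shown in the report.

```python
from math import sqrt

# Monthly (price, sales) figures for Jan to Nov, copied from the slide's report.
PRICES_ABC = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
data = {
    "Bangalore": (PRICES_ABC,
                  [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Delhi": (PRICES_ABC,
              [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Hyderabad": (PRICES_ABC,
                  [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Mumbai": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
               [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(price, sales):
    """Return mean, population variance, correlation and fitted line."""
    n = len(price)
    mean_p = sum(price) / n
    mean_s = sum(sales) / n
    # Population variance (divide by n), an assumption that matches the slide.
    var_p = sum((p - mean_p) ** 2 for p in price) / n
    var_s = sum((s - mean_s) ** 2 for s in sales) / n
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(price, sales)) / n
    slope = cov / var_p                    # least-squares slope of sales on price
    return {
        "mean_price": mean_p,
        "mean_sales": mean_s,
        "var_price": var_p,
        "var_sales": var_s,
        "correlation": cov / sqrt(var_p * var_s),
        "slope": slope,
        "intercept": mean_s - slope * mean_p,
    }


summaries = {city: summarise(price, sales) for city, (price, sales) in data.items()}

for city, stats in summaries.items():
    print(city, {name: round(value, 2) for name, value in stats.items()})
```

All four cities come out with practically the same summary figures, which is why the numbers alone cannot separate them.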
4. BECAUSE NUMBERS DON’T TELL THE FULL STORY
Plotting the same data shows markedly different behaviour.
Bangalore sales have generally increased with price.
Hyderabad has a perfect increase in sales with price, except for one aberration.
Delhi, however, shows a decline in sales as price is increased beyond a certain point.
Mumbai sales fluctuated a lot despite a constant price, except for one month.
5. DETECTING FRAUD
ENERGY UTILITY
“We know meter readings are incorrect, for various reasons. We don’t, however, have the concrete proof we need to start the process of meter reading automation.
Part of our problem is the volume of data that needs to be analysed. The other is the inexperience in tools or analyses to identify such patterns.”
6. This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.
Why would this happen? This clearly shows collusion of some form with the customers.
This happens with specific customers, not randomly. Here are such customers’ meter readings.

Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
217     219     200     200     200     200     200     200     200     350     200     200
250     200     200     200     201     200     200     200     250     200     200     150
250     150     150     200     200     200     200     200     200     200     200     150
150     200     200     200     200     200     200     200     200     200     200     50
200     200     200     150     180     150     50      100     50      70      100     100
100     100     100     100     100     100     100     100     100     100     110     100
100     150     123     123     50      100     50      100     100     100     100     100
0       111     100     100     100     100     100     100     100     100     50      50
0       100     27      100     50      100     100     100     100     100     70      100
1       1       1       100     99      50      100     100     100     100     100     100

If we define the “extent of fraud” as the percentage excess of the 100 unit meter reading, the value varies considerably across sections, and time … with some explainable anomalies.

Section Apr-10 May-10 Jun-10 Jul-10 Aug-10 Sep-10 Oct-10 Nov-10 Dec-10 Jan-11 Feb-11 Mar-11
Section 1 70% 97% 136% 65% 110% 116% 121% 107% 114% 88% 74% 109%
Section 2 66% 92% 66% 87% 70% 64% 63% 50% 58% 38% 41% 54%
Section 3 90% 46% 47% 43% 28% 31% 32% 19% 38% 8% 34%
Section 4 44% 24% 36% 39% 21% 18% 24% 49% 56% 44% 31% 14%
Section 5 4% 63% -27% 20% 41% 82% 26% 34% 43% 2% 37% 15%
Section 6 18% 23% 30% 21% 28% 33% 39% 41% 39% 18% 0% 33%
Section 7 36% 51% 33% 33% 27% 35% 10% 39% 12% 5% 15% 14%
Section 8 22% 21% 28% 12% 24% 27% 10% 31% 13% 11% 22% 17%
Section 9 19% 35% 14% 9% 16% 32% 37% 12% 9% 5% -3% 11%
Chart annotations near Sections 2 and 3: “New section manager arrives …” and “… and is transferred out”. (One further value, 50%, sits beside these annotations; its cell is unclear, and the Section 3 row has only eleven values.)
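The slide does not say what the "percentage excess" is measured against, so the following is only one possible reading, sketched under stated assumptions. Here the expected number of readings at exactly 100 units is taken to be the average count of the neighbouring readings (95 to 99 and 101 to 105), and the excess is the observed count relative to that baseline. The function name `excess_at_boundary`, the `window` parameter and the sample readings are all hypothetical.

```python
from collections import Counter


def excess_at_boundary(readings, boundary=100, window=5):
    """Percentage by which readings at `boundary` exceed the neighbourhood average.

    Assumption (not from the slides): the baseline is the mean count of the
    readings within `window` units on either side of the boundary.
    Returns None when there are no neighbouring readings to compare against.
    """
    counts = Counter(readings)
    neighbours = [value
                  for value in range(boundary - window, boundary + window + 1)
                  if value != boundary]
    expected = sum(counts[value] for value in neighbours) / len(neighbours)
    if expected == 0:
        return None
    return 100.0 * (counts[boundary] - expected) / expected


# Made-up readings: each neighbouring value occurs 3 times, 100 occurs 6 times.
sample = [100] * 6 + [95, 96, 97, 98, 99, 101, 102, 103, 104, 105] * 3
print(excess_at_boundary(sample))
```

Under this definition the figure turns negative when readings at the boundary are rarer than their neighbours, which would be consistent with the occasional negative percentages in the section table. Running it per section and per month would give a grid of the same shape.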
7. MONITORING COSTS
CONTRACT FARMING
“Our raw material cost varies considerably across farms, though we share best practices.
We have over 5,000 farms. The raw material cost report is a 75-page Excel report that no one reads.
Also, we gain no insights as to how the productivity changes over time.”
8.
9. PREDICTING MARKS
EDUCATION
What determines a child’s marks?
Do girls score better than boys?
Does the choice of subject matter?
Does the medium of instruction matter?
Does community or religion matter?
Does their birthday matter?
Does the first letter of their name matter?
10. Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years, it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200.
Chart annotations:
June borns score the lowest.
The marks shoot up for Aug borns …
… and peak for Sep-borns.
120 marks out of 1200 explainable by month of birth.
An identical pattern was observed in 2009 and 2010…
“It’s simply that in Canada the eligibility
cutoff for age-class hockey is January 1. A
boy who turns ten on January 2, then,
could be playing alongside someone who
doesn’t turn ten until the end of the year—
and at that age, in preadolescence, a
twelve-month gap in age represents an
enormous difference in physical maturity.”
-- Malcolm Gladwell, Outliers
… and across districts, gender, subjects, and class X & XII.
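The month-of-birth comparison amounts to grouping marks by birth month and comparing the group averages. The sketch below shows that calculation on a handful of invented records; it does not use the Tamil Nadu results, and `mean_marks_by_month` and `records` are hypothetical names.

```python
from collections import defaultdict


def mean_marks_by_month(records):
    """Average the total marks for each birth month.

    `records` is an iterable of (birth_month, total_marks) pairs.
    """
    totals = defaultdict(lambda: [0, 0])   # month -> [sum of marks, student count]
    for month, marks in records:
        totals[month][0] += marks
        totals[month][1] += 1
    return {month: total / count for month, (total, count) in totals.items()}


# Invented example records, NOT the real exam data.
records = [("Jun", 700), ("Jun", 720), ("Sep", 800), ("Sep", 820), ("Jan", 760)]

means = mean_marks_by_month(records)
gap = max(means.values()) - min(means.values())   # best month minus worst month
print(means, gap)
```

Repeating the same grouping separately by year, district, gender or subject would show whether the pattern holds in each subset, as the slide reports.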
11. FINDING PATTERNS
SECURITIES
Which securities move together?
How should I diversify?
What should I sell to reduce risk?
What’s a reliable predictor of a security?
12. Plot of 6 month daily AUD - EUR values: 68% correlation between AUD & EUR
Chart annotations:
Block of correlated currencies …
… that move counter-cyclically to indices
… clustered hierarchically
13. VISUALISING CHANGE
WEATHER
What was the weather in India like… THE LAST 100 YEARS?