Invited talk of Daragh O'Brien, Managing Director of Castlebridge Associates, at the European Data Forum 2013, 9 April 2013 in Dublin, Ireland: The Story of Maturity – How data in Business needs to pass the ‘So What’ tests
3. Ancient Sumeria
• Written in Akkadian
• Used pictographic representations of information and concepts, carved into clay tablets (high sand content) and baked
4. Filing: The Birth of Big Data
Image by Nic McPhee @ commons.wikimedia.com
5. Physical Data (approx. 5,925 years)
[Figure: timeline of about 6 thousand years, from tablets to tablets]
Electronic Data (c. 75 years)
• More information processed
• Information processed faster
• More ‘self-service’ data processing
• Changed expectations of data and processing
10. Where is Big Data?
[Figure: Crosby maturity stages (Uncertainty, Awakening, Enlightenment, Wisdom, Certainty) overlaid on maturity levels (Initial, Repeatable, Defined, Managed, Optimising)]
(Overlaying the Crosby CMM model with the DMBOK Maturity model)
12. Maturity: Answering So What Questions
So What…
…is it?
…problems will it solve?
…will we be able to do differently?
…legal/regulatory risks does all this pose?
…do we need to do to tap this gold mine?
…are we not doing today that this will enable?
…are we doing today that this will make worse?
14. Organisations don’t manage data well
• Information Governance / Data Governance only now emerging as formal disciplines
• Information Quality / Data Quality also only beginning to be coherently tackled in many organisations
• Phone companies still get bills wrong
• Data Protection breaches still occur
  • Note: this is more than just SECURITY breaches
• Data Migrations, CRM and ERP projects still fail
• Metadata largely under-managed
15. Bottom Line Impact
• Deloitte: 88% of Risk Managers see Information as “Significant” in their Risk Management plans
• Bloor: 84% of Data Migrations FAIL (don’t deliver, overrun time/budget, or deliver reduced functionality)
• Forrester: 75% of Chief Financial Officers see Information Management as a barrier to achieving Business goals
• Gartner: an estimated 35% of TURNOVER is wasted by companies due to poor information quality
• IBM: 30% of time is lost to organisations from staff rechecking information
This is when dealing with “traditional” structured/semi-structured data.
17. “So far, for 50 years, the information revolution has centered on
data—their collection, storage, transmission, analysis, and
presentation. It has centered on the "T" in IT.
The next information revolution asks, what is the MEANING of
information, and what is its PURPOSE?”
Peter Drucker, Forbes ASAP, August 1998
24. The Pending Orders Solution 2006
Elite Specialist Information Quality Agent
Licensed to “Fix the Data by all means necessary”
(firearms not actually used…)
25. The Pending Orders Solution 2006
• Orders for infrastructure could have had engineering statuses
• Orders for multiple dependent products were double counted
• Revenue Assurance did not look at all relevant data sources
• Dependencies between process steps were not understood
26. The Pending Orders Solution 2006
There wasn’t a Crisis situation: the Revenue Assurance Hypothesis was flawed
• External Factors affected order completion times
• Intra-order product dependencies led to double counting
• Context of the process was important
29. Question 1: So What Data Do We Need?
No doubt that more data
helps, but don’t for a minute think
that you need all data to make an
informed business decision.
Organizations that are effectively
leveraging the power of Big Data
realize that they will never
capture all relevant information.
Phil Simon
Too Big to Ignore: The Business Case for Big Data
31. Question 1: So What Data Do We Need?
What is the problem we are trying to solve?
What is the Process Context for this problem?
What is the “Information Environment” for this problem?
32. The Pending Orders Crisis
What is the problem we are trying to solve?
• Customers are not being billed for services they have
• Revenue from services is not being realised
• We have orders that are not being completed
What is the Process Context for this problem?
What is the “Information Environment” for this problem?
33. Question 1: So What Data Do We Need?
To properly answer this question you need to have:
A PLAN
34. Question 2: So What is Stopping us doing it?
Regulation:
• Data Protection Rules
• Industry Regulations re: Data Governance
Technology:
• Legacy architecture
• Technology Management (Silos)
Human Factors:
• Skills (technical/problem solving/analytical)
• Political (Change Management)
35. Question 2: So What is Stopping us doing it?
Data:
• Quality of internal data
• Completeness, consistency, “transactability”
• Ability to link external data to internal data
• Governance of data
• Decision rights
• Supplier relationship management
• Roles & Responsibilities
36. Example of Regulation
Location Data
Use of Location Data in Telecommunications is affected by EU Data Protection rules
Consent is required for it to be used for “Value Adding” services
37. Data Quality
I am incredibly sceptical about claims that “Big
Data” is immune to Data Quality problems.
Statistically, Data Quality errors will skew your
mean, and create outliers that affect your
analysis.
While “Big Data” might not be as prone to ‘fat
finger’ errors, you still have to consider whether
the mechanisms gathering the data are correctly
calibrated and the algorithms for analysis are
running correctly or whether you have
measurement errors you don’t know about.
Dr Thomas C Redman, thought leader in Data Quality
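Redman’s point that data quality errors skew the mean and create outliers can be made concrete with a few lines of arithmetic. This is a hypothetical sketch: the order values, the single ×100 recording error, and the `flag_outliers` helper (a median absolute deviation rule with an assumed threshold of k = 5) are all invented for illustration and do not come from the talk.

```python
from statistics import mean, median

# Hypothetical order values in euro (illustrative only, not real data).
clean = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 100.0, 100.0]

# Simulate one data quality error: a single value recorded with a
# misplaced decimal point (x100), e.g. from a mis-calibrated feed.
dirty = clean.copy()
dirty[3] = dirty[3] * 100  # 101.0 becomes 10100.0


def flag_outliers(values, k=5.0):
    """Hypothetical helper: return values more than k median absolute
    deviations (MAD) away from the median. k=5.0 is an assumed threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: at least half the values equal the median,
        # so treat anything different from it as an outlier.
        return [v for v in values if v != med]
    return [v for v in values if abs(v - med) > k * mad]


for label, data in (("clean", clean), ("dirty", dirty)):
    print(f"{label}: mean={mean(data):.1f} median={median(data):.1f} "
          f"outliers={flag_outliers(data)}")
```

One mis-recorded value out of ten moves the mean from 100.0 to 1099.9 while the median stays at 100.0, and the MAD rule flags only the corrupted record. The median and MAD are used here because, unlike the mean and standard deviation, they are not themselves dragged around by the error they are trying to detect.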
40. Bias within the Data?
The greatest number of tweets about Sandy came from
Manhattan. This makes sense given the city's high level of
smartphone ownership and Twitter use, but it creates the
illusion that Manhattan was the hub of the disaster. Very
few messages originated from more severely affected
locations, such as Breezy Point, Coney Island and
Rockaway. As extended power blackouts drained batteries
and limited cellular access, even fewer tweets came from
the worst hit areas.
Kate Crawford, “Hidden Biases in Big Data”, HBR, 1st April 2013
Tom Redman gives the example of his early work on telecoms billing data. The emphasis was on sample bias, but the actual measurement errors in the process (the data quality issues) were an order of magnitude greater than the errors due to sample bias.