EXECUTIVE SUMMARY
§ Keep Big Data in view, but maintain healthy scepticism
about the expectations set by IT vendors.
§ First ask: what data do you have, and what can you usefully
do with it?
§ Getting value from data is as much about people and skill
sets as it is about technology.
§ Poor planning leads to complex tools chasing answers to
the wrong questions.
§ Surely there are more useful applications for Big Data than
mining social media sites and targeting online advertising?
§ Big Data has a lot to offer in risk management and
strategic planning.
§ A lot of useful data lies in silos and is unstructured and
unused. You need to know what you want and where to
find it before deciding what to do with it.
§ As with any major project, a large scale analytics initiative
needs executive backing and cross-functional consensus
on method, ownership and outcomes.
§ Given the complexity and potential pitfalls of Big Data
initiatives, perhaps it would be wisest to start small with
more manageable data sets.
§ Why not start with some quick wins on something
undeniably valuable, like cost savings, which will not need
highly complex tools and project management resources?
§ High impact quick wins: travel costs and mobile phone
expenses.
Introduction
Much has been written about Big Data and the potential changes technology
will make to how strategy is formulated and business decisions made.
However, not all useful data is big.
Across many businesses, there is a pressing need to utilise data assets
whilst they are still fresh and useful. Long established organisations in
telecommunications, financial services and utilities have been handling large
data sets that amount to Big Data for decades, albeit processed more
slowly than is possible today. Newer businesses, such as social media and
life sciences, are applying new analytics capabilities to vast and ever
increasing volumes of data. Also, in our increasingly monitored society,
evidentiary technologies such as video capture and call recording generate
huge volumes of data, accompanied by increasingly sophisticated analytical
tools to process them.
Data analytics has become a core competence for businesses. Insights from
data are now a key source of competitive advantage. Big Data comes into
play when the data sets get too big for standard data mining tools to handle
and where there is a need for a more real-time and predictive approach.
However, for most businesses, old-fashioned tools such as relational
databases are still widely in use and familiarity with data analytics exists
within most organisational skill sets. The need for large scale data analytics
is not yet widely seen as a priority issue. Big Data is still usually seen as an
emerging technology with a set of benefits to keep an eye on, more with a
view to how it could be put to practical use in future.
Whilst this development and learning around the use and capabilities of Big
Data continues, it would be useful for businesses to apply their existing
analytics skills to shorter term, practical issues that use smaller data sets.
Businesses continually generate data and much of it gets lost or orphaned in
silos when it could be usefully consolidated, structured and processed.
There will always be a need for human intelligence, intuition and know-how in
the use of data analytics. Decisions on strategy, cost reduction, supplier
management, fraud, regulatory compliance and tax management all need a
combination of human intuition and insights drawn from the hard facts of data
analysis to know what to look for, where to find it and assess false positives.
For these well-defined issues, employees usually know how things could be
improved or money saved. Since improvements in such areas generate a
high impact in a short time with the least complexity, this paper advises
businesses to consider focusing efforts on the smaller things that have a
more assured return on investment before taking on complex Big Data
initiatives.
About Big Data
i. Data Volumes, People and Technology
According to the recent IDC Digital Universe paper, the world's data is
doubling every two years with 1.8 trillion gigabytes created in 2011. The
volumes, speed and variety of data are rapidly changing and with that the
tools with which data can be processed. Just like a newspaper, data has a
shelf life and its value goes down over time. It is an asset that is created by
people or machines taking actions and leaving traces of what they did. It
grows exponentially and across the organisation, which is why it is so
difficult to keep up with the information assets of today, let alone be ready to
maximise value from the data sources of tomorrow.
Making a data analytics initiative successful is as much about having the
right people as it is having the right data and technologies. The challenge is
to use huge data resources quickly enough to catch the value in the
moment. A lot of focus and clarity on what the initiative is supposed to
achieve is needed before any implementation project takes place.
Otherwise the outcome of a project can end up as a system with
inappropriate rules, chasing answers to the wrong sort of questions. Sadly
this seems to happen quite often.
To get things right there is a need for the right people at all stages of the
project. If the skills to generate insight from data and act on it are not yet
strong enough in-house, it would be prudent to look externally for suitable
consultants who can bring a perspective based on wider experience.
Data sets over 10TB are typically classified as “Big Data”. In the past, this
has meant that they are so large that it has been difficult to process them
with standard tools and in-house resource. For example, a global
telecommunications company may collect billions of detailed call records
per day from 120 different systems and store each for at least nine months.
It used to be slow to run billing cycles and trend analysis to get insights from
large data sets. Not any more.
Processing bulk data has been made possible by technologies such as
Apache's Hadoop (an open source implementation of the MapReduce model)
and proprietary alternatives such as EMC Greenplum, HP Vertica, IBM Smart
Analytics/Netezza, Sybase IQ and Teradata. Also, storage costs have become
a lot cheaper.
These changes have finally made large scale data mining practical and
affordable.
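To make the processing model concrete, the following sketch (in Python,
purely for illustration) shows the map, shuffle and reduce phases that
Hadoop popularised, applied to hypothetical call records. The subscriber
identifiers and durations are invented for the example.

    from collections import defaultdict

    # Hypothetical call records: (subscriber_id, call_duration_seconds).
    # In a real Hadoop job these would be read from distributed storage.
    call_records = [
        ("sub-001", 120), ("sub-002", 45), ("sub-001", 300),
        ("sub-003", 60), ("sub-002", 90),
    ]

    # Map phase: emit a (key, value) pair for each input record.
    mapped = [(subscriber, duration) for subscriber, duration in call_records]

    # Shuffle phase: group all values by key.
    grouped = defaultdict(list)
    for subscriber, duration in mapped:
        grouped[subscriber].append(duration)

    # Reduce phase: aggregate each key's values (total minutes per subscriber).
    totals = {s: sum(d) / 60 for s, d in grouped.items()}

    for subscriber, minutes in sorted(totals.items()):
        print(f"{subscriber}: {minutes:.1f} minutes")

Hadoop's contribution is to run exactly these phases in parallel across many
machines, which is what makes multi-terabyte data sets tractable.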
It seems that these exciting new technologies are being mainly used for
social media sites, shopping sites and the need for targeted advertising to
sell more “stuff”. Surely powerful and game changing data analytic tools can
be applied more usefully to improve the way businesses are run and have a
positive impact on whole economies.
ii. Governance
One very useful application for Big Data is governance, especially in
industries like Financial Services where the problems are systemic and
seemingly embedded in the culture of doing business.
After the wave of Sarbanes-Oxley work in the early 2000s, many Internal
Audit departments have been contracting. There is an opportunity for them
to take on a new and more strategic role by using data analytics to improve
agility around managing the enterprise risk profile. Engaging independent
consultants can be very useful in getting past internal barriers and achieving
the identification, collation and quality checking of key data sets. Specialist
assistance can also be useful in helping to define the rules and algorithms
that need to be applied in order to get the desired outcomes. Then there is a
need for consensus on what truths the data is indicating and how processes
and policies need to be changed to better address the evolving risk profile.
Coordinating these sorts of projects with both in-house and external
expertise leverages the existing skills of Internal Audit in new ways.
Data analytic systems can generate outputs that will hopefully lead to more
informed decision making. In order for enterprise risk management to be
effective, the outputs need to be made readily available to various risk
committees who can then evaluate risks, impact and likelihoods in a more
fact based, less subjective way. A common problem is where companies try
to apply a Business Process Re-Engineering (BPR) or “lean” approach to
managing risk.
Whilst the BPR/lean approach works well for automated systems such as
factory floors, it does tend to eliminate creativity and flexibility. The input for
decision points is not just the data itself, but also how human brains assess
data, trends, gut feelings and come to consensus before decisions are
made.
When applying analytics to governance, a more knowledge-based process
is needed. The value of all these new insights from data comes together at
the decision points where one or more people can ask the simple question
“why?” BPR/lean was really designed for managing a flow of actions and
decisions made by machines, not by people.
The risk function can become a source of business insight and consensus
building when it has clarity on what is really going on. Big Data enables this
and hopefully organisations with known and serious governance problems
will give Big Data initiatives a high priority.
Problems with Big Data
i. Silos of Data
A recent analysis from IBM and Oxford University found that, of over
1,000 businesses surveyed, fewer than 30% had even started to adopt
Big Data. Of those who had, fewer than 6% had actually got beyond the
pilot stage to see real-life Big Data initiatives come into use (see
“IBM 2012 Analytics Study: The real-world use of big data”).
Changing an organisation so that its data assets are kept fresh,
consolidated and available for suitably skilled professionals to use is in itself
a major project. One of the most time consuming parts is actually finding the
data and extracting it from different systems, both inside and outside
firewalls. There are also issues of security, privacy and data protection that
need to be addressed.
Getting data out of silos and into a structure and a format that makes it
usable is also a challenge. Automated tools are now being used to create
master data repositories across disparate enterprise systems and silos,
extracting data quickly and combining various data sources so that they use
consistent classes and properties.
The challenge is to get quality data that takes a business away from silos
and into a single version of the truth. This, in turn, can lead to higher data
quality and faster deployments of synchronised Business Intelligence
scorecards and dashboards across organisations.
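As a minimal sketch of that consolidation step, the Python fragment below
maps two hypothetical silo extracts with inconsistent column names onto a
shared schema before merging them into one master view. The systems and
field names are assumptions for the example, not drawn from any particular
product.

    import pandas as pd

    # Two hypothetical silo extracts with inconsistent schemas.
    crm = pd.DataFrame({"CustID": [1, 2], "CustName": ["Acme Ltd", "Beta plc"]})
    billing = pd.DataFrame({"customer_no": [1, 2], "amount_gbp": [1200.0, 845.5]})

    # Map each silo onto one consistent set of classes and properties.
    crm = crm.rename(columns={"CustID": "customer_id", "CustName": "customer_name"})
    billing = billing.rename(columns={"customer_no": "customer_id"})

    # Consolidate into a single master view: one version of the truth.
    master = crm.merge(billing, on="customer_id", how="left")
    print(master)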
ii. Data Structure
Another issue is that data usually comes in multiple types and formats. To
make it even more difficult, much of the explosion in data volumes can be
attributed to large amounts of "unstructured" data. This does not fit into a
traditional data model (i.e. database, data warehouse, etc.) but is
represented by text, server logs and web based data. To bring these into
an analytics framework, the trend is to develop hybrid data structures that
leverage traditional Business Intelligence tools for structured data (e.g.
relational databases) alongside new technologies that can stream very large
quantities of unstructured data in ways that enable businesses to mine the
information for analytical purposes.
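To illustrate the point, the short Python fragment below parses one
hypothetical web server log line into structured fields using a regular
expression, making unstructured text ready for a database or analytics tool.
The log format is an assumed example, not any specific product's.

    import re

    # A hypothetical access-log line (unstructured text).
    line = '192.168.1.10 - - [12/Mar/2013:10:15:32 +0000] "GET /pricing HTTP/1.1" 200 5120'

    # Regular expression capturing the fields we want to analyse.
    pattern = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
        r'"(?P<method>\S+) (?P<path>\S+) [^"]+" (?P<status>\d{3}) (?P<bytes>\d+)'
    )

    match = pattern.match(line)
    if match:
        record = match.groupdict()        # structured dict, ready for a database
        record["bytes"] = int(record["bytes"])
        print(record)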
Businesses often seek to convert data into a more structured format to
augment existing Business Intelligence and analytic methods. Much of the
hype around "Big Data" is in reference to this approach and as with all hype
there will be both an element of truth and possibly a lot of unfulfilled
expectations.
iii. Quality in Data and Results
Data quality is a significant issue. Finding high-quality data can become
expensive, time-consuming and frustrating. There is never going to be a
perfect data set, so at some point someone needs to decide whether or not
a data set is good enough to use. There will always be a need for validation
checks to identify duplicates, outliers or orphan records. Practicality comes
with the experience and know-how that says when to draw the line and
accept or reject a data set.
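These validation checks can be scripted. The sketch below, assuming a
simple invoice table, flags duplicate invoice numbers, outlying amounts
(using a common interquartile-range rule of thumb) and orphan records
that reference a missing supplier; the data and thresholds are illustrative
only.

    import pandas as pd

    suppliers = pd.DataFrame({"supplier_id": [1, 2, 3]})
    invoices = pd.DataFrame({
        "invoice_no":  [101, 102, 102, 103, 104],
        "supplier_id": [1, 2, 2, 9, 3],          # 9 has no matching supplier
        "amount":      [250.0, 300.0, 300.0, 275.0, 50000.0],
    })

    # Duplicates: the same invoice number appearing more than once.
    duplicates = invoices[invoices.duplicated("invoice_no", keep=False)]

    # Outliers: amounts beyond 1.5x the interquartile range.
    q1, q3 = invoices["amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    outliers = invoices[(invoices["amount"] < q1 - 1.5 * iqr) |
                        (invoices["amount"] > q3 + 1.5 * iqr)]

    # Orphans: invoices whose supplier_id has no matching master record.
    orphans = invoices[~invoices["supplier_id"].isin(suppliers["supplier_id"])]

    print(duplicates, outliers, orphans, sep="\n\n")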
When data analysis is completed, the main issue is false positives, which
can lead to wrong conclusions and bad decisions. Knowing when a set of
data is making sense or not is a people skill and requires knowledge of both
the data and the context in which it is being used.
iv. People & Terminology
Since Big Data initiatives are so complex, they need a lot of consensus and
commitment from management in order to be successful. As with all
initiatives, people are going to be the ultimate drivers of success or failure.
A lack of proper executive sponsorship is probably the leading problem for
developing and sustaining practical use of analytics in any business. It can
be useful to get external advisors to facilitate workshops in order to
establish priorities and write up action plans, which removes the potential
bias of politics and dominant personalities within organisations.
A lack of consensus around specifications, objectives and definitions at the
working level is probably the biggest cause of project failure. When
definitions of requirements are not aligned with business strategy or a clear
list of required outcomes, the end result will ultimately not be right for the
business.
The skill sets and expertise needed for data analytics are not the same as
the traditional IT skill sets. This can lead to potential internal conflicts and
politics between the people who define the specification and algorithms and
those who need to assemble the systems to make things work.
As with any conflict, there is value in having a mediator: a neutral third party
who can facilitate consensus around key issues to prevent people problems
from derailing projects. This sort of external consulting and facilitation can
lead to the creation of analytics competency centres, closely involving
Internal Audit departments, in a structure that sits between IT and the
business users.
Analytics is too embedded in business context and process to be
managed and controlled as restrictively as most traditional IT applications.
Organisations that can find a balance of IT control (standard architecture)
and business definition and use (governance) would probably benefit the
most from Big Data analytics.
v. Underlying Assumptions
Another issue with Big Data is that it is too often presented by big IT
vendors as being the answer to everything. Big Data is not the beginning
and the end of all forms of data analysis. It is a far higher level of capability
and complexity over and above more established data mining tools.
As with most data mining, there is the underlying assumption that historical
data can predict future results to a high level of accuracy. Often this is
true. However, exceptions and coincidences should be expected. If there is
a change to the environment that generated past results, then that data set
and its algorithms are unlikely to remain valid for further predictive analytic
tasks. More complex algorithms could be put in place to anticipate change,
but their success will ultimately come down to the assumptions upon which
they are based. There is always the risk that an analytical system will no
longer represent the reality that generated its data set.
Big Data is without doubt a phenomenon. Equally certain is that we are in
the very early stages of its evolution. So rather than getting too excited by
the promise of Big Data as evangelised by big vendors, it is probably best to
get clarity and consensus on what each Big Data initiative is supposed to
achieve and at what point it would justify the time, resource and expense
required. Then plan the project, investigate the technology and define any
need for highly skilled data scientists.
The early stage of any large scale data analysis project is best spent
focusing on the people who will have to use the outputs of Big Data and the
people putting in place the infrastructure to enable it. For this reason it is
advisable to bring in a neutral third party to facilitate consensus and the
action plans.
Back to Business Intelligence
i. Smaller Scale Data Analytics Projects
Whilst Big Data generates a lot of media attention, organisations continue
to mine various data sets, often with clear goals and successful outcomes.
These smaller projects are usually referred to as
“Business Intelligence” or “Business Analytics”, with the two terms being
used interchangeably. For the sake of simplicity, we refer to Business
Analytics as having bigger data sets and a lot more complexity in its
algorithms. What we here refer to as Business Intelligence (BI) uses less
powerful tools, usually with in-house skill sets and far more familiar data
mining techniques.
The BI approach is very powerful when applied to smaller, targeted issues
where insights from data analysis can lead to recommendations and quickly
yield a positive return. Processes are similar to habits: they evolve over time
and can be hard to change. However, the fact that something has always
been done in a certain way does not automatically mean it ever made sense.
Employees are usually aware of systems and processes that waste their
time, cost the business and make it difficult to get their jobs done. They can
also come up with good ideas for improving things given access to the right
data analysis, which BI is well suited to deliver.
Supplier management is an example of a high yield place to start when
asking what common tasks and activities waste time, cost money and could
be improved through better insights from data. The data sets used in such
work are typically quite easy to find: supplier contracts, billing data and
inventories. Linking these together in relational databases makes it possible
to query, analyse trends, find anomalies and identify opportunities to save
money or claim rebates. This sort of project is typically small but can have a
big impact.
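As a minimal sketch of that linking step, the fragment below uses an
in-memory relational database (Python's built-in sqlite3) with hypothetical
contract and billing tables, and a single join query to surface invoices
charged above contracted rates. All names and figures are invented for
illustration.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE contracts (supplier TEXT, agreed_rate REAL);
        CREATE TABLE billing   (supplier TEXT, invoiced_rate REAL, invoice_no TEXT);
        INSERT INTO contracts VALUES ('Acme', 10.0), ('Beta', 8.5);
        INSERT INTO billing   VALUES ('Acme', 10.0, 'A-1'),
                                     ('Acme', 12.5, 'A-2'),  -- above agreed rate
                                     ('Beta',  8.5, 'B-1');
    """)

    # Link billing back to contract terms and flag anomalies worth querying.
    rows = con.execute("""
        SELECT b.invoice_no, b.supplier, b.invoiced_rate, c.agreed_rate
        FROM billing b JOIN contracts c ON b.supplier = c.supplier
        WHERE b.invoiced_rate > c.agreed_rate
    """).fetchall()

    for invoice_no, supplier, invoiced, agreed in rows:
        print(f"{invoice_no}: {supplier} billed {invoiced}, contract says {agreed}")

The same join-based approach extends naturally to inventories and rebate
clauses as more data sets are linked in.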
ii. Data Analytics Applied to Travel Costs
Even with all the current focus on Big Data, travel costs illustrate the value
of old-fashioned BI projects. They are a good example of where data
analytics can answer questions that lead to decisions that in turn save
money.
Most large companies will likely have a centralised booking system for
flights and hotels provided by one or more travel agents. These travel
suppliers have probably won their contracts based on volume related
discounting and contractually defined service levels.
What questions can be asked of all the historical booking data to find
potential cost savings? The first questions that come to mind would be:
1. Are all travel agents charging correctly based on volumes or are
rebates due?
2. Is one travel agent being underused, thereby missing the opportunity
to qualify for better discounted pricing?
3. Are the selections of hotels and flights being made on the right
criteria or are additional inputs required such as user ratings, location
etc?
Once the contract terms are coded as rules through which all the booking
data can be viewed, it becomes relatively straightforward to create a
continuous audit. Perhaps other data sets need to be added to the system
over time such as user reviews, safety records etc.
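To show what coding contract terms as rules might look like, the sketch
below assumes a hypothetical volume-rebate clause for each travel agent and
replays historical booking totals through it; the thresholds, rebate rates and
amounts are invented for the example.

    # Hypothetical volume-discount terms for each travel agent.
    contract_terms = {
        "AgentA": {"volume_threshold": 100_000.0, "rebate_rate": 0.05},
        "AgentB": {"volume_threshold": 150_000.0, "rebate_rate": 0.04},
    }

    # Historical bookings: (agent, amount), normally from the booking system.
    bookings = [("AgentA", 60_000.0), ("AgentA", 55_000.0), ("AgentB", 90_000.0)]

    # Continuous audit rule: total spend per agent vs the contractual threshold.
    totals = {}
    for agent, amount in bookings:
        totals[agent] = totals.get(agent, 0.0) + amount

    for agent, terms in contract_terms.items():
        spend = totals.get(agent, 0.0)
        if spend > terms["volume_threshold"]:
            rebate = (spend - terms["volume_threshold"]) * terms["rebate_rate"]
            print(f"{agent}: spend {spend:,.0f} exceeds threshold; rebate due ~{rebate:,.2f}")
        else:
            print(f"{agent}: spend {spend:,.0f} below threshold; no rebate due")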
Once travel has been addressed and the opportunity for cost savings
proven, other cost areas also present themselves for the same treatment.
iii. Analytics Applied to Mobile Phone Costs
Mobile phone costs are likely to be a major source of potential cost savings.
The supplier contracts are complex, there is a lack of regulatory oversight of
roaming charges, billing contains errors and data hungry applications are becoming more
and more common. In addition, employees can see their mobile phone as a
perk and there are often lax attitudes toward what would be reasonable and
responsible usage.
Also, the technology is fast changing and the IT industry keeps coming up
with new products and solutions, only some of which live up to the hype.
There has been a lot of industry hype recently around Bring Your Own
Device (BYOD). The term is rarely defined: it could apply to employees
buying their own mobile devices and paying their own costs, or to the
security and partitioning of mobile devices. However, for most businesses,
the possible productivity benefits of BYOD apply more usefully to laptops
and tablets than to mobiles. For mobile management, BYOD has again been
presented by the IT industry as an answer to everything and has taken too
much attention away from real solutions to tangible problems.
Both the payments to suppliers and the expense management of
employees present opportunities to save time and money. As well as
managing mobile supplier costs, there is also the issue of how employees
claim their expenses for mobiles and separate their personal usage from
company costs.
What many companies have been doing for years is getting employees to
go through their mobile phone bills with a yellow marker pen to manually
separate out the business from the personal calls. This is time consuming
and not necessarily accurate. However, it could be seen as a simple BYOD
policy.
The core data sets are the mobile carrier contracts, the mobile carrier billing
data and device inventories. Also, there are probably additional data sets
that could be included such as data and system logs from the mobile
devices, applications and any mobile device management systems (e.g.
BlackBerry Enterprise Server). More and more data sets can be included as
more questions need answering. Perhaps the most obvious questions that a
business would want its mobile data to answer are:
1. How can technology simplify and streamline the mobile expense
management process and claim back hours of time employees
spend with yellow marker pens, bills and forms?
2. Are contract rates actually being applied or are rebates due?
3. Are employees using their mobiles in a fair and cost efficient way?
As with the example of travel costs above, the technology tool would be a BI
database structure into which the different data sets are absorbed and a
graphical user interface developed. This GUI would make it easy to spot
trends, flag risks and extract data for external analysis and reporting.
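As a sketch of how the marker pen step might be automated, the fragment
below assumes an itemised billing extract and a directory of known business
numbers, and splits one employee's bill into business and personal totals.
The classification rule is deliberately naive and all numbers are invented.

    # Known business contacts (would come from the company directory / CRM).
    business_numbers = {"+442079460000", "+442079460111"}

    # One employee's itemised bill: (dialled_number, cost_gbp).
    itemised_bill = [
        ("+442079460000", 1.20),
        ("+447700900123", 0.80),  # not in the directory: treated as personal
        ("+442079460111", 2.50),
    ]

    # Replace the yellow marker pen: classify each line automatically.
    business_cost = sum(cost for number, cost in itemised_bill
                        if number in business_numbers)
    personal_cost = sum(cost for number, cost in itemised_bill
                        if number not in business_numbers)

    print(f"Business: £{business_cost:.2f}, personal (to be reclaimed): £{personal_cost:.2f}")

In practice the directory would be far larger and the rules richer, but even
this crude split removes most of the manual effort.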
As an estimate of potential cost savings, a company with over 100 mobiles
and a fair degree of mobile roaming should be able to use data analytics
tools to reduce current costs by up to 20%.
In both these examples of travel and mobile phone costs, the typical
problems of Big Data do not really apply. They use smaller data sets,
simpler tools and much of the expertise in configuring and managing the
projects is in-house. The value of external expertise would be in developing
regular cost savings reports, facilitating agreement on implementation of
recommendations and drawing on a wider experience base of where to find
cost savings.
Conclusion
Big Data will have a crucial part to play in gaining competitive advantage,
but in order to make it work as a useful system and to action its findings, it
needs the right people with the right skill sets.
Since the level of complexity around Big Data is so high, we recommend
putting more focus on old-fashioned BI and applying these familiar tools to
old problems. Big Data is the next big step in data analytics, but it is not the
answer to everything.
For more information please contact: john_enoch@ymail.com