In the age of "big data," organizations need a business information model that organizes and partitions information in new ways that are useful to how businesses operate today.
Surviving the Petabyte Age: A Practitioner's Guide
Cognizant 20-20 Insights
Executive Summary

The concept of “big data”1 is gaining attention across industries and the globe. Among the drivers are the growth in social media (Twitter, Facebook, blogs, etc.) and the explosion of rich content from other information sources (activity logs from the Web, proximity and wireless sources, etc.). The desire to create actionable insights from ever-increasing volumes of unstructured and structured data sets is forcing enterprises to rethink their approach to big data, particularly as traditional approaches have proved difficult, if even possible, to apply to structured data sets.

One challenge that many, if not most, enterprises are attempting to address is the increasing number of data sources made available for analysis and reporting. Those who have taken an early adopter stance and integrated non-tabular information (a.k.a. unstructured data) into their pool of analysis data have exacerbated their data management problems.

A second challenge is the shrinking timeframe in which a business stays focused on a particular topic. Thanks to the highly integrated and communicative global economy, and the great strides made in expanding communications bandwidth, both good and bad news circumnavigate the globe at a much faster pace than ever before.

The amount of time it takes for news to become common knowledge has shrunk, thanks to:
• An emerging network of social media and blogs that potentially makes everyone a publisher of good and bad news.
• A rapid increase in the number of people who are untethered from traditional information receptacles and now have a highly mobile means of collecting and ingesting information.
• The meteoric rise of desktop tools housing a significant portion of information. Organizations need to understand the information and processes involved in the dispensation of desktop-managed information (mostly Microsoft Access and Excel). This information is most likely to be found in the form of:
> Copies of operational data (including both sources and targets).
> Copies of operational data that is enriched (including the processes and sources used for enrichment, as well as the targets that receive the enriched information).
> Processes bypassing the systematized processes (including the bypassed processes, the sources used for these processes, the actors in these processes and the results of these processes).
cognizant 20-20 insights | december 2011
This whitepaper lays out the concept of a business information model as a vehicle to organize information into separate categories, which directly influences the creation, capture or extraction of business value and elevates it to a heightened focus. We will cover four main topics:

1. Why companies dealing with big data in today’s Petabyte Age1 need to stratify information so that trustworthy, relevant, actionable and timely data can be found at a moment’s notice.
2. A business model that can be used to stratify information.
3. A new definition of partitioning and a business process for formulating the partitions. Partitions should deal with stratifying information based on its contribution to organizational data, as well as the more traditional technical partitioning that is conducted for performance and maintenance reasons.
4. Methods of rolling out an information infrastructure aligned with this new partitioning definition. The realities of this new environment are that the maintenance of a traditional enterprise information model happens at the speed of business and is in direct opposition to maintaining the focus of information that directly contributes to enterprise value.

Three Issues to Solve

The Petabyte Age2 is creating a multitude of challenges for IT organizations, as they find that their well-honed, carefully constructed information models cannot be maintained fast enough to appease their business constituents. Moreover, once constructed and populated with information, these models require new technologies to interface with the data. Adding insult to injury, all this data is largely introspective and serves merely to support the status quo. When disruptions occur, insights can only be gleaned from this data over a sufficient passage of time; in the meantime, insights are derived from what is largely called unstructured and semi-structured data, as well as data obtained from outside the organization via social media, blogs, Web sites and a host of other sources that don’t fit into the neatly organized tools devised for insight generation.

A major shift is transforming the basic tenets of data-driven insight generation. This shift requires a new way of combining and synthesizing data used for navigating the highly integrated and communicative global economy.

Overcoming this challenge requires organizations to solve three important issues (see Figure 1):
• Data depth: How to derive insight from structures that contain billions or more instances of data. These can include sessions in a Web log, entries obtained from social media, entries from RFID activities or mobile-sourced activities. One thing is sure: The sheer size of these pools of data will continue to grow, resulting in technical hurdles that challenge traditional methods for efficiently and effectively using such large pools of like data. Most solutions that deal with big data attempt to meet this challenge.
Data Challenges of the Petabyte Age
Figure 1
• Focus on enterprise value: How to quickly determine which data requires the most focus at any point in time. Thanks to our tightly connected global economy, news travels around the world more quickly than ever, which requires rapid rethinking of enterprise strategies and tactics. This requires the ability to quickly change which data is focused upon. Traditional information models that are constructed to synthesize business knowledge from the deluge of available data impede the nimbleness required to meet the needs of the modern-day enterprise.
• Less introspective view: How to make the whole information fabric less introspective. Using information derived from inside the organization can predict future trajectories only if the status quo is assumed. However, when there is a high degree of turbulence, knowledge obtained from internally generated information is woefully inadequate in the short term; insights are obtainable only after sufficient time has passed and several cycles have been interpreted. The resulting organizational missteps are covered regularly in the news media. What is required is an ability to wield information as an early-warning system for understanding changes in enterprise trajectories. Such data sources are external to the enterprise until enough time has passed for a history of data points to be inferred from internal data.

Sheer Depth of Similar Data

Specialized tools have emerged to address this issue of enormous pools of similar data. These tools originate from the realization that the time-honored structured query language tools, as well as other tools built around database technologies, are ill-equipped to efficiently deal with billions, if not trillions, of rows of data. Spawned from Google’s attempt to deal with the data accumulated from all the interactions that occur with the Google software suite, a whole new framework built around the MapReduce technology has been born, and an emerging suite of tools has begun to appear on this new stack of technologies.

There will no doubt be a refinement of the techniques that are maturing to deal with this concept of big data. The only thing we can be sure of is that the big-data business issues addressed by MapReduce and the related suite of technologies are not going away.

Just as the technologies available for launching the initial collection of Web sites were immature, so are the tools for developing solutions for big data. Much has been said about how technology has taken a major step back from what is commonly available for business intelligence and data warehousing solutions, but this is much less a statement about the problem of big data than it is about the immaturity of the technologies available for solving the big-data problem set.
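To make the MapReduce idea mentioned above more concrete, the following is a minimal single-machine sketch of the map, shuffle and reduce phases, counting page hits per Web session. It is an illustration only and not how any particular framework is implemented; the log format (`<session_id> <url>`) and all function names are assumptions introduced for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(log_line):
    """Map step: emit one (session_id, 1) pair per Web log line.

    Assumes a hypothetical log format of "<session_id> <url>".
    """
    session_id, _url = log_line.split(maxsplit=1)
    yield session_id, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def count_hits_per_session(log_lines):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(line) for line in log_lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


sample_log = [
    "s1 /home",
    "s2 /home",
    "s1 /products",
    "s1 /checkout",
]
hits = count_hits_per_session(sample_log)
print(hits)
```

In a real deployment the map and reduce steps would run in parallel across many machines, which is what lets this pattern scale to billions of rows; here they run sequentially purely to show the data flow.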
Converting Big Data Into Value
[Diagram: trustworthy, relevant, actionable and just-in-time data is focused through the business model (capabilities, customers, markets, channels, value chain, risks, investors, regulatory) and converted into insight via acquired and created knowledge, learned and heard inference, and innovation; action then yields extracted, originated and captured value, the latter as captured transactions or captured value streams.]
Figure 2
Managing Opportunity and Risk
[Diagram: focused information sits at the center of four goals (managing operational risk, diffusing disruptive events, defining enterprise strategies and enhancing sustainable value), surrounded by people, processes, products and capabilities and by enablers such as customers, media, competitors, markets, geographies, financing and regulators.]
Figure 3
Interestingly, the problem of large pools of data is the primary issue, which today is tackled by introducing technologies to tackle each of the challenges outlined above independently. Companies that thrive in the Petabyte Age will be able to consolidate the technologies so their business constituency is faced with a single interface that addresses their full complement of informational needs.

Focus on Influencers of Enterprise Value

The intent of business intelligence is to take actionable, relevant, trustworthy and timely data; put it through a model that aligns with key business challenges (customers, geographies, channels, investors, markets, etc.) as the means to gain insight; and derive an action plan to extract, originate or capture organizational value (see Figure 2). Furthermore, captured value can be a one-time event (e.g., a temporary supply shortfall of a competitor) or a permanent value stream. While captured transactions are acceptable, captured value streams are more desirable.

Data is converted into insight by using acquired and created knowledge (obtained from both internal and external sources), learned inferences, heard inferences and innovations, some of which will serve as disruptions to others in the participating marketplaces.

It is the business model itself that must provide the focus into what is pertinent to the business at a particular point in time and that serves as the point of contention. The enterprise business models used as the basis for synthesizing information as the means of gaining insight are devised to map all data rather than “tiering” data into focus areas. Examples of focus areas include the following:
• Directly relates to creating or protecting extracted, originated or captured enterprise value.
• Does not directly contribute to value but is mandatory for business operations.
• May not be mandatory for business operations but is mandatory for regulatory purposes.
• May not be mandatory for the above categories but is mandatory for archiving.
• Was once important but is now relegated to historical trivia.

To create or protect extracted, originated or captured enterprise value, the information deemed worthy of focus must be sufficiently broad in scope so that both the opportunities and risks are exposed in all dimensions of the business model.
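The five focus areas listed above can be read as an ordered set of tiers. The sketch below shows one way such “tiering” might be expressed; it is a hypothetical illustration, and the `FocusArea` names, the boolean flags on `DataSet` and the sample data sets are all assumptions rather than anything prescribed by the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class FocusArea(IntEnum):
    """Focus areas in descending order of attention (1 = most focus)."""
    VALUE = 1        # creates or protects enterprise value
    OPERATIONS = 2   # mandatory for business operations
    REGULATORY = 3   # mandatory for regulatory purposes
    ARCHIVE = 4      # mandatory for archiving
    HISTORICAL = 5   # once important, now historical trivia


@dataclass
class DataSet:
    """A data set with hypothetical flags describing why it is kept."""
    name: str
    drives_value: bool = False
    needed_for_operations: bool = False
    needed_for_regulation: bool = False
    needed_for_archive: bool = False


def tier(data_set):
    """Assign the highest-priority focus area that applies."""
    if data_set.drives_value:
        return FocusArea.VALUE
    if data_set.needed_for_operations:
        return FocusArea.OPERATIONS
    if data_set.needed_for_regulation:
        return FocusArea.REGULATORY
    if data_set.needed_for_archive:
        return FocusArea.ARCHIVE
    return FocusArea.HISTORICAL


catalog = [
    DataSet("old_campaign_codes"),
    DataSet("general_ledger", needed_for_operations=True),
    DataSet("customer_orders", drives_value=True, needed_for_operations=True),
    DataSet("audit_filings", needed_for_regulation=True),
]

# List the catalog with the most value-relevant data sets first.
ranked = sorted(catalog, key=tier)
for data_set in ranked:
    print(f"{tier(data_set).name:<11} {data_set.name}")
```

Because the tiers are checked in order, a data set that both drives value and supports operations lands in the value tier, which mirrors the paper's emphasis on putting value-related information first.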
For example, in the illustrated business model in Figure 3, operational risks, disruptive events, enterprise strategies and sustainable value sources will be managed by managing:
• People, as well as the services they provide.
• Processes and the metrics used to manage the processes.
• Innovations, specifically the products released into the marketplace.
• Capabilities aligned with technologies.

Information will be managed in this model along the following dimensions (i.e., the enablers):
• Customers, or the customers, prospects and visitors who can be tapped for enterprise value.
• Media, both traditional and emerging (social media like Facebook and Google+), that can influence enterprise value.
• Markets participated in for originating, extracting or capturing enterprise value.
• Financing, or the source of funds used for investments and cash flow used to originate, extract or capture enterprise value.
• Geographies and sovereign nations from which enterprise value will be originated, extracted or captured.
• Rivals in markets and geographies that compete for customers, market coverage and funding sources.

A Less Introspective View of Information

Only expected trends can be tracked using internal information. Disruptions will eventually appear in internal data, but their trajectory will only be evident after two or more cycles of information make their way into the internal data stream. This means:
• It will take a minimum of three days for new sales trajectories to make themselves known to a daily sales system. By that time, any progress that competitors have made in capturing value from your largest customers is removed for immediate transactions (i.e., captured transactional value) and, in many cases, is gone forever (i.e., captured value streams).
• In cases where data is reported less frequently, such as financial results, it will take weeks or months for such situations to be exposed, at which point it is much more difficult to remediate.

Disruptions make themselves known through external data much more readily than internal data. However, there are also problems with external data, including the fact that this data is much more loosely defined and that its sources are more numerous and change more frequently in scope and content.

An example of an external data source that can be captured is Twitter. All Twitter content is capable of being captured, and a competitor’s promotion that is broadcast on Twitter can be immediately exposed. In order to listen for such a message, however, a handful of relevant messages must be picked out of literally billions of 140-character messages. And Twitter is only one of many information sources that can expose such calls to action.

Early warning systems are not a new phenomenon. Like those deployed for catastrophic weather and natural disasters, early warning systems for businesses should be launched to warn of disruptions to the orderly management of the strategies and tactics of enterprises that ultimately extract, originate or capture value.

Integrating this information into a meaningful early warning system requires a new way of examining information. In the Petabyte Age of ubiquitous and proliferating data, the integration of information must be done immediately, or else the value of such integration is worth significantly less than when it was initially exposed.

Several years ago, computer scientists discovered that code was more nimble if it was decoupled from its underlying model, which gave rise to the SOA and REST architectures; similarly, a process can decouple the modeling of data from the ability to publish alerts, dashboards and access to consumers. This post-discovery means of utilizing data has been written about by Forrester and others and is the basis of many advanced tools in the marketplace today. The reason for such an approach is to discover anomalies prior to the normal publication cycle.

A number of technical solutions are emerging to deal with publishing data at a moment’s notice. Most of these solutions are covered under the topic of “virtualized data warehouses,” which will be covered in a separate whitepaper. Momentum for virtualized warehouse technology has picked up, as all vendors in the space have positioned themselves to offer “perfect solutions.”
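As a rough illustration of the early-warning idea discussed above (picking a handful of relevant messages out of a very large external stream), the sketch below filters short messages for mentions of a watched competitor alongside promotion-related terms. It is a toy under stated assumptions: the `Message` type, the competitor name "acme", the watch terms and the sample messages are all invented for this example, and no real social media API is used.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One short external message (hypothetical structure)."""
    source: str
    text: str


# Hypothetical watch list: competitor name -> terms that signal a promotion.
WATCH_TERMS = {"acme": {"promotion", "discount", "sale"}}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def early_warnings(messages, watch_terms):
    """Yield (competitor, message) for messages that mention a watched
    competitor together with at least one of its signal terms."""
    for message in messages:
        words = tokenize(message.text)
        for competitor, terms in watch_terms.items():
            if competitor in words and words & terms:
                yield competitor, message


stream = [
    Message("twitter", "Lovely weather in Pune today"),
    Message("twitter", "Acme just announced a 20% discount on all routers!"),
    Message("blog", "Acme opens a new office"),
]

warnings = list(early_warnings(stream, WATCH_TERMS))
for competitor, message in warnings:
    print(f"ALERT [{competitor}] via {message.source}: {message.text}")
```

Keyword matching of this kind is only a stand-in; the point is that the alert is raised from external data as it arrives, rather than waiting for the effect to show up in internal sales cycles.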
Stages of Information Management

The EIS/DSS Age (circa 1975-1997)
Issues that were tackled:
• Elimination of paper
• Improvements in monitored data
• Information responsiveness
• Gigabytes of information
• Delivery models (PCs, Windows)
• Support costs

The BI/DW Age (circa 1993-2013)
Issues that were tackled:
• Single version of the truth
• Terabytes of information
• Performance constraints
• Governance models
• Specialized tools
• Delivery models (Web, etc.)

The NextGen Age (circa 2010-?)
Issues that must be tackled:
• Just-in-time information
• Always-on prioritized information
• Less introspective information
• Petabytes of information
• Source integration timing
• Governance and valuation models
• Component-based delivery models

Figure 4
A Framework for the Petabyte Age

Roughly every 15 to 20 years, the disciplines of delivering enterprise information for creating business-critical insight and improving the overall decision-making process undergo radical change (see Figure 4). We are in the midst of such a major shift. These cycles tend to share the following characteristics:
• They are ushered in with the availability of tools that are greatly reduced in price or are open source and displace much of the functionality of the products being replaced (e.g., in the late ’90s, products like Pilot and Comshare were displaced by market upstarts like Javelin and Excel).
• There are referenceable cases of enterprises that have successfully utilized next-generation solutions for translating raw data into insight.

Challenges that must be tackled as part of this next-generation age are:
• The ability to deliver prioritized, just-in-time information through an always-on interface (i.e., mobile).
• The ability to combine information generated inside the organization (introspective) with information made available elsewhere. It is important to note that information made available elsewhere rarely comes in neat bundles of tables that are easily integrated using readily available scripts.
• The ability to integrate new sources of information at a moment’s notice. This requirement challenges the basic tenets of the enterprise information model and ETL processes that have matured over the past 20 years.
• The ability to embrace changes (i.e., additions and deletions to the information fabric used to steer, organize and ultimately produce enterprise value by proving that the technology arm can responsively deliver trustworthy information). Disciplines such as process governance, data governance, information centers of excellence that manage a catalog of components and information lifecycle management3 are enjoying renewed popularity because they are cornerstones of this renewed responsiveness to the knowledge worker community.

What is important in the new disciplines associated with insight generation is that they are centered on focusing on information, whether or not it is traditional, internally sourced information. Many of the information sources will require techniques associated with big data (billion-plus row tables), but all of it will require assistance in
focusing on the information dilemma for the foreseeable future (i.e., finding which information is critical for a specific business need is much akin to finding the proverbial needle in a haystack).

Much work has been done to create an information lifecycle for managing performance of analytical and operational systems. However, partitioning strategies have rarely been used to partition information into the following schemes:
• Information that is directly attributable to generating or protecting revenue for an enterprise.
• Information that may not be strategically or tactically significant to generating revenue but is mandatory for business operations. Much financial data (not financing, which is often a cash position) falls into this category.
• Information that may not fall into the above two categories but is required for regulatory purposes.
• Information required for archival purposes.
• Information that may have once fallen into the above categories but has been relegated to historical trivia.

The process of partitioning information into areas deserving focus (called “focus partitioning4”) is completed through the following steps:
• Step 1: Take inventory of information used in the organization. Information will come from one of five categories:
> Downloaded and enriched through processes managed entirely from desktop systems.
> Available in official operational systems.
> Available from unofficial operational systems (normally Microsoft Access and Excel).
> Introspective but document-centric information (contracts, e-mail, etc.).
> Information that is sourced outside the organization (social media, blogs, newswires, etc.).
• Step 2: Create an information component inventory, assigning each information component to a segment of the business information model and determining its priority in generating value to the organization. Also, identify information that is required but not available as part of this exercise.
• Step 3: Assign the information inventory to the partitions of the business information model (i.e., directly contributing to enterprise value, required for operations, etc.).
• Step 4: Align potential initiatives with the partitioned information inventory and determine the impact to improving enterprise value by tackling these initiatives, thereby creating a roadmap to this prioritized information fabric critical to capturing, extracting or originating enterprise value.

It is important to note that as much as we think that the business stakeholders don’t have the data they need to perform their job, in reality there is always a means to obtain and utilize information required for determining and executing on the strategic, tactical and operational needs of the
Template for Capturing, Aligning Information Components
When capturing the focused information that is used in a big data initiative, it is important to align the data back
to the business information model. The template above is a vehicle that can be used to capture the focused
information exposed through a big data initiative and ensure alignment and proper placement in the business
information model.
Figure 5
Alignment of Data Inventory with Business Value
Equally important to aligning information to the business information model is the identification of how the
information will result in positive incremental value to the organization. It is important to continually put the
identified data to the test of whether it is actionable and, if properly used, is associated with organizational value.
This template facilitates testing whether information prioritized for the big data initiative is both associated with
the business information model and results in value along the dimensions of the business information model.
Figure 6
enterprise. In areas where the sanctioned technical vehicles were unable to provide this information, the enterprise stewards found means to cobble together the information they required.

It is of paramount importance that the identity and use of this information be ascertained when charting a course for big data. In reality, lots of related islands of little data are often sewn together in a big data initiative. Tackling the obvious big data initiative may not deliver the value anticipated if the little islands of information are ingrained into enterprise processes.

The determination of whether tackling these islands of information is included in the enterprise strategy through an enterprise information management program, an enterprise data governance program or some other initiative is less important than engaging the owners of these islands of information.
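The four focus-partitioning steps described earlier (inventory, component assignment, partition assignment and initiative alignment) can be sketched in a few lines. This is a minimal, hypothetical illustration: the field names, the source and partition labels, the sample components and the scoring rule (counting how many value-partition components an initiative touches) are assumptions made for the example, not a method defined in the paper.

```python
from dataclasses import dataclass

# Step 1 source categories and Step 3 partitions (labels are illustrative).
SOURCES = {"desktop", "official_system", "unofficial_system", "document", "external"}
PARTITIONS = ("value", "operations", "regulatory", "archive", "historical")


@dataclass
class Component:
    """One information component in the inventory (Steps 1 to 3)."""
    name: str
    source: str          # Step 1: where the information comes from
    model_segment: str   # Step 2: segment of the business information model
    priority: int        # Step 2: 1 = highest priority for generating value
    partition: str       # Step 3: focus partition


def build_inventory(components):
    """Validate components and order them by partition, then priority."""
    for component in components:
        if component.source not in SOURCES:
            raise ValueError(f"unknown source: {component.source}")
        if component.partition not in PARTITIONS:
            raise ValueError(f"unknown partition: {component.partition}")
    return sorted(
        components,
        key=lambda c: (PARTITIONS.index(c.partition), c.priority),
    )


def roadmap(initiatives, inventory):
    """Step 4: rank initiatives by the number of value-partition
    components they depend on (an assumed, simplistic scoring rule)."""
    value_names = {c.name for c in inventory if c.partition == "value"}
    scores = {
        name: len(value_names & set(needed))
        for name, needed in initiatives.items()
    }
    return sorted(scores, key=lambda name: (-scores[name], name))


inventory = build_inventory([
    Component("ledger_extract", "unofficial_system", "Financing", 1, "operations"),
    Component("sales_orders", "official_system", "Customers", 2, "value"),
    Component("tweet_stream", "external", "Media", 1, "value"),
])

initiatives = {
    "close automation": ["ledger_extract"],
    "social listening": ["tweet_stream", "sales_orders"],
}

plan = roadmap(initiatives, inventory)
print([c.name for c in inventory])
print(plan)
```

A real exercise would weigh estimated value impact rather than a simple count, but the shape is the same: the partitioned inventory comes first, and initiatives are prioritized against it.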
Footnotes
1 Big data includes data sets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visualizing.
2 Petabyte Age is a euphemism for the massive volumes of data that many organizations are dealing with that can be measured in petabytes, a unit of information equal to one quadrillion bytes.
3 Information lifecycle management is a process used to improve the usefulness of data by moving lesser used data into segments. It is most commonly concerned with moving data from always needed partitions to rarely needed partitions and, finally, into archives.
4 Focus partitioning is a term created by the author that describes applying generally accepted techniques to gain performance by segmenting data into partitions (vertical partitioning) to segmenting groups of data by the likelihood that it will participate in achieving organizational value.