Information services companies maintain vast data supply chains (DSCs) that form the basis of their revenue streams, yet the onslaught of "big data" threatens to capsize even well-constructed systems. Optimizing and consolidating DSCs (resulting from M&As, for example) requires a next-generation approach that integrates data sourcing and collection, data management and data delivery.
Big Data's Impact on the Data Supply Chain
The proliferation of social, mobile and traditional Web computing
poses numerous business opportunities and technological challenges
for information services companies. By revamping their enterprise
level data supply chains and IT infrastructures, as well as embracing
partnerships that deliver analytical acumen, information services
companies with the right organizational mindset and service delivery
model can survive, if not thrive, amid the data onslaught.
Executive Summary

The information industry is in the middle of a data revolution. An unimaginably vast amount of data is growing exponentially, providing challenges and opportunities for data-centric companies to unlock new sources of economic value, take better credit and financial risks, spot business trends sooner, provide fresh insights into industry developments and create new products. At the same time, data abundance also creates daunting challenges across numerous dimensions when it comes to the ways and means of handling and analyzing data.

This white paper examines the challenges that confront the information services industry and offers guidance on ways the industry can rethink its operating models and embrace new tools and techniques to build a next-generation data supply chain (DSC). Our approach is premised on a platform for turbocharging business performance that converts massive volumes of data spawned by social, mobile and traditional Web-based computing into meaningful and engaging insights. To address the information explosion coupled with dynamically changing data consumption patterns, this next-generation platform will also induce information providers to roll out innovative products at a faster pace to beat the competition from both existing and niche information players and meet the demand for value-added, real-time data and quicker solutions for end consumers.

Information Services Trends

A deep dive into today's information industry reveals the following patterns.

Continued Data Deluge

Approximately five exabytes (EB)¹ of data online in 2002 rose to 750 EB in 2009, and by 2021 the total is projected to cross the 35 zettabyte (ZB)² level, as seen in Figure 1. Statistics³ also indicate that 90% of the data in the world was created in the last two years, a sum greater than the amount of data generated in the previous 40 years. The world's leading commercial information providers deal with more than 200 million business records, refreshing them more than 1.5 million times a day to provide accurate information to a host of businesses and consumers. They source data from organizations in over 250 countries and 100 languages, and cover around 200 currencies.
[Figure 1: Data Volumes Are Growing (2011 vs. 2021). IDC predicts that between 2009 and 2020, digital data will grow 44 times to 35 ZB, adding 0.8 ZB of data in 2020 alone.]

These providers' databases are updated every four to five seconds. With new data about companies instantaneously created and the parameters of existing companies worldwide changing by the minute, the challenge will only intensify.

The world's leading providers of science and health information, for example, address the needs of over 30 million scientists, students and health and information professionals worldwide. They churn out 2,000 journals, 20,000 books and major reference works each year. Users on Twitter send more than 250 million tweets per day. Almost 50 hours of video are uploaded to YouTube per minute by hundreds of millions of users worldwide. Facebook houses more than 90 billion photos, with over 200 million photos uploaded per day.⁴

While knowledge is hidden in these exabytes of free data, data formats and sources are proliferating. The value of data extends to analytics about the data, metadata and taxonomy constructs. With new electronic devices, technology and people churning out massive amounts of content by the fractional second, data is exploding not just in volume but also in diversity, structure and degree of authority. Figure 2 provides an indicative estimate of the data bombarding the Internet in one 60-second interval, with astounding growth forecasted.

Ever-changing data consumption patterns and the associated technology landscape raise data security and privacy issues as data multiplies and is shared ever more freely. Convergence challenges related to unstructured and structured data also add to the worries. Google, Facebook, LinkedIn and Twitter are eminent threats to established information players, as are nontraditional niche information providers such as Bloomberg Law, DeepDyve, OregonLaws and OpenRegs that provide precise information targeted at specific customer groups. The big data phenomenon threatens to break the existing data supply chain (DSC) of many information providers, particularly those whose chains are neither flexible nor scalable and include too many error-prone, manual touch points. For instance, latency in processing all data updates in the existing DSC of one well-known provider currently ranges from 15 to 20 days, versus a target of 24 hours or less. That directly translates to revenue loss, customer dissatisfaction and competitive disadvantage.

These risks are real. Google reported a 20% revenue loss when the time to display search results increased by as little as 500 milliseconds. Amazon reported a 1% sales decrease for an additional delay of as little as 100 milliseconds.

"Free" Information Business Models, with Data as the New Fuel

Companies such as Duedil, Cortera and Jigsaw (recently acquired by Salesforce.com and renamed Data.com) are revolutionizing the "free" business model. The Jigsaw model, for instance, uses crowdsourcing to acquire and deliver a marketplace for users to exchange business contact information worldwide. For sharing information on non-active prospects, something these prospects would gladly permit, users get new leads for free. If a user finds incorrect data, he earns points by updating the record in Jigsaw. Providing incentives for users to scrub the huge database enables Jigsaw to more easily maintain data integrity.
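To make the incentive mechanics concrete, the following minimal Python sketch models a Jigsaw-style points ledger. It is purely illustrative; the class, point values and record fields are our own assumptions, not Jigsaw's actual system.

    # Illustrative sketch (not Jigsaw's actual system): a crowd-maintained
    # contact store where users earn points for additions and corrections.
    from dataclasses import dataclass, field

    @dataclass
    class ContactStore:
        contacts: dict = field(default_factory=dict)   # contact_id -> record
        points: dict = field(default_factory=dict)     # user_id -> points

        def add_contact(self, user_id, contact_id, record):
            self.contacts[contact_id] = record
            self.points[user_id] = self.points.get(user_id, 0) + 10  # hypothetical reward

        def correct_contact(self, user_id, contact_id, fixes):
            self.contacts[contact_id].update(fixes)
            self.points[user_id] = self.points.get(user_id, 0) + 5   # hypothetical reward

    store = ContactStore()
    store.add_contact("alice", "c1", {"name": "J. Doe", "title": "CFO"})
    store.correct_contact("bob", "c1", {"title": "CEO"})  # Bob earns points for the fix
    print(store.contacts["c1"], store.points)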
[Figure 2: The Digital Data Deluge: One Minute's Worth of Data Flow. In a single minute: 204 million emails sent; 2+ million search queries; 6 million Facebook views; 1.3 million video views; 277,000 logins; 100,000 new tweets; 47,000 app downloads; 61,141 hours of music; 20 million photo views; 3,000 photo uploads; 30 hours of video uploaded; 13,000 new mobile users; 320+ new Twitter accounts; 135 botnet infections; 100+ new LinkedIn accounts; 20 new victims of identity theft; 6 new Wikipedia articles published; $83,000 in sales. And future growth is staggering.]
Essentially, users actively source and update contacts in the Jigsaw database in return for free access to the company's services. Thus, from a data quality, data entry scalability and data maintenance perspective (issues that typically plague systems such as those in the CRM space), Jigsaw is a strong tool that can be used to append incomplete records, augment leads to build highly targeted lists, plan territories and gain insights on people and companies, with the most complete B2B data in one place. Jigsaw has already amassed more than 30 million contacts and is growing. It sells this information to customers with large CRM databases, who can compare their databases to the Jigsaw database, identifying and cleaning up any redundant records. Jigsaw then makes money by offering products geared toward companies interested in increasing, updating and cleaning their contact directories. These free models intensify competition for traditional data aggregators.

In addition to Google, which operates on an ad-supported (free) model, others like WebMD (a health information provider) rely on advertising to generate a major portion of their revenue streams, enabling them to provide free services. They then make additional money from subscriptions and premium content, as well as listings from individuals who initially come to avail themselves of free services and end up paying for a listing in order to heighten awareness among existing and new customers. Such models are allowing newer entrants to underprice the competition or to offer some portions of their information portfolio for free. As such, this approach threatens traditional information providers, forcing them to step up.

How can information services companies compete and remain cost leaders? Many of their existing DSC systems provide neither enough insight nor a robust understanding of their customers, nor do they reveal how their end customers are interacting with their data. Their existing DSCs are not built for handling big data, and the corresponding big data analytics cannot be effectively applied and leveraged to shed light on what the data means or provide a pathway to reduce IT infrastructure costs to attain greener operations. Moreover, many information players and their existing DSC systems are not really leveraging social media and its related opportunities to increase customer engagement, improve content quality and provide incremental value to the ecosystem.

We are in an era where we trade our data for free goods and services. Never before have consumers wielded this much power over marketers. All the data activity on the Internet, through any device, creates click-trails, leaves digital breadcrumbs, produces data exhaust and creates metadata. There is enough economic value in this data for an entire industry to form around it. We will see a huge influx of companies dealing with the various aspects of data drilling, shipping, refining, drug discovery and so on. Hence, based on this precedent, large players like Exxon, Mobil, Pfizer or Merck could create large stand-alone data-slicing organizations.
Mergers and Acquisitions, Industry Consolidation and Footprint Expansion

Many information providers have expanded into new markets through M&As and with local partnerships. They seek to integrate all acquired companies into a single enterprise-level DSC. Having started in legal publishing, Thomson Reuters now has a footprint across various information domains, such as healthcare, tax and accounting, intellectual property, financial, media, risk and compliance and even science. A leading financial information provider has recently moved its data collection and storage operations in Italy to a partner. It has also bought tools for data interoperability between enterprise-level services.

Some players are also consolidating data programs to find synergies in their business line operations. They also want newer data sources to enhance data quality and variety. Figure 3 depicts how two large players, Thomson Reuters (TR) and Information Handling Services (IHS), have grown through acquisition during the last decade.

[Figure 3: Growth of Thomson Reuters and Information Handling Services Through Acquisitions. A 2000-2009 timeline charting TR's acquisitions and divestitures (from La Ley, Primark, Carson Group, IOB (Brazil) and the online business of Dialog early in the decade, through the 2008 combination of Thomson Corporation and Reuters Group PLC to form TR) alongside IHS's steady acquisitions of information providers across energy, economics, geopolitical risk, sustainability and supply chain management (including Cambridge Energy Research Associates, Global Insight, Jane's Information Group and John S. Herold). Source: Cognizant analysis of Thomson Reuters and IHS data published on each company's Web site.]
This model is proving more attractive as data processing scale, distribution and brand power become ever more critical.

Acquisitions cause significant problems for companies' DSCs. There is almost certain to be data quality loss from disparate systems, and operational inefficiencies caused by the lack of a unified view of data. DSC integration issues cause increased latency, slower time to value and customer access problems. Many existing DSCs were built as stand-alone systems with closed architectures and have undergone many customizations. This makes integration difficult, raising costs and slowing payback time. It also increases maintenance and ongoing enhancement costs. Integrating newer functionalities developed using advanced integrated development environments (IDEs), debugging and automation tools makes the development lifecycle an extremely complex task, and transferring taxonomies becomes complicated. For these archaic systems, the lack of productivity tools and limited hardware and software options results in greater time to market to meet dynamic business requirements or regulatory compliance.

As the industry grapples with the information explosion, the question on every CIO's mind is
how to handle, manage and analyze this data avalanche better. From the aforementioned points, what clearly emerges is a definite need for information providers to reexamine their existing DSCs for potential solutions. They should leverage their strategic and technology partners' capabilities in this discovery and eventual implementation process. Starting points include:

• Are providers ready to take advantage of the above tipping points to emerge as lean and agile players and increase shareholder value?

• How can providers help users find relevant and compact information in a flood of big data?

To address such issues, this paper explores one key starting point, what we've termed the "next-generation data supply chain." It conceptualizes at a high level the current and emerging elements embedded in a DSC that can help enable new solutions and explore opportunities, partnerships and alliances to enhance the value chain. This paper uses "data" and "information" interchangeably, as data forms the foundation for any insightful information; increasingly, the two are becoming difficult to distinguish.

The Next-Generation Data Supply Chain

By reengineering the existing DSC, from data sourcing through data delivery, providers can transform their ability to ingest, process and distribute content under a wide variety of new business models. The key objective is to create a next-generation DSC that:

• Optimizes operational efficiencies.

• Reduces data latency.

• Is flexible to accommodate new data sources.

• Is scalable to handle future data volumes.

• Improves data quality while dynamically meeting consumer demands.

• Explores newer monetization models with data as an asset.

• Provides faster time to market and the potential for greater revenue recognition.

Figure 4 represents paradigms that could create a truly modern DSC. The following subsections present salient thoughts around some of the prime components of such an upgraded DSC that could address current and futuristic data issues.
[Figure 4: DSC at a Glance. The data supply chain stages and their optimization opportunities:

Data Sourcing & Collection: Business Process as a Service (BPaaS); Data Collection by Crowd-sourcing; Crowd-sourced Data Digitization; Multi-Platform, Integrated Workflow Tools.
Data Quality & Cleansing: Data Validation & Cleansing BPaaS; Open Source Data Quality Tools; Statistical Automation Tools; Data Quality as a Service (DQaaS).
Data Enrichment: Search Performance Enhancement; Data Mining & Advanced Analytics; Extreme On-demand Cloud Analytics; Data-Driven Simulations.
Data Management: Big Data & No-SQL; Storage Optimization & Cloud Computing; Data Warehouse Appliances & Utilities; Data Transformation: Tagging & Quick Reference Codes.
Data Delivery: Multi-Channel, Device-Friendly Data Delivery; Data as a Service (DaaS); Social Media Integration; Data Visualization & Enhanced User Experience.

Common catalysts spanning all stages: Platform & Infrastructure (Cloud Solutions, Open Source Solutions); Data Governance and Master Data Management; Data Supply Chain Monitoring Dashboard; Data Supply Chain Maturity Assessment.]
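As a rough illustration of Figure 4's five stages, the following Python sketch chains them as composable functions. The stage bodies are placeholders of our own devising; a real DSC would plug in the optimization opportunities listed above (BPaaS, DQaaS, Hadoop, DaaS and so on).

    # Minimal sketch: the five DSC stages of Figure 4 as composable steps.
    def source(raw):    return [r.strip() for r in raw]                    # sourcing & collection
    def cleanse(rows):  return [r for r in rows if r]                      # quality & cleansing
    def enrich(rows):   return [{"text": r, "len": len(r)} for r in rows]  # enrichment
    def store(recs):    return {i: r for i, r in enumerate(recs)}          # management
    def deliver(db):    return list(db.values())                           # delivery

    def data_supply_chain(raw):
        return deliver(store(enrich(cleanse(source(raw)))))

    print(data_supply_chain(["  Acme Corp ", "", "Globex Ltd"]))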
Data Sourcing and Collection

This category includes business process outsourcing (BPO), business process as a service (BPaaS)⁵ and crowdsourcing. Typically, organizations have trodden the traditional BPO path by focusing only on daily operational tasks to deliver the outcome. But is there year-round work for data operations to justify a dedicated BPO team? Until the need for a dedicated BPO team is felt, it is tempting to have a temporary pay-per-use team, and "crowdsourcing," as mentioned earlier, can be considered a replacement for traditional outsourcing. The crowd refers to users who are volunteers with a common social, financial or even intellectual motivation to accomplish a task. They share solutions for mutual benefit with the crowdsourcer (an organization or an individual) who is the problem owner.

Such distributed problem-solving addresses scalability through worldwide access to people and data almost free of cost, and generates innovative results by simplifying an otherwise complex task that is too difficult to handle internally. There are numerous examples of projects using the crowd to successfully collect and analyze data, some of which are noted later in this paper. Operating cost assessments for crowdsourced data, using US$/full-time equivalent (FTE)/hour, suggest savings of about 60% (in the U.S.) to around 65% to 70% (in India) over traditional outsourcing. But how dependable is crowdsourcing? The crowd may not work in all situations; intermittent work is one case where crowdsourcing might not fit. Here, BPaaS could be a more practical middle road to integrate business, tools and information technology and to achieve greater opportunities to optimize operations and drive efficiency and flexibility. The BPaaS model retains a lightweight in-house data reporting team that interacts with an outsourced team of specialists in handling data production and validation, with tools residing on the cloud. BPaaS pushes standardization across many data sources and embodies a flexible pay-per-use pricing model.

The BPaaS proposition comes with infrastructure, platform and software "as a service" models without affecting the traditional benefits of outsourcing such as process expertise and labor arbitrage. The cloud combination provides hyper-scalability and the ability to deliver solutions affordably. Financial benefits include reduced operating costs of up to 30%, thereby cutting capital expenditures on up-front investments. Although BPaaS is not a panacea, the benefits of a variable pricing model combined with technology and business process excellence that reduces or eliminates large capital demands certainly go beyond cost savings. BPaaS needs to be embraced as a next-generation delivery model.

Data digitization entails conversion of physical or manual records such as text, images, video and audio to digital form. To address the proliferation of nonstandard data formats from multiple data sources globally, and to preserve and archive data in an orderly manner with simple access to information and its dissemination, data must be standardized. Data digitization does that. With electronic devices, portables, the Internet and rapidly evolving digital applications growing in importance, it becomes imperative to adopt data digitization techniques and procedures to create a truly global marketplace. Embracing this will not only help providers contend with the convergence of structured and unstructured data, but also enable the producer to reach the consumer directly, cutting out inefficient layers. But how you do it depends on how your DSC is organized.

For example, how do you digitize 20 years of newspaper archives in less than three months? The New York Times did exactly that by using the power of collective human minds. The Times showed archived words (as scanned images) from the newspaper archives to people filling out online forms across different Web sites, asking them to spell the words and thereby help digitize the text from these archives. The subscribing Web sites (generally unrelated to the digitization process) present these images, which even optical character recognition (OCR) software cannot interpret properly, for humans to decipher and transform into text as part of their normal validation procedures. Text is useful because scanned newspaper images are difficult to store on small devices, expensive to download and cannot be searched. The images also protect against suspicious interventions by automated software programs or "bots," ensuring that only humans validate the words. The sites then return the results to a software service that captures and aggregates this digitized textual information. With the system reported to have digitized more than 200 million words, the scalability of the crowd provides quick results: in each case the human effort is just a few seconds, and it represents digitization and validation at its best.
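The aggregation step can be illustrated with a minimal Python sketch that accepts a word once enough independent humans agree on it. The consensus rule, thresholds and sample answers are our own assumptions, not The New York Times' actual algorithm.

    # Illustrative consensus aggregator for crowd transcriptions of scanned
    # word images, in the spirit of the digitization effort described above.
    from collections import Counter

    def accept_transcription(answers, min_votes=3, min_share=0.75):
        """Accept a word once enough independent humans agree on it."""
        word, votes = Counter(answers).most_common(1)[0]
        return word if votes >= min_votes and votes / len(answers) >= min_share else None

    answers = ["morning", "morning", "moming", "morning"]  # one OCR-like misread
    print(accept_transcription(answers))  # -> "morning"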
Crowdsourced data also makes a lot of sense, particularly as business people and consumers increasingly rely on mobile devices and social media platforms to share data more freely across geographies. An excellent example is GPS navigation, where the foundation is the map database. Rather than rely solely on a map provider's database, which may not necessarily be up to date or accurate, crowdsourcing lets users report map errors and new map features. Thus users benefit immensely from each other's reports at no cost.

Crowdsourcing is a brilliant way to collect massive data, as it brings down the cost of setting up a data collection unit. However, the data provided by the user community has to be made credible through verification by data quality tools. Although crowdsourcing might affect data quality, by looking at a large base of users the outliers in the data can be easily found and eliminated.
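One way to find and eliminate such outliers over values reported by many users for the same map attribute is sketched below in Python. The median-absolute-deviation test and the 3.0 cutoff are a common heuristic of our own choosing, not a standard from the paper.

    # Sketch: robust outlier rejection over crowd-reported values (e.g., users
    # reporting the speed limit of the same road segment).
    from statistics import median

    def drop_outliers(reports, cutoff=3.0):
        m = median(reports)
        mad = median(abs(r - m) for r in reports) or 1e-9  # guard against zero MAD
        return [r for r in reports if abs(r - m) / mad <= cutoff]

    print(drop_outliers([50, 52, 55, 49, 120]))  # the 120 report is discarded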
Data Quality and Cleansing

High-quality data is a prime differentiator and a valuable competitive asset that increases efficiency, enhances customer service and drives profitability. The cost of poor data quality for a typical enterprise is estimated at 8% to 12% of revenues. British Gas lost around £180M when data quality problems caused its project to fail, resulting in degraded customer relationships and contract cancellations. ABN Amro was fined $80M for not having effective data quality compliance. Severn Trent Water was fined £26M by regulators for trying to cover up data quality issues created by its data migration project.⁶

Traditionally, companies have been shortsighted when it comes to data quality by not having a full lifecycle view. They have implemented source system quality controls that address only the point of origin, but that alone is not enough. Data quality initiatives have been one-off affairs at an IT level rather than collective efforts of both IT and the business side of the house. Failure to implement comprehensive and automated data cleansing processes that identify data quality issues on an ongoing basis results in organizations overspending on data quality and its related cleansing. These issues will only increase in number and complexity with the growing number of data sources that must be integrated. A flexible data quality strategy is required to tackle a broad range of generic and specific business rules and also adhere to a variety of data quality standards.

There is a need to incorporate data quality components directly into the data integration architecture that address three fundamental data quality activities: profiling, cleansing and auditing. Profiling helps resolve data integrity issues and makes unaudited data or extracts acceptable as baseline datasets. Cleansing ensures that outlier checks and business rules are met. And auditing evaluates how the data meets different quality standards. To improve customer experiences, leading to higher loyalty and competitive differentiation, providers have to look beyond custom-built solutions and seek vendor assistance to maximize their results. Data quality as a service (DQaaS) should be an integral part of data quality, as it allows for a centralized approach. With a single update and entry point for all data controlled by data services, the quality of data automatically improves, as there is a single best version that enters the DSC. DQaaS is not limited to the way data is delivered. The solution is also simple and cost conscious, as data access is devoid of any need to know the complexities of the underlying data. There are also various pricing approaches that make it flexible and popular to adopt, be it quantity- or subscription-based, "pay per call to API" based or data-type based. While there is a wide array of tools from Ab Initio, Microsoft SSIS, IBM Ascential, Informatica, Uniserv, Oracle DW Builder, etc. to choose from, there are also open source data quality and data integration tools such as Google Refine. Given the size of Google and its open source reach, the company's products should be seriously considered.

Open source tools are powerful for working with messy datasets, including cleaning up inconsistencies and transforming data from one format into another. Hence a combination of open source and proprietary tools will help achieve the benefits of both worlds.
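To ground the three fundamental activities named above (profiling, cleansing and auditing), here is a minimal Python sketch. The field names, rules and thresholds are illustrative assumptions, not any vendor's DQaaS API.

    # Sketch of the three data quality activities: profile, cleanse, audit.
    def profile(rows, key):
        vals = [r.get(key) for r in rows]
        filled = [v for v in vals if v not in (None, "")]
        return {"count": len(vals), "fill_rate": len(filled) / len(vals)}

    def cleanse(rows, rules):
        return [r for r in rows if all(rule(r) for rule in rules)]

    def audit(rows, key, min_fill=0.95):
        return profile(rows, key)["fill_rate"] >= min_fill

    rows = [{"duns": "123", "name": "Acme"}, {"duns": "", "name": "Globex"}]
    rules = [lambda r: bool(r["duns"]), lambda r: bool(r["name"])]
    clean = cleanse(rows, rules)
    print(profile(rows, "duns"), audit(clean, "duns"))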
Data Enrichment

If we are looking at data mining and sentiment analysis of 120,000 tweets per second, the enrichment components will be different than, say, those for handling 2,000 transactions per second, as done by Visa. While data mining and search performance optimization have been an integral part of data enrichment, approaches such as sentiment or predictive analytics and data-driven simulations enable more effective extraction and analysis of data, leading to more informed decisions. That said, search is still the basis for any insightful data extraction. There is a need to evolve search by constantly integrating social data, both structured and unstructured, to produce true-to-life recommendations for smarter use. Hence efforts should continue to fine-tune search algorithms with semantic expertise focused on users, to provide relevant answers (not just results) in fractions of a second. Fine-tuning search not only increases targeted traffic and visibility but also provides a high return on investment. Search optimization efforts have shown increased conversion rates of over 30% for a leading American retailer in the first two weeks of use, while revenue increased over $300 million for a leading e-commerce player, to cite two examples.

As a direct impact of the various information trends adding to the data deluge, the term "big data" has come into prominence. This term is often used when referring to petabytes, exabytes and yet greater quantities of data, and generally refers to the voluminous amount of structured and unstructured data that takes too much time, effort and money to handle. Since it has to do with extremely large data sets that will only increase in the future, big data and its analytics require a different treatment. Hence, there is also a need to augment platforms such as Hadoop⁷ to store Web-scale data and support complex Web analytics. Since Hadoop uses frameworks to distribute large data sets, spreading processing loads amongst hundreds or even thousands of computers, it should be explored as an enterprise platform for extreme enterprise analytics, that is, extremely complex analytics on extremely large data volumes.
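The MapReduce paradigm described in footnote 7 can be illustrated in plain Python: tweets are split into chunks, mapped to key-value pairs, shuffled by key and reduced. Hadoop distributes these same steps across many machines; the tweet data and sentiment word lists below are toy assumptions tying back to the sentiment analysis example above.

    # Plain-Python illustration of the MapReduce paradigm (not Hadoop itself).
    from collections import defaultdict

    POSITIVE, NEGATIVE = {"great", "love"}, {"awful", "hate"}

    def map_chunk(tweets):                      # map: tweet -> (sentiment, 1) pairs
        for t in tweets:
            for w in t.lower().split():
                if w in POSITIVE: yield ("positive", 1)
                elif w in NEGATIVE: yield ("negative", 1)

    def shuffle(pairs):                         # shuffle: group values by key
        groups = defaultdict(list)
        for k, v in pairs: groups[k].append(v)
        return groups

    def reduce_counts(groups):                  # reduce: sum per key
        return {k: sum(vs) for k, vs in groups.items()}

    chunks = [["I love this product", "awful service"], ["great data, great price"]]
    pairs = [p for chunk in chunks for p in map_chunk(chunk)]
    print(reduce_counts(shuffle(pairs)))  # {'positive': 3, 'negative': 1}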
Since one of the prime objectives of any supply chain is an excellent user experience, combining critical information and insight capabilities using advanced analytics is the way forward. Furthermore, to extract the best performance through hardware and software integration, tools such as Oracle Exalytics can be adopted to accelerate the speed at which analytics algorithms run. Such tools provide real-time visual analysis and enable new types of analytic applications. They enable quicker decision-making in the context of rapidly shifting business conditions through the introduction of interactive visualization capabilities. Applications can be scaled across the enterprise with faster, more accurate planning cycles.

With big data tools like Hadoop and extreme analytics, enrichment components can crunch data faster and deliver better results. In addition, analytics can be transformed if information services providers, with their global services partners, can build a focused team of data scientists who understand cloud computing and big data and who can analyze and visualize traffic trends. These specialists combine the skills of technologist, statistician and narrator to extract the diamonds hidden within mountains of data. All this will enhance information services providers' ability to recommend the right data sets to the right audience in real time and hence divert more traffic, resulting in higher revenue-generating opportunities.

Data Management

Cloud computing presents a viable solution to provide the required scalability and to enhance business agility. Leading cloud storage providers such as Amazon Web Services, Google and Microsoft's Azure are steadily cutting prices for their cloud services. Instead of making millions of dollars in upfront investments in infrastructure, providers can quickly convert Cap-Ex to Op-Ex and pay for data services as they go. Furthermore, the cost of data storage is declining significantly. Providers could simply tweak their data classification schema to optimize storage for even bigger savings. For example, by moving from a three-tier classification to a six-tier one,⁸ one large manufacturer cut its storage cost by almost $9 million for 180 TB of data.

Going by analysts' predictions on the impact of cloud on the storage business, there is $50 billion of enterprise storage gross margin in play with the transition to cloud providers. Data center optimization and consolidation cannot be neglected either. A Fortune 50 company optimized its 100,000-square-foot data center⁹ in Silicon Valley with the help of a solutions provider, resulting in $766,000 in annual energy savings, a $530,000 Silicon Valley power rebate and a three-month return on investment.
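A toy Python cost model shows the mechanics of the reclassification idea: finer-grained tiers let colder data land on cheaper storage. All prices and data shares below are hypothetical, chosen only for illustration, not the manufacturer's actual figures.

    # Toy cost model: finer tier classification lowers blended storage cost.
    def annual_cost(tiers, total_tb):
        # tiers: list of (share_of_data, dollars_per_tb_per_year)
        assert abs(sum(s for s, _ in tiers) - 1.0) < 1e-9
        return sum(share * total_tb * price for share, price in tiers)

    three_tier = [(0.30, 3000), (0.40, 1500), (0.30, 600)]
    six_tier = [(0.10, 3000), (0.15, 2000), (0.20, 1200),
                (0.20, 800), (0.20, 400), (0.15, 150)]
    print(annual_cost(three_tier, 180), annual_cost(six_tier, 180))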
Virtualization in all its forms, including server virtualization and storage virtualization, gets more computing power out of servers while delivering a more adaptable data center that will enable big data to be crunched efficiently while requiring less electric power (and is thus greener). Providers should hence consider cloud and storage optimization as part of their DSC transformation.

Big Data Fundamentally Changes Data Management

Since information providers will inevitably face the big data revolution, they need to be prepared by building big data technology stacks. Apart from Hadoop, as discussed above, some of the necessary components in this architecture stack include Cassandra, a hybrid non-relational database that provides flexible schemas, true scalability and multi-datacenter awareness, and other No-SQL databases. No-SQL databases offer a next-generation environment that is non-relational, distributed, open source and horizontally scalable. These capabilities need to be baked into the enhanced data supply value chain through a service-oriented architecture (SOA) that will enhance unstructured and structured data management and make providers future-ready by integrating widely disparate applications for a Web-based environment and using multiple implementation platforms. Data warehouse appliances and utilities will be a blessing that extends beyond a traditional data warehouse, providing robust business intelligence. They are easy, affordable, powerful and optimized for intensive analytical workloads and performance, with speedy time-to-value. Open source clustered file systems like the Hadoop Distributed File System (HDFS) and its alternatives, as well as the Google File System, also resolve some of the leading big data challenges. They perform better and provide cost-efficient scalability. They are hardware- and operating-system-agnostic, making them flexible and easy to design and manage, and they provide industry-leading security in cloud deployments. LexisNexis's High-Performance Computing Cluster (HPCC) and Appistry's CloudIQ Storage Hadoop edition are examples of data supply chains built on clustered file system storage.
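For illustration, here is a minimal sketch using the open source DataStax Python driver for Cassandra (pip install cassandra-driver). It assumes a cluster reachable on localhost; the keyspace, table and replication settings are our own examples, not a prescribed schema.

    # Sketch: flexible-schema business records in Cassandra via the DataStax driver.
    from cassandra.cluster import Cluster
    import uuid

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect()
    session.execute("""CREATE KEYSPACE IF NOT EXISTS dsc
        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}""")
    session.execute("""CREATE TABLE IF NOT EXISTS dsc.business_records (
        id uuid PRIMARY KEY, name text, country text)""")  # columns can be added later
    session.execute(
        "INSERT INTO dsc.business_records (id, name, country) VALUES (%s, %s, %s)",
        (uuid.uuid4(), "Acme Corp", "US"))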
Automation is an integral part of this transformation, and some of the aforementioned tools use a combination of data and artificial intelligence to cut the repetitive tasks of managing various aspects of the value chain. This frees up personnel to focus on core strategic issues rather than on tasks that can be easily automated. Classifying databases for topicality is one of the ways to eliminate data redundancy and enhance search efficiencies; with just-in-time data updates in the production environment, overall latency is reduced. Quick response (QR) codes are another optimizing measure, digitally packing more data than the erstwhile bar codes and enabling faster readability, especially through mobile devices. Modern search engines recognize these codes to determine the freshness of Web site content. To stay on top of search listings, providers need to switch to QR codes to increase revenue conversion rates, as the codes provide quick and convenient user access to providers' Web sites and hence capture more of the user's attention, particularly when used in advertisements. All of this has opened the door to more innovative and revenue-generating improvements and savings.
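Generating such a code is straightforward; a minimal sketch with the open source Python "qrcode" package (pip install qrcode[pil]) follows. The encoded URL is hypothetical.

    # Sketch: packing a record-lookup URL into a QR code.
    import qrcode

    img = qrcode.make("https://example.com/records/123?updated=2012-05-01")
    img.save("record_123_qr.png")  # scannable from print ads or mobile screens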
Data Delivery

How data is accessed, explored and visualized has been disrupted by the continued expansion and availability of mobile devices. Any information services company that has not moved away from largely static and visually uninteresting data representations, device-incompatible applications or Web applications that don't support next-age social media platforms should do so with urgency. They don't have to start by custom building from scratch. They can adopt data visualization application suites, available either commercially or through their technology partners, that help build automated graphic data representation tools or revealing infographic aggregations quickly, or provide their proprietary packages at a lower cost. This will not only enhance real-time data but also integrate seamlessly with other devices, tools and software.

Device application development holds a multiplicity of business opportunities to increase audience reach and customer engagement. To get there, information services companies should consider partnering with their service providers
to change the way data is viewed. This should be combined with a user-friendly interface to an external rules engine and repository that business users can use to modify or validate business rules, adding value by further authenticating data before delivery. With the business world speeding toward mobility, the way data is presented and accessed must be altered to accommodate consumers on the move. Finally, there is tremendous opportunity to strengthen information providers' client relationships and grow their businesses by leveraging a strong social media-influenced DSC strategy. Providers will add credibility and promote attention among key user constituencies by enabling their data delivery through social aggregators, since this approach enhances data and collaboration, creates buzz by distributing and publicizing new information releases, builds virtual interactive communities and enables user-generated information to enhance core research curation. Revenue-generating and revenue-increasing avenues will be the direct outcome of such social media integration.

Assessment, Monitoring and Data Governance

One logical starting point is to perform an overall assessment of the existing DSC. An alternative approach is to examine each step over time instead of considering the entire value chain in one stroke. This could take the form of an assessment of subcomponents of the DSC (as per specific pain points, on a client-to-client basis) through a consulting exercise. Through such assessments, the consulting team can:

• Expose the "as-is state" of the current DSC and its capabilities, including issues and potential risks.

• Arrive at a "to-be state," unlocking potential opportunities, through findings and recommendations.

• Establish a business case and ROI to quantify benefits, including the justification and resourcing needed to realize the desired value-chain transformation.

• Road-map the data value chain implementation process and present a high-level architecture in a prioritized, time-phased fashion.

Once the DSC is baselined, the next step is to instrument the supply chain and vigorously monitor performance across the following prime dimensions: data latency, data quality, scalability, flexibility and cost of operations. A single organization-wide dashboard with logging capabilities and a user-friendly interface that reports these metrics in real time will enable effective data auditing and eventually strong data governance.
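A minimal Python sketch of such instrumentation follows, logging per-stage latency and a crude quality ratio of the kind a dashboard could aggregate. The stage names and metrics are illustrative assumptions.

    # Sketch: instrumenting a DSC stage along the dimensions named above.
    import logging, time

    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("dsc.monitor")

    def timed_stage(name, fn, batch):
        start = time.monotonic()
        out = fn(batch)
        log.info("%s: latency=%.3fs in=%d out=%d quality=%.2f",
                 name, time.monotonic() - start, len(batch), len(out),
                 len(out) / max(len(batch), 1))
        return out

    records = ["a", "", "b", "c"]
    cleaned = timed_stage("cleanse", lambda rs: [r for r in rs if r], records)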
Finally, the DSC is only as good as the data that goes into it. Hence, a systematic data governance practice with checks on data consistency, correctness and completeness will help maintain the DSC without too much effort and ensure adequate data privacy. Companies normally have data policies, but these policies are often merely used to satisfy compliance obligations. Current data governance programs have not yet attained maturity, but companies that have data governance typically show a 40% improvement in ROI¹⁰ on IT investments compared to companies that don't.
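A minimal Python sketch of the three checks named above; the required fields and domain rules are illustrative assumptions.

    # Sketch: governance checks applied before a record enters the DSC.
    REQUIRED = ("id", "name", "country")

    def completeness(rec):        # every required field present and non-empty
        return all(rec.get(f) for f in REQUIRED)

    def correctness(rec):         # values conform to simple domain rules
        return len(rec.get("country", "")) == 2 and rec.get("country", "").isupper()

    def consistency(rec, index):  # no second record under the same id
        return rec.get("id") not in index

    index = {"r1"}
    rec = {"id": "r2", "name": "Acme", "country": "US"}
    print(all([completeness(rec), correctness(rec), consistency(rec, index)]))  # True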
Challenges to achieving an effective next-generation DSC include:

• The very thought of an end-to-end revamp of the entire DSC is overwhelming, and the amount of time, money and effort involved is daunting to leaders.

• Without CXO-level sponsorship and motivation, there is little chance of success.

• Given the significant change management involved, and without a critical mass of catalysts from the data provider and consumer communities, frustration can ensue.

• A DSC can only be successful if the data is standardized; otherwise the organization must write custom code to standardize, clean and integrate the data.

• Cloud computing and many of the "as-a-service" models rely on the service provider's ability to avoid service downtime.

The Path Ahead

Staying ahead of the curve and emerging victorious will require information services companies across industries and domains to embrace a next-generation DSC, selecting key elements and service delivery models to unlock productivity, seize competitive advantage and optimize the business for dynamic changes in the market. As we delve deeper to understand the architectural, infrastructural, technical and even business model implications, there is further scope for innovation.
With pressure to cut costs further while modernizing at the same time, there will be an evolving need for additional models, methods, frameworks, infrastructure and techniques that allow providers to tackle a data avalanche that will only intensify. Information providers should collaborate with their current or potential strategic and implementation partners to mutually explore areas in domain, products, services and technology to "wow" the end consumer.

Whether organizations expand existing architectures or start afresh by building, buying or acquiring new capabilities for a more modern and future-proof DSC, they will need to quickly make strategic decisions, carefully trading off risks and rewards to ensure coexistence with today's business constraints and tomorrow's demands for real-time, anywhere, anytime information access. Information providers that postpone the decision to face the inevitable trends in the information industry discussed in this paper will find themselves stuck with rigid legacy environments and will eventually be overtaken by forward-looking, more innovative competitors. If the vision is to be the market leader in world-class information and the consumer's preferred choice for insightful information and decision-making, it is high time for information players to act.
Footnotes

1. 1 exabyte (EB) = 10¹⁸ bytes, or 2⁶⁰ bytes in binary usage.

2. 1 zettabyte (ZB) = 10²¹ bytes, or 2⁷⁰ bytes in binary usage.

3. "Where angels will tread," The Economist, http://www.economist.com/node/21537967

4. Data points obtained directly from the following company Web sites: Elsevier, Twitter, YouTube and Facebook.

5. Business process as a service, or BPaaS, is an application delivered as a service that is used by service-provider teams that perform business tasks on behalf of the service recipient. BPaaS combines traditional business process outsourcing (BPO) and software as a service (SaaS) to optimize business processes and elevate outcomes.

6. British Gas, ABN Amro and Severn Trent Water examples: "Business Value for Data Quality," http://www.x88.com/whitepapers/x88_pandora_data_quality_management.pdf

7. Hadoop is a free (open source) Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is primarily conceived on a MapReduce paradigm that breaks applications into smaller chunks to be processed in a distributed fashion for rapid data processing.

8. "How to save millions through storage optimization," http://www.networkworld.com/supp/2007/ndc3/052107-storage-optimization-side.html

9. "SynapSense Bolsters Data Center Infrastructure Management Solutions with New Energy Management Tools," http://www.reuters.com/article/2011/06/29/idUS251193+29-Jun-2011+BW20110629

10. "IT Governance: How Top Performers Manage IT Decision Rights for Superior Results," Peter Weill and Jeanne Ross, Harvard Business School Press.
About the Author
Sethuraman M.S. is the Domain Lead for Information Services within Cognizant’s Information, Media
and Entertainment (IME) Consulting Practice. Apart from providing leadership to a team that consults
to information services companies, he provides thought leadership built on his expertise in the infor-
mation markets and global media. His extensive IT experience spans consulting, program management,
software implementations and business development. Sethu has worked with clients worldwide and has
won numerous awards and recognitions. He received a BTech degree in instrumentation and electronics
from the National Institute of Technology (NIT), Punjab, India, an MS degree in software systems from
Birla Institute of Technology and Science (BITS), Pilani, India, and an MBA from the Asian Institute of
Management (AIM), Manila, Philippines. Sethu can be reached at Sethuraman.MS@cognizant.com.