SlideShare a Scribd company logo
1 of 7
• Cognizant 20-20 Insights




Making Sense of Big Data in the
Petabyte Age
   Executive Summary                                      continues to accelerate in terms of volume,
                                                          complexity and formats.2
   The concept of “big data”1 is gaining attention
   across industries and the globe, thanks to the         A traditional approach to handling big data is to
   growth in social media (Twitter, Facebook, blogs,      replace SQL with tools like MapReduce.3 However,
   etc.) and the explosion of rich content from other     the sheer volume of data contained in a data
   information sources (activity logs from the Web,       set that is routinely analyzed does not solve the
   proximity and wireless sources, etc.). The desire to   more pressing issue — that people have difficulty
   create actionable insights from the ever-increas-      focusing on massive amounts of tables, files, Web
   ing volumes of unstructured and structured             sites and data marts, all of which are candidates
   data sets is forcing enterprises to rethink their      for analysis. It’s not all about just data warehouses,
   approaches to big data, particularly as tradi-         anymore.
   tional approaches have proved difficult — if even
   possible — to apply to structured data sets.           Usability is a factor that will overshadow the
                                                          technical characteristics of big data analysis for
   While data volume proliferates, the knowledge it       at least the next five years. This paper focuses
   creates has not kept pace. For example, the sheer      specifically on the roadmap organizations must
   complexity of how to store and index large data        create and follow to survive the Petabyte Age.
   stores, as well as the information models required
   to access them, have made it difficult for organi-     Big Data = Big Challenges
   zations to convert captured data into insight.
                                                          The Petabyte Age is creating a multitude of
   The media appears obsessed with how today’s            challenges for organizations. The accelerating
   leading companies are dealing with big data, a         deluge of data is problematic to all, for within
   phenomenon known as living in the “Petabyte            the massive array of data sources — including
   Age.” However, coverage often focuses on the           data warehouses, data marts, portals, Web sites,
   technology aspects of big data, leaving such           social media, files and more — is the informa-
   concerns as usability largely untouched.               tion required to make the smartest strategic
                                                          business decisions. Many enterprises are facing
   For years, the accelerating data deluge has also       the dilemma that the systems and processes
   received significant attention from industry           devised specifically to integrate all this informa-
   pundits and researchers. What is new is the            tion lack the required responsiveness to place the
   threshold that has been crossed as the onslaught       information into a neatly organized warehouse in




   cognizant 20-20 insights | june 2011
time to be used at the current speed of business.                             usage and availability and other issues —
The heightened use of Excel and other desktop                                 require businesses to store increasingly greater
tools to integrate needed information in the most                             volumes of information for much longer time
expedient way possible only adds complexity to                                frames.
the problem enterprises face.
                                                                          As a result of these factors, enterprises worldwide
There are a number of factors at the heart of big                         have been rapidly increasing the amount of data
data that make analysis difficult. For example,                           housed for analysis to compete in the Petabyte
the timing of data, the complexity of data,                               Age. Many have responded by embracing new
the complexity of the synthesized enterprise                              technologies to help manage the sheer volume
warehouse and the identification of the most                              of data. However, these new toys also introduce
appropriate data are equal if not larger challenges                       data usability issues that will not be solved by
than dealing with large data sets, themselves.                            new technology but rather will require some
                                                                          rethinking of the business consequences of big
The increased complexity of the data available for                        data. Among these challenges:
generating insights is a direct consequence of the
following:                                                                •   Big data is not only tabular; it also includes
                                                                              documents, e-mails, pictures, videos, sound
•    The highly communicative and integrated                                  bites, social media extracts, logs and other
     global economy. Enterprises across industry                              forms of information that is difficult to fit
     are increasingly seeking more granular insight                           into the nicely organized world of traditional
     into market forces that ultimately shape their                           database tables (rows and columns).
     success and failure. Data generated by the
     “always-on economy” is the impetus, in many                          •   Companies that tackle big data as a tech-
                                                                              nology-only initiative will only solve a single
     cases, for the keen interest in implementing
                                                                              dimension of the big data mandate.
     so-called insight facilitation appliances on
     mobile devices – smartphones, iPads, Android                         •   There are sheer volumetric issues, such
     and other tablets — throughout the enterprise.                           as billions of rows of data, that need to be
                                                                              solved. While tried-and-true technologies (par-
•    The enlightened consumer. Given the
                                                                              titioning) and newer technologies (MapReduce,
     explosion in social media and smart devices,
                                                                              etc.) permit organizations to segment data into
     many consumers have more information at
                                                                              more manageable chunks, such an approach
     their fingertips than ever before (often more
                                                                              does not deal with the issue that rarely
     so than businesses) and are becoming increas-
                                                                              used information is clogging the pathway to
     ingly sophisticated in how they gather and
                                                                              necessary information. Traditional lifecycle
     apply such information.
                                                                              management technologies will alleviate many
•    The global regulatory climate. New regulatory                            of the volumetric issues, but they will do little
     mandates — covering financial transactions,                              to solve the non-technical issues associated
     corporate health, food-borne illnesses, energy                           with volumetrics.


Tabulating the Information Lifecycle

         Types of Information Retained the Longest                                  How Much Information is Retained
    Development Records         4%                                              >1 PB     4%
                  Email         4%                                            >500 TB          7%
       Financial Records         5%                                           >100 TB          7%
       Database Archive              6%                                        >25 TB     4%
    Government Records                          11%                            >10 TB                    14%
    Organization Records                                    18%                 >5 TB                          18%
       Customer Records                                      19%                >1 TB                                        25%
            Source Files                                            25%         <1 TB                                21%

                           0%   5%        10%         15%    20%   25%              0%    5%    10%     15%    20%         25%


Source: The 100 Year Archive Task Force, SNIA Data Management Forum
Figure 1



                                cognizant 20-20 insights                  2
As a result of mergers and acquisitions, global                 >   Information required for regulatory activi-
sourcing, data types and other issues, the sheer                    ties but not necessarily related to the cre-
number of tables and objects available for access                   ation, extraction or capture of value.
has mushroomed. This increase in the volume of
objects has made access schemes for big data
                                                                >   Historical supporting information.

overly complex, and it has made finding needed                  >   Historical information that was once aligned
data akin to finding a needle in a haystack.                        with value, regulatory or other purposes
                                                                    but is now kept because it might be useful
•   The information lifecycle management4 con-                      at some future date.
    siderations of data have not received the
    attention they deserve. Information lifecycle
                                                         •      Much of the complexity of information
                                                                made available for deriving insight is a
    management should not be limited to parti-                  complex weave of multiple versions of the
    tioning schemes and the physical location of                truth, data organized for different purposes,
    data. (Much attention is being given to cloud               data of different vintages and similar data
    and virtualized storage, which presumes a                   obtained from different sources. This is a
    process has been devised for rationalizing the              phenomenon that many organizational stake-
    fact that always-on information made available              holders describe as a “black box of informa-
    in the cloud is worthy of this heightened avail-            tion” into which they have no insight into its
    ability.) Information lifecycle management                  lineage. This adds delay to the use of insight
    is the process that traditionally stratifies the            gained from such information, a development
    physical layout for technical performance. In               caused by the obvious necessity of validating
    the Petabyte Age, where the amount of infor-                information prior to using it for anything out
    mation available for analysis is increasing at              of the ordinary. Much of the data available for
    an accelerating rate, the information lifecycle             analysis results from the conventional wisdom
    management process should also ensure a                     that winning organizations are “data pack
    heightened focus on information that matters.               rats” and that information that arrives on their
    This stratification should categorize informa-              doorsteps tends to stay as a permanent artifact
    tion into the following groups:                             of the business. Interestingly, according to the
    >   Information directly related to the creation,           100-Year Archive Task Force,5 source files are
        extraction or capture of value.                         the most frequently identified source of data
                                                                retained as part of the “100-Year Archive.”
    >   Supporting information that could be re-
        ferred to when devising a strategy to cre-       •      A sizable amount of operational information
        ate, extract or capture value.                          is not housed in official systems adminis-
                                                                tered by enterprise IT groups but instead is
    >   Information required for the operations of
                                                                stored on desktops throughout the organi-
        the enterprise but not necessarily related
                                                                zation. Much of these local data stores were
        to the creation, extraction or capture of
                                                                created with good intentions; people respon-
        value.


Many Sources of Data


                                                    Partner             Data
                                     Portal          Data             Warehouse
                                   Taxonomy                           Dimensions


                                       Other                            Informal


                                                        ?
                                    Information                         Network

                              Spreadsheets
                              and Local Data                             Operational
                                 Personal                                  System
                               Organization                                 Keys


Figure 2



                         cognizant 20-20 insights           3
sible for business functions had no other way          The Roadmap to Managing Big Data
    of gaining control over the information they
                                                           Big data will be solved through a combination
    needed. As a result, these desktop-based
                                                           of enhancements to the people, process and
    systems have created a different form of big
                                                           technology strengths of an enterprise.
    data — a weave of Microsoft Access, Excel and
    other desktop tool-based sources that are just         People-based initiatives that will impact big data
    as critical in running enterprises. The contents       programs employed at companies include the
    of these individually small to medium-size             following:
    sources of information collectively add up
    to a sizable number of sources. One large              •   Managing the information lifecycle employed
                                                               at the organizations. For good reason (glaring
    enterprise was found to have close to 1,000
                                                               privacy and security concerns, among them),
    operational metrics managed in desktop appli-
                                                               organizations have placed significant focus
    cations (i.e., Excel) — which is not an uncommon
                                                               on information governance. The mandate for
    situation. These sources never make it to the
                                                               determining which data deserves focus should
    production environment and downstream data
                                                               be part of the overall governance charter.
    warehouses.

•   Much of the big data housed in organiza-               •   Ensuring a sufficient skill set to tackle the
                                                               issues introduced as a consequence of big
    tions results from regulatory requirements
                                                               data.
    that necessitate storing large amounts of
    historical data. This large volume of historical       •   Developing a series of metrics to manage
    data hides the data required for insight. While            the effectiveness of the big data program.
    it is important to retain this information, the            This includes:
    necessity to house it with the same priority as
    information used for deriving insight is both
                                                               >   Average, minimum and maximum time re-
                                                                   quired to turn data into insight.
    expensive and unnecessary.
                                                               >   Average, minimum and maximum time re-
The Case for Horizontal Partitioning                               quired to integrate new information sources.
Horizontal partitioning is the process defined as              >   Average, minimum and maximum time re-
segmenting data in such a way that prioritizes                     quired to integrate existing information
information required for value extraction, origi-                  sources.
nation and capture.6 This partitioning process
should result in a process that tiers information
                                                               >   Time required for the management process.

along the traditional dimensions of a business                 >   Percentage of people associated with the
                                                                   program participating in the management
information model. Such a model enhances the
                                                                   process.
focus of information that supports the extraction,
origination and capture of value.                              >   Value achieved from the program.


Converting Big Data Into Value

                                                Relevant Actionable
                       Acquired &                   Trustworthy                  Learned
                   Created Knowledge                  Data                      Inference

                  Capabilities                                            Customers    Markets
                               Channels Value
                    Risks
                        Investors
                                        Chain       Insight              Regulatory Expected
                                                                         Disruptions Outcomes

                         Heard
                       Inference
                                                     Action                     Innovation


                     Extracted                                                  Originated
                       Value                         Value                        Value


                      Captured                                                    Captured
                     Transaction                  Captured Value                Value Stream


Figure 3



                        cognizant 20-20 insights           4
According to McKinsey,7 the activities of people         Big Data Getting Bigger
steering big data will include:

•   Ensuring that big data is more accessible and             TB            eBay: 6.5 PB, 2009
    timely.                                                  1,000
                                                                            Google: 1 PB of new data every 3 days, 2009


•   Measuring the value achieved by unlocking big             800
    data.                                                                      Size of the Largest Data
                                                                                  Warehouse in the
                                                              600
•   Embracing experimentation through data to                                   Winter Top Ten Survey
                                                                                    CAGR = 173%
    expose variability and raise performance.                 400                                                            Moore’s Law
                                                                                                                             Growth Rate

•   Enabling the customization of populations                 200

    through segmentation.                                       0
                                                                     1998     2000    2002    2004    2006    2008    2010    2012
•   Facilitating the use of automated algorithms
                                                                     � Actual        � Projected
    to replace and support human decision-
    making, thereby improving decisions, minimiz-
                                                         Source: Winter Corp.
    ing risks and unearthing valuable insights that
                                                         Figure 4
    would otherwise remain hidden.

•   Facilitating innovation programs that use new        Technology-based initiatives that will impact big
    business models, products and services.              data programs employed at companies include:
Process-based initiatives that will impact big data
programs are best enabled as augmentations of a
                                                         •     Ensuring that the specialized skills required
                                                               to administer and use the fruits of the big
company’s governance activities. These augmen-                 data initiative are present. Some of these
tations include:                                               include the databases, the ontologies used
                                                               to navigate big data and the MapReduce
•   Ensuring sufficient focus on information
                                                               concepts that augment or fully replace SQL
    that will drive value within the organization.
                                                               access techniques.
    These processes are best employed as linkages
    between corporate strategy and information           •     Ensuring that tools introduced to navigate
    lifecycle management programs.                             big data are usable by the intended audience
                                                               without upsetting self-service paradigms that
    >   It is important to note that information
                                                               have slowly gained traction during the past
        lifecycle management is defined in many
                                                               several years.
        organizations as a program to manage hi-
        erarchies and business continuity. For the       •     Ensuring that the architecture and sup-
        purposes of this paper, this definition is ex-         porting network, technology and software
        tended to include promotion and demotion               infrastructures are capable of supporting big
        of data items as aligned with the business             data.
        information model (how information is used
                                                         It is safe to state that if history is any prediction
        to support enterprise strategies and tactics)
                                                         of the future, the sheer volume of data that orga-
        of the organization.
                                                         nizations will need to deal with will outstrip our
    >   The process used to govern big data and its      collective imaginations for how much data will
        information lifecycle should continually re-     be available for generating insights.. Only eight
        fine and prioritize the benefits, constraints,   years ago, a 300 to 400 terabyte data warehouse
        priorities and risks associated with the         was considered an outlier. Today, multi-petabyte
        timely publication and use of relevant, fo-      warehouses are easily found. Failure to take action
        cused, actionable and trustworthy informa-       to manage the usability of information pouring
        tion published under big data initiatives.       into the enterprise is (and will be) a competitive
                                                         disadvantage (see Figure 4).
•   Ensuring the metrics that drive proper
    adhesion and use of big data are developed.
    This should cover the following topics:
                                                         Recommendations
                                                         Big data is a reality in most enterprises. However,
    >   Governing big data.
                                                         companies that tackle big data as merely a
    >   Big data lifecycle.                              technology imperative will solve a less important
    >   Big data use and adoption.                       dimension of their big data challenges. Big data
                                                         is much more than an extension of the technolo-
    >   Big data publication metrics.


                         cognizant 20-20 insights        5
Storage Definitions                                      to the models (making it difficult for knowledge
                                                         workers to navigate the data needed for analysis)
                                                         and added an analytic plaque, which makes finding
                                                         the required information for analysis more akin to
    Terabyte     Will fit 200,000 photos                 finding a needle in a haystack.
                 or MP3 songs on a single
                 1 terabyte hard drive.
                                                         In many organizations, information lifecycle ini-
    Petabyte     Will fit on 16 Backblaze                tiatives are mandated that too often focus on
                 storage pods racked in
                 two data center cabinets.               the optimal partitioning and archiving of the
                                                         enterprise (i.e., vertical partitioning). Largely a
    Exabyte      Will fit in 2,000 cabinets and          technology focus, this thread of the information
                 fill a four story data center
                 that takes up a city block.             lifecycle overlooks data that is no longer aligned
                                                         with the strategies, tactics and intentions of the
    Zettabyte    Will fit in 1,000 data centers          organization. The scope and breadth of informa-
                 or about 20% of Manhattan,
                 New York.                               tion housed by enterprises in the Petabyte Age
                                                         mandates that data be stratified according to its
    Yottabyte    Will fit in the states of Delaware      usefulness in the organizational value creation
                 and Rhode Island with a
                 million data centers.                   (i.e., horizontal partitioning). In today’s organiza-
                                                         tions, the only cross-functional bodies with the
                                                         ability to perform this horizontal partitioning are
                                                         virtual organizations employed to govern the
                                                         enterprise information asset.
Figure 5
                                                         It was only a few years ago that a data warehouse
gies used in partitioning strategies employed at         requiring a terabyte of storage was the exception.
enterprises.                                             As we embrace the Petabyte Age, companies
                                                         are entering an era where they will need to be
Companies have proved that they are pack rats.           capable of handling and analyzing much larger
They need to house large amounts of history for          populations of information. Regardless of the
advanced analytics, and regulatory pressures             processes put in place, ever-increasing volumes
influence them to just store everything. The             of structured and unstructured data will only
reduced cost of storage has allowed companies            proliferate, challenging companies to quickly and
to turn their data warehousing environments into         effectively convert raw data into insight in ways
data dumps, which has added both a complexity            that stand the test of time.


Footnotes
1
    Big data refers to data sets that grow so large that they become awkward to work with using on-hand
    database management tools. Difficulties include capture, storage, search, sharing, analytics and visu-
    alizing.
2
    “The Toxic Terabyte” (IBM Research Labs, July 2006) provides a thorough analysis of how companies
    had to get their houses in order to deal with a terabyte of data. If this authorative work were rewritten
    today, it would be called “The Problematic Petabyte” and in five years, most probably, “The Exhaustive
    Exabyte.”
3
    MapReduce is a Google-inspired framework specifically devised for processing large amounts of data
    optimized across a grid of computing power.
4
     Information lifecycle management is a process used to improve the usefulness of data by moving
    lesser used data into segments. Information lifecycle management is most commonly interested in
    moving data from always-needed partitions to rarely needed partitions and, finally, into archives.
5
    SNIA Data Management Forum’s 100 Year Archive Task Force, 2007.
6
     Horizontal partitioning is a term created by the author. It describes the application of generally
    accepted techniques of gaining performance by segmenting data into partitions (vertical partitioning)
    to segmenting groups of data by the likelihood of it achieving organizational value.
7
    “Big Data, The Next Frontier for Innovation, Competition and Productivity,” McKinsey & Company, May
    2011.


                             cognizant 20-20 insights     6
About the Author
Mark Albala is Director of Cognizant’s North American Business Intelligence and Data Warehousing
Consulting Practice. This practice provides solution architecture, information governance, informa-
tion strategy and program governance services to companies across industries and supports Cogni-
zant’s business intelligence and data warehouse delivery capabilities. A graduate of Syracuse University,
Mark has held senior thought leadership, advanced technical and trusted advisory roles for organiza-
tions focused on the disciplines of information management for over 20 years. He can be reached at
Mark.Albala@cognizant.com.




About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process out-
sourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in
Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry
and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50
delivery centers worldwide and approximately 111,000 employees as of March 31, 2011, Cognizant is a member of the
NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and
fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.



                                         World Headquarters                  European Headquarters                 India Operations Headquarters
                                         500 Frank W. Burr Blvd.             Haymarket House                       #5/535, Old Mahabalipuram Road
                                         Teaneck, NJ 07666 USA               28-29 Haymarket                       Okkiyam Pettai, Thoraipakkam
                                         Phone: +1 201 801 0233              London SW1Y 4SP UK                    Chennai, 600 096 India
                                         Fax: +1 201 801 0243                Phone: +44 (0) 20 7321 4888           Phone: +91 (0) 44 4209 6000
                                         Toll Free: +1 888 937 3277          Fax: +44 (0) 20 7321 4890             Fax: +91 (0) 44 4209 6060
                                         Email: inquiry@cognizant.com        Email: infouk@cognizant.com           Email: inquiryindia@cognizant.com


© Copyright 2011, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

More Related Content

What's hot

Infor cloud suite industrial machinery, handbook, english 0216
Infor cloud suite industrial machinery, handbook, english 0216Infor cloud suite industrial machinery, handbook, english 0216
Infor cloud suite industrial machinery, handbook, english 0216
Drake Brown
 
2013 kepler cheuvreux autumn conference 20130920
2013 kepler cheuvreux autumn conference 201309202013 kepler cheuvreux autumn conference 20130920
2013 kepler cheuvreux autumn conference 20130920
kailashgavare
 

What's hot (19)

Strengthen Operational Efficiencies with IT Infrastructure Managed Services b...
Strengthen Operational Efficiencies with IT Infrastructure Managed Services b...Strengthen Operational Efficiencies with IT Infrastructure Managed Services b...
Strengthen Operational Efficiencies with IT Infrastructure Managed Services b...
 
HCL Infosystems - Services Brochure
HCL Infosystems - Services BrochureHCL Infosystems - Services Brochure
HCL Infosystems - Services Brochure
 
Gamifying Cognizant 3.0
Gamifying Cognizant 3.0Gamifying Cognizant 3.0
Gamifying Cognizant 3.0
 
Selecting a Software Solution: 13 Best Practices for Media and Entertainment ...
Selecting a Software Solution: 13 Best Practices for Media and Entertainment ...Selecting a Software Solution: 13 Best Practices for Media and Entertainment ...
Selecting a Software Solution: 13 Best Practices for Media and Entertainment ...
 
Flight Plan Design + Blockchain (Fashion / Retail)
Flight Plan Design + Blockchain (Fashion / Retail)Flight Plan Design + Blockchain (Fashion / Retail)
Flight Plan Design + Blockchain (Fashion / Retail)
 
RPA helps leading US mortgage provider stay ahead by improving agent producti...
RPA helps leading US mortgage provider stay ahead by improving agent producti...RPA helps leading US mortgage provider stay ahead by improving agent producti...
RPA helps leading US mortgage provider stay ahead by improving agent producti...
 
Cognizant -- New Business Models through Collaborations
Cognizant -- New Business Models through CollaborationsCognizant -- New Business Models through Collaborations
Cognizant -- New Business Models through Collaborations
 
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
Using Adaptive Scrum to Tame Process Reverse Engineering in Data Analytics Pr...
 
The Digital Manufacturer
The Digital ManufacturerThe Digital Manufacturer
The Digital Manufacturer
 
Processes in the Networked Economies: Portal, Vortex, and Dynamic Trading Pro...
Processes in the Networked Economies: Portal, Vortex, and Dynamic Trading Pro...Processes in the Networked Economies: Portal, Vortex, and Dynamic Trading Pro...
Processes in the Networked Economies: Portal, Vortex, and Dynamic Trading Pro...
 
Processes Driving the Networked Economy: Process Portals, Process Vortex and ...
Processes Driving the Networked Economy: Process Portals, Process Vortex and ...Processes Driving the Networked Economy: Process Portals, Process Vortex and ...
Processes Driving the Networked Economy: Process Portals, Process Vortex and ...
 
Splice Machine Digital Transformation 2.0 white paper
Splice Machine Digital Transformation 2.0 white paperSplice Machine Digital Transformation 2.0 white paper
Splice Machine Digital Transformation 2.0 white paper
 
Crafting the Utility of the Future
Crafting the Utility of the FutureCrafting the Utility of the Future
Crafting the Utility of the Future
 
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
How Enterprise Architects Can Build Resilient, Reliable Software-Based Health...
 
The Top Three Product Lifecycle Management Trends Taking Shape Across the Dig...
The Top Three Product Lifecycle Management Trends Taking Shape Across the Dig...The Top Three Product Lifecycle Management Trends Taking Shape Across the Dig...
The Top Three Product Lifecycle Management Trends Taking Shape Across the Dig...
 
Sap the digital building products company
Sap   the digital building products companySap   the digital building products company
Sap the digital building products company
 
Infor cloud suite industrial machinery, handbook, english 0216
Infor cloud suite industrial machinery, handbook, english 0216Infor cloud suite industrial machinery, handbook, english 0216
Infor cloud suite industrial machinery, handbook, english 0216
 
Oracle's Top 10 Cloud Predictions
Oracle's Top 10 Cloud PredictionsOracle's Top 10 Cloud Predictions
Oracle's Top 10 Cloud Predictions
 
2013 kepler cheuvreux autumn conference 20130920
2013 kepler cheuvreux autumn conference 201309202013 kepler cheuvreux autumn conference 20130920
2013 kepler cheuvreux autumn conference 20130920
 

Viewers also liked

Lighting Production Gear
Lighting Production GearLighting Production Gear
Lighting Production Gear
Alan Martell
 
Cisco.Press.Metro.Ethernet.E Book Kb
Cisco.Press.Metro.Ethernet.E Book KbCisco.Press.Metro.Ethernet.E Book Kb
Cisco.Press.Metro.Ethernet.E Book Kb
Hussein Elmenshawy
 
Curso ibm 2013 aix
Curso ibm 2013 aixCurso ibm 2013 aix
Curso ibm 2013 aix
camforma
 
Oria - Bespoke Fenestration
Oria - Bespoke FenestrationOria - Bespoke Fenestration
Oria - Bespoke Fenestration
vinayvs
 

Viewers also liked (20)

Informatica ii nebeidis
Informatica ii nebeidisInformatica ii nebeidis
Informatica ii nebeidis
 
La Princesa Embrujada
La Princesa EmbrujadaLa Princesa Embrujada
La Princesa Embrujada
 
Modelo de manuscrito
Modelo de manuscritoModelo de manuscrito
Modelo de manuscrito
 
Image performance talk by Grant Kemp ( @ukdatageek) for London Web Meetup
Image performance talk by Grant Kemp ( @ukdatageek) for London Web MeetupImage performance talk by Grant Kemp ( @ukdatageek) for London Web Meetup
Image performance talk by Grant Kemp ( @ukdatageek) for London Web Meetup
 
I - Marketing/Colombia
I - Marketing/ColombiaI - Marketing/Colombia
I - Marketing/Colombia
 
Lighting Production Gear
Lighting Production GearLighting Production Gear
Lighting Production Gear
 
Cisco.Press.Metro.Ethernet.E Book Kb
Cisco.Press.Metro.Ethernet.E Book KbCisco.Press.Metro.Ethernet.E Book Kb
Cisco.Press.Metro.Ethernet.E Book Kb
 
E cube 7
E cube 7E cube 7
E cube 7
 
Lavasepsis_eng
Lavasepsis_engLavasepsis_eng
Lavasepsis_eng
 
La vuelta al mundo con el marketing turístico - Lic. Mariano Cabrera Lanfranconi
La vuelta al mundo con el marketing turístico - Lic. Mariano Cabrera LanfranconiLa vuelta al mundo con el marketing turístico - Lic. Mariano Cabrera Lanfranconi
La vuelta al mundo con el marketing turístico - Lic. Mariano Cabrera Lanfranconi
 
Pharma Market 23
Pharma Market 23Pharma Market 23
Pharma Market 23
 
2011 pcp boston_changes_in_the_personal_construct_system_during_psychotherapy
2011 pcp boston_changes_in_the_personal_construct_system_during_psychotherapy2011 pcp boston_changes_in_the_personal_construct_system_during_psychotherapy
2011 pcp boston_changes_in_the_personal_construct_system_during_psychotherapy
 
Careerdate - Infomaterial Januar 2015
Careerdate - Infomaterial Januar 2015Careerdate - Infomaterial Januar 2015
Careerdate - Infomaterial Januar 2015
 
Infomedia
InfomediaInfomedia
Infomedia
 
La mia casa dei sogni
La mia casa dei sogniLa mia casa dei sogni
La mia casa dei sogni
 
Curso ibm 2013 aix
Curso ibm 2013 aixCurso ibm 2013 aix
Curso ibm 2013 aix
 
EVOLUCION DEL CELULAR
EVOLUCION DEL CELULAREVOLUCION DEL CELULAR
EVOLUCION DEL CELULAR
 
Oria - Bespoke Fenestration
Oria - Bespoke FenestrationOria - Bespoke Fenestration
Oria - Bespoke Fenestration
 
Nur3052 ch6
Nur3052 ch6Nur3052 ch6
Nur3052 ch6
 
Truco Saldo Recarga Gratis Movistar Lusacell Unefon Mexico
Truco Saldo Recarga Gratis  Movistar Lusacell Unefon MexicoTruco Saldo Recarga Gratis  Movistar Lusacell Unefon Mexico
Truco Saldo Recarga Gratis Movistar Lusacell Unefon Mexico
 

Similar to Dealing with Big Data: Planning for and Surviving the Petabyte Age

Analytics big data ibm
Analytics big data ibmAnalytics big data ibm
Analytics big data ibm
Accenture
 
IBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep diveIBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep dive
Kun Le
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Things
pateelhs
 
Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .
Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .
Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .
eraser Juan José Calderón
 
Big Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to YouBig Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to You
DATAVERSITY
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
John Enoch
 

Similar to Dealing with Big Data: Planning for and Surviving the Petabyte Age (20)

06. 9534 14985-1-ed b edit dhyan
06. 9534 14985-1-ed b edit dhyan06. 9534 14985-1-ed b edit dhyan
06. 9534 14985-1-ed b edit dhyan
 
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
EDF2013: Invited Talk Julie Marguerite: Big data: a new world of opportunitie...
 
Analytics big data ibm
Analytics big data ibmAnalytics big data ibm
Analytics big data ibm
 
IBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep diveIBM-Infoworld Big Data deep dive
IBM-Infoworld Big Data deep dive
 
Big Data: Issues and Challenges
Big Data: Issues and ChallengesBig Data: Issues and Challenges
Big Data: Issues and Challenges
 
Ab cs of big data
Ab cs of big dataAb cs of big data
Ab cs of big data
 
The ABCs of Big Data
The ABCs of Big DataThe ABCs of Big Data
The ABCs of Big Data
 
Analysis of Big Data
Analysis of Big DataAnalysis of Big Data
Analysis of Big Data
 
Unit III.pdf
Unit III.pdfUnit III.pdf
Unit III.pdf
 
Big Data Challenges faced by Organizations
Big Data Challenges faced by OrganizationsBig Data Challenges faced by Organizations
Big Data Challenges faced by Organizations
 
Use of big data technologies in capital markets
Use of big data technologies in capital marketsUse of big data technologies in capital markets
Use of big data technologies in capital markets
 
big data Big Things
big data Big Thingsbig data Big Things
big data Big Things
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data Analytics
 
Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .
Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .
Convergence of AI, IoT, Big Data and Blockchain: A Review. Kefa Rabah .
 
1
11
1
 
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
 
Big Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to YouBig Data: Beyond the Hype - Why Big Data Matters to You
Big Data: Beyond the Hype - Why Big Data Matters to You
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
 
IRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its ChallengesIRJET - Big Data Analysis its Challenges
IRJET - Big Data Analysis its Challenges
 

More from Cognizant

More from Cognizant (20)

Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-makingData Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
Data Modernization: Breaking the AI Vicious Cycle for Superior Decision-making
 
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional ExperiencesIt Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
It Takes an Ecosystem: How Technology Companies Deliver Exceptional Experiences
 
Intuition Engineered
Intuition EngineeredIntuition Engineered
Intuition Engineered
 
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
The Work Ahead: Transportation and Logistics Delivering on the Digital-Physic...
 
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital InitiativesEnhancing Desirability: Five Considerations for Winning Digital Initiatives
Enhancing Desirability: Five Considerations for Winning Digital Initiatives
 
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility MandateThe Work Ahead in Manufacturing: Fulfilling the Agility Mandate
The Work Ahead in Manufacturing: Fulfilling the Agility Mandate
 
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
The Work Ahead in Higher Education: Repaving the Road for the Employees of To...
 
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...Engineering the Next-Gen Digital Claims Organisation for Australian General I...
Engineering the Next-Gen Digital Claims Organisation for Australian General I...
 
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
Profitability in the Direct-to-Consumer Marketplace: A Playbook for Media and...
 
Green Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for SustainabilityGreen Rush: The Economic Imperative for Sustainability
Green Rush: The Economic Imperative for Sustainability
 
Policy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for InsurersPolicy Administration Modernization: Four Paths for Insurers
Policy Administration Modernization: Four Paths for Insurers
 
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with DigitalThe Work Ahead in Utilities: Powering a Sustainable Future with Digital
The Work Ahead in Utilities: Powering a Sustainable Future with Digital
 
AI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to ValueAI in Media & Entertainment: Starting the Journey to Value
AI in Media & Entertainment: Starting the Journey to Value
 
Operations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First ApproachOperations Workforce Management: A Data-Informed, Digital-First Approach
Operations Workforce Management: A Data-Informed, Digital-First Approach
 
Five Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the CloudFive Priorities for Quality Engineering When Taking Banking to the Cloud
Five Priorities for Quality Engineering When Taking Banking to the Cloud
 
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining FocusedGetting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
Getting Ahead With AI: How APAC Companies Replicate Success by Remaining Focused
 
Utilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data PlatformUtilities Can Ramp Up CX with a Customer Data Platform
Utilities Can Ramp Up CX with a Customer Data Platform
 
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
The Work Ahead in Intelligent Automation: Coping with Complexity in a Post-Pa...
 
The Timeline of Next
The Timeline of NextThe Timeline of Next
The Timeline of Next
 
Realising Digital’s Full Potential in the Value Chain
Realising Digital’s Full Potential in the Value ChainRealising Digital’s Full Potential in the Value Chain
Realising Digital’s Full Potential in the Value Chain
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Dealing with Big Data: Planning for and Surviving the Petabyte Age

  • 1. • Cognizant 20-20 Insights Making Sense of Big Data in the Petabyte Age Executive Summary continues to accelerate in terms of volume, complexity and formats.2 The concept of “big data”1 is gaining attention across industries and the globe, thanks to the A traditional approach to handling big data is to growth in social media (Twitter, Facebook, blogs, replace SQL with tools like MapReduce.3 However, etc.) and the explosion of rich content from other the sheer volume of data contained in a data information sources (activity logs from the Web, set that is routinely analyzed does not solve the proximity and wireless sources, etc.). The desire to more pressing issue — that people have difficulty create actionable insights from the ever-increas- focusing on massive amounts of tables, files, Web ing volumes of unstructured and structured sites and data marts, all of which are candidates data sets is forcing enterprises to rethink their for analysis. It’s not all about just data warehouses, approaches to big data, particularly as tradi- anymore. tional approaches have proved difficult — if even possible — to apply to structured data sets. Usability is a factor that will overshadow the technical characteristics of big data analysis for While data volume proliferates, the knowledge it at least the next five years. This paper focuses creates has not kept pace. For example, the sheer specifically on the roadmap organizations must complexity of how to store and index large data create and follow to survive the Petabyte Age. stores, as well as the information models required to access them, have made it difficult for organi- Big Data = Big Challenges zations to convert captured data into insight. The Petabyte Age is creating a multitude of The media appears obsessed with how today’s challenges for organizations. The accelerating leading companies are dealing with big data, a deluge of data is problematic to all, for within phenomenon known as living in the “Petabyte the massive array of data sources — including Age.” However, coverage often focuses on the data warehouses, data marts, portals, Web sites, technology aspects of big data, leaving such social media, files and more — is the informa- concerns as usability largely untouched. tion required to make the smartest strategic business decisions. Many enterprises are facing For years, the accelerating data deluge has also the dilemma that the systems and processes received significant attention from industry devised specifically to integrate all this informa- pundits and researchers. What is new is the tion lack the required responsiveness to place the threshold that has been crossed as the onslaught information into a neatly organized warehouse in cognizant 20-20 insights | june 2011
  • 2. time to be used at the current speed of business. usage and availability and other issues — The heightened use of Excel and other desktop require businesses to store increasingly greater tools to integrate needed information in the most volumes of information for much longer time expedient way possible only adds complexity to frames. the problem enterprises face. As a result of these factors, enterprises worldwide There are a number of factors at the heart of big have been rapidly increasing the amount of data data that make analysis difficult. For example, housed for analysis to compete in the Petabyte the timing of data, the complexity of data, Age. Many have responded by embracing new the complexity of the synthesized enterprise technologies to help manage the sheer volume warehouse and the identification of the most of data. However, these new toys also introduce appropriate data are equal if not larger challenges data usability issues that will not be solved by than dealing with large data sets, themselves. new technology but rather will require some rethinking of the business consequences of big The increased complexity of the data available for data. Among these challenges: generating insights is a direct consequence of the following: • Big data is not only tabular; it also includes documents, e-mails, pictures, videos, sound • The highly communicative and integrated bites, social media extracts, logs and other global economy. Enterprises across industry forms of information that is difficult to fit are increasingly seeking more granular insight into the nicely organized world of traditional into market forces that ultimately shape their database tables (rows and columns). success and failure. Data generated by the “always-on economy” is the impetus, in many • Companies that tackle big data as a tech- nology-only initiative will only solve a single cases, for the keen interest in implementing dimension of the big data mandate. so-called insight facilitation appliances on mobile devices – smartphones, iPads, Android • There are sheer volumetric issues, such and other tablets — throughout the enterprise. as billions of rows of data, that need to be solved. While tried-and-true technologies (par- • The enlightened consumer. Given the titioning) and newer technologies (MapReduce, explosion in social media and smart devices, etc.) permit organizations to segment data into many consumers have more information at more manageable chunks, such an approach their fingertips than ever before (often more does not deal with the issue that rarely so than businesses) and are becoming increas- used information is clogging the pathway to ingly sophisticated in how they gather and necessary information. Traditional lifecycle apply such information. management technologies will alleviate many • The global regulatory climate. New regulatory of the volumetric issues, but they will do little mandates — covering financial transactions, to solve the non-technical issues associated corporate health, food-borne illnesses, energy with volumetrics. Tabulating the Information Lifecycle Types of Information Retained the Longest How Much Information is Retained Development Records 4% >1 PB 4% Email 4% >500 TB 7% Financial Records 5% >100 TB 7% Database Archive 6% >25 TB 4% Government Records 11% >10 TB 14% Organization Records 18% >5 TB 18% Customer Records 19% >1 TB 25% Source Files 25% <1 TB 21% 0% 5% 10% 15% 20% 25% 0% 5% 10% 15% 20% 25% Source: The 100 Year Archive Task Force, SNIA Data Management Forum Figure 1 cognizant 20-20 insights 2
  • 3. As a result of mergers and acquisitions, global > Information required for regulatory activi- sourcing, data types and other issues, the sheer ties but not necessarily related to the cre- number of tables and objects available for access ation, extraction or capture of value. has mushroomed. This increase in the volume of objects has made access schemes for big data > Historical supporting information. overly complex, and it has made finding needed > Historical information that was once aligned data akin to finding a needle in a haystack. with value, regulatory or other purposes but is now kept because it might be useful • The information lifecycle management4 con- at some future date. siderations of data have not received the attention they deserve. Information lifecycle • Much of the complexity of information made available for deriving insight is a management should not be limited to parti- complex weave of multiple versions of the tioning schemes and the physical location of truth, data organized for different purposes, data. (Much attention is being given to cloud data of different vintages and similar data and virtualized storage, which presumes a obtained from different sources. This is a process has been devised for rationalizing the phenomenon that many organizational stake- fact that always-on information made available holders describe as a “black box of informa- in the cloud is worthy of this heightened avail- tion” into which they have no insight into its ability.) Information lifecycle management lineage. This adds delay to the use of insight is the process that traditionally stratifies the gained from such information, a development physical layout for technical performance. In caused by the obvious necessity of validating the Petabyte Age, where the amount of infor- information prior to using it for anything out mation available for analysis is increasing at of the ordinary. Much of the data available for an accelerating rate, the information lifecycle analysis results from the conventional wisdom management process should also ensure a that winning organizations are “data pack heightened focus on information that matters. rats” and that information that arrives on their This stratification should categorize informa- doorsteps tends to stay as a permanent artifact tion into the following groups: of the business. Interestingly, according to the > Information directly related to the creation, 100-Year Archive Task Force,5 source files are extraction or capture of value. the most frequently identified source of data retained as part of the “100-Year Archive.” > Supporting information that could be re- ferred to when devising a strategy to cre- • A sizable amount of operational information ate, extract or capture value. is not housed in official systems adminis- tered by enterprise IT groups but instead is > Information required for the operations of stored on desktops throughout the organi- the enterprise but not necessarily related zation. Much of these local data stores were to the creation, extraction or capture of created with good intentions; people respon- value. Many Sources of Data Partner Data Portal Data Warehouse Taxonomy Dimensions Other Informal ? Information Network Spreadsheets and Local Data Operational Personal System Organization Keys Figure 2 cognizant 20-20 insights 3
  • 4. sible for business functions had no other way The Roadmap to Managing Big Data of gaining control over the information they Big data will be solved through a combination needed. As a result, these desktop-based of enhancements to the people, process and systems have created a different form of big technology strengths of an enterprise. data — a weave of Microsoft Access, Excel and other desktop tool-based sources that are just People-based initiatives that will impact big data as critical in running enterprises. The contents programs employed at companies include the of these individually small to medium-size following: sources of information collectively add up to a sizable number of sources. One large • Managing the information lifecycle employed at the organizations. For good reason (glaring enterprise was found to have close to 1,000 privacy and security concerns, among them), operational metrics managed in desktop appli- organizations have placed significant focus cations (i.e., Excel) — which is not an uncommon on information governance. The mandate for situation. These sources never make it to the determining which data deserves focus should production environment and downstream data be part of the overall governance charter. warehouses. • Much of the big data housed in organiza- • Ensuring a sufficient skill set to tackle the issues introduced as a consequence of big tions results from regulatory requirements data. that necessitate storing large amounts of historical data. This large volume of historical • Developing a series of metrics to manage data hides the data required for insight. While the effectiveness of the big data program. it is important to retain this information, the This includes: necessity to house it with the same priority as information used for deriving insight is both > Average, minimum and maximum time re- quired to turn data into insight. expensive and unnecessary. > Average, minimum and maximum time re- The Case for Horizontal Partitioning quired to integrate new information sources. Horizontal partitioning is the process defined as > Average, minimum and maximum time re- segmenting data in such a way that prioritizes quired to integrate existing information information required for value extraction, origi- sources. nation and capture.6 This partitioning process should result in a process that tiers information > Time required for the management process. along the traditional dimensions of a business > Percentage of people associated with the program participating in the management information model. Such a model enhances the process. focus of information that supports the extraction, origination and capture of value. > Value achieved from the program. Converting Big Data Into Value Relevant Actionable Acquired & Trustworthy Learned Created Knowledge Data Inference Capabilities Customers Markets Channels Value Risks Investors Chain Insight Regulatory Expected Disruptions Outcomes Heard Inference Action Innovation Extracted Originated Value Value Value Captured Captured Transaction Captured Value Value Stream Figure 3 cognizant 20-20 insights 4
  • 5. According to McKinsey,7 the activities of people Big Data Getting Bigger steering big data will include: • Ensuring that big data is more accessible and TB eBay: 6.5 PB, 2009 timely. 1,000 Google: 1 PB of new data every 3 days, 2009 • Measuring the value achieved by unlocking big 800 data. Size of the Largest Data Warehouse in the 600 • Embracing experimentation through data to Winter Top Ten Survey CAGR = 173% expose variability and raise performance. 400 Moore’s Law Growth Rate • Enabling the customization of populations 200 through segmentation. 0 1998 2000 2002 2004 2006 2008 2010 2012 • Facilitating the use of automated algorithms � Actual � Projected to replace and support human decision- making, thereby improving decisions, minimiz- Source: Winter Corp. ing risks and unearthing valuable insights that Figure 4 would otherwise remain hidden. • Facilitating innovation programs that use new Technology-based initiatives that will impact big business models, products and services. data programs employed at companies include: Process-based initiatives that will impact big data programs are best enabled as augmentations of a • Ensuring that the specialized skills required to administer and use the fruits of the big company’s governance activities. These augmen- data initiative are present. Some of these tations include: include the databases, the ontologies used to navigate big data and the MapReduce • Ensuring sufficient focus on information concepts that augment or fully replace SQL that will drive value within the organization. access techniques. These processes are best employed as linkages between corporate strategy and information • Ensuring that tools introduced to navigate lifecycle management programs. big data are usable by the intended audience without upsetting self-service paradigms that > It is important to note that information have slowly gained traction during the past lifecycle management is defined in many several years. organizations as a program to manage hi- erarchies and business continuity. For the • Ensuring that the architecture and sup- purposes of this paper, this definition is ex- porting network, technology and software tended to include promotion and demotion infrastructures are capable of supporting big of data items as aligned with the business data. information model (how information is used It is safe to state that if history is any prediction to support enterprise strategies and tactics) of the future, the sheer volume of data that orga- of the organization. nizations will need to deal with will outstrip our > The process used to govern big data and its collective imaginations for how much data will information lifecycle should continually re- be available for generating insights.. Only eight fine and prioritize the benefits, constraints, years ago, a 300 to 400 terabyte data warehouse priorities and risks associated with the was considered an outlier. Today, multi-petabyte timely publication and use of relevant, fo- warehouses are easily found. Failure to take action cused, actionable and trustworthy informa- to manage the usability of information pouring tion published under big data initiatives. into the enterprise is (and will be) a competitive disadvantage (see Figure 4). • Ensuring the metrics that drive proper adhesion and use of big data are developed. This should cover the following topics: Recommendations Big data is a reality in most enterprises. However, > Governing big data. companies that tackle big data as merely a > Big data lifecycle. technology imperative will solve a less important > Big data use and adoption. dimension of their big data challenges. Big data is much more than an extension of the technolo- > Big data publication metrics. cognizant 20-20 insights 5
  • 6. Storage Definitions to the models (making it difficult for knowledge workers to navigate the data needed for analysis) and added an analytic plaque, which makes finding the required information for analysis more akin to Terabyte Will fit 200,000 photos finding a needle in a haystack. or MP3 songs on a single 1 terabyte hard drive. In many organizations, information lifecycle ini- Petabyte Will fit on 16 Backblaze tiatives are mandated that too often focus on storage pods racked in two data center cabinets. the optimal partitioning and archiving of the enterprise (i.e., vertical partitioning). Largely a Exabyte Will fit in 2,000 cabinets and technology focus, this thread of the information fill a four story data center that takes up a city block. lifecycle overlooks data that is no longer aligned with the strategies, tactics and intentions of the Zettabyte Will fit in 1,000 data centers organization. The scope and breadth of informa- or about 20% of Manhattan, New York. tion housed by enterprises in the Petabyte Age mandates that data be stratified according to its Yottabyte Will fit in the states of Delaware usefulness in the organizational value creation and Rhode Island with a million data centers. (i.e., horizontal partitioning). In today’s organiza- tions, the only cross-functional bodies with the ability to perform this horizontal partitioning are virtual organizations employed to govern the enterprise information asset. Figure 5 It was only a few years ago that a data warehouse gies used in partitioning strategies employed at requiring a terabyte of storage was the exception. enterprises. As we embrace the Petabyte Age, companies are entering an era where they will need to be Companies have proved that they are pack rats. capable of handling and analyzing much larger They need to house large amounts of history for populations of information. Regardless of the advanced analytics, and regulatory pressures processes put in place, ever-increasing volumes influence them to just store everything. The of structured and unstructured data will only reduced cost of storage has allowed companies proliferate, challenging companies to quickly and to turn their data warehousing environments into effectively convert raw data into insight in ways data dumps, which has added both a complexity that stand the test of time. Footnotes 1 Big data refers to data sets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics and visu- alizing. 2 “The Toxic Terabyte” (IBM Research Labs, July 2006) provides a thorough analysis of how companies had to get their houses in order to deal with a terabyte of data. If this authorative work were rewritten today, it would be called “The Problematic Petabyte” and in five years, most probably, “The Exhaustive Exabyte.” 3 MapReduce is a Google-inspired framework specifically devised for processing large amounts of data optimized across a grid of computing power. 4 Information lifecycle management is a process used to improve the usefulness of data by moving lesser used data into segments. Information lifecycle management is most commonly interested in moving data from always-needed partitions to rarely needed partitions and, finally, into archives. 5 SNIA Data Management Forum’s 100 Year Archive Task Force, 2007. 6 Horizontal partitioning is a term created by the author. It describes the application of generally accepted techniques of gaining performance by segmenting data into partitions (vertical partitioning) to segmenting groups of data by the likelihood of it achieving organizational value. 7 “Big Data, The Next Frontier for Innovation, Competition and Productivity,” McKinsey & Company, May 2011. cognizant 20-20 insights 6
  • 7. About the Author Mark Albala is Director of Cognizant’s North American Business Intelligence and Data Warehousing Consulting Practice. This practice provides solution architecture, information governance, informa- tion strategy and program governance services to companies across industries and supports Cogni- zant’s business intelligence and data warehouse delivery capabilities. A graduate of Syracuse University, Mark has held senior thought leadership, advanced technical and trusted advisory roles for organiza- tions focused on the disciplines of information management for over 20 years. He can be reached at Mark.Albala@cognizant.com. About Cognizant Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process out- sourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 111,000 employees as of March 31, 2011, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant. World Headquarters European Headquarters India Operations Headquarters 500 Frank W. Burr Blvd. Haymarket House #5/535, Old Mahabalipuram Road Teaneck, NJ 07666 USA 28-29 Haymarket Okkiyam Pettai, Thoraipakkam Phone: +1 201 801 0233 London SW1Y 4SP UK Chennai, 600 096 India Fax: +1 201 801 0243 Phone: +44 (0) 20 7321 4888 Phone: +91 (0) 44 4209 6000 Toll Free: +1 888 937 3277 Fax: +44 (0) 20 7321 4890 Fax: +91 (0) 44 4209 6060 Email: inquiry@cognizant.com Email: infouk@cognizant.com Email: inquiryindia@cognizant.com © Copyright 2011, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.