Big Data: Beyond the Hype - Why Big Data Matters to You

DATAVERSITY
DATAVERSITYExecutive Editor at DATAVERSITY à DATAVERSITY
Big Data: Beyond the Hype
Why Big Data Matters to You




W H I T E PA PER
Big Data:
Beyond the Hype
Why Big Data Matters to You




By DataStax Corporation



October 2011
Table of Contents
Introduction......................................................................................................................................4	
  

Big Data and You ............................................................................................................................6	
  

    Big Data Is More Prevalent than You Think ................................................................................6	
  

    Big Data Formats ........................................................................................................................8	
  

    Competitive Advantages Gained Through Big Data ...................................................................9	
  

    Now What?................................................................................................................................11	
  

DataStax Enterprise: The Best Solution for Managing Big Data ...................................................12	
  

    Powered by Apache Cassandra................................................................................................12	
  

    Big Data Analytics .....................................................................................................................14	
  

    Visual Database Management ..................................................................................................16	
  

    Enterprise Production Support and Services ............................................................................17	
  

Conclusion.....................................................................................................................................18	
  

About DataStax .............................................................................................................................19	
  




© 2011 DataStax. All rights reserved.                                                                                                     Page: 3
Introduction
“Big data” is a big buzz phrase in the IT and business world right now – and there are a dizzying
array of opinions on just what these two simple words really mean.

Technology vendors in the legacy database or data warehouse spaces say “big data” simply
refers to a traditional data warehousing scenario involving data volumes in either the single or
multi-terabyte range. Others disagree: They say “big data” isn’t limited to traditional data
warehouse situations, but includes real-time or operational data stores used as the primary data
foundation for online applications that power key external or internal business systems.

It used to be that these transactional/real-time databases were typically “pruned” so they could be
manageable from a data volume standpoint. Their most recent or “hot” data stayed in the
database, and older information was archived to a data warehouse via extract-transform-load
(ETL) routines.

But big data has changed dramatically. The evolution of the web has redefined:

    •    The speed at which information flows into these primary online systems
    •    The number of customers a company must deal with
    •    The acceptable interval between the time that data first enters a system, and its
         transformation into information that can be analyzed to make key business decisions

Some analysts such as Forrester Research and others have attempted to categorize these
changes by what is being called “the four V’s” of big data:

    1. Volume – TB’s to PB’s of data
    2. Velocity – how fast the data is coming in
    3. Variety – all types are now being captured (structured, semi-structured, unstructured)
    4. Value – mining the valuable pieces of data from among data that does not matter

Because of these changes, new definitions for big data have been proposed, with a focus on
technologies to handle such data. Analyst firms such as IDC say legacy RDBMSs designed to run
moderately sized data volumes on single machines do not offer sufficiently powerful engines for
the big data scenarios with which modern businesses are wrestling.




© 2011 DataStax. All rights reserved.                                                          Page: 4
Here’s how IDC defines “big data”:

Big data technologies describe a new generation of technologies and architectures, designed to
economically extract value from very large volumes of a wide variety of data, by enabling high-
                                                 1
velocity capture, discovery, and/or analysis.




This definition incorporates all types of data (e.g., real-time, analytic) managed by next-
generation systems that must scale to handle constantly increasing user workloads and data
volume.

David Kellogg, meanwhile, simply defines big data as being “too big to be reasonably handled by
                                     2
current/traditional technologies.” Consulting and research firm McKinsey & Company agrees with
Kellogg’s concept of big data and defines it as “datasets whose size is beyond the ability of
                                                                                  3
typical database software tools to capture, store, manage, and analyze.”

This paper examines the growing prevalence of big data across nearly every industry; explains
why being good at using and understanding big data is critical for firms that want to compete in
their chosen market; and details how businesses can use DataStax Enterprise—a solution
specifically designed to manage big data easily and effectively—to exploit the benefits derived
from handling big data smartly.




1
    Extracting Value from Chaos, IDC: http://idcdocserv.com/1142
2
    “Big data has jumped the shark,” DBMS 2: www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/
3
 Big Data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011,
p. 11: www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf


© 2011 DataStax. All rights reserved.                                                                 Page: 5
Big Data and You
How much should you care about effectively managing big data? A lot – in fact you’re likely
already dealing with big data without even knowing it.

Big Data Is More Prevalent than You Think
Many businesses believe big data is something only companies like Facebook and Google deal
with. However, a 2011 McKinsey Global Institute study says otherwise.

For instance, the McKinsey report found that the average investment firm with fewer than 1,000
employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per
                                                                   4
year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found
that 15 out of 17 industry sectors in the United States have more data stored per company than
the U.S. Library of Congress (which had 235 terabytes of information at the time of McKinsey’s
         5                                                                 6
study) and that companies in all sectors have at least 100 terabytes stored , as Figure 1 shows:




4
    McKinsey, p. 19.
5
    McKinsey, p. vi.


© 2011 DataStax. All rights reserved.                                                     Page: 6
Figure 1: Stored data by industry sector

These data volumes are not confined to enterprise data warehouses that assist only internal
decision makers, but instead exist in the real-time database systems that serve external-facing
customers. And that data continues to grow and expand as the underlying business becomes
more successful. In 2010, the United States alone stored more than 3,500 petabytes of new
               7
information, as shown in Figure 2:




Figure 2: Data stores by geography




6
    McKinsey, p. 19.
7
    McKinsey, p. 103.


© 2011 DataStax. All rights reserved.                                                     Page: 7
Big Data Formats
Part of the need for new technologies for big data (versus older, legacy RDBMSs) has to do with
the format of the data coming in from online applications. A more dynamic, flexible database
schema format is needed to handle the structured, semi-structured, and unstructured data that
                          8
comprises today’s big data (see Figure 3):




Figure 3: Data stores by sector




8
    McKinsey, p. 20.


© 2011 DataStax. All rights reserved.                                                    Page: 8
Competitive Advantages Gained Through Big Data
Today, no one today argues with the fact that a company’s data is its most strategic impersonal
asset. If you don’t use it as a competitive weapon against your market competitors, it’s
guaranteed that you will be at a disadvantage. In a presentation given at the Strata New York
conference in September 2011, McKinsey & Company showed the eye opening, 10-year
category growth rate differences (see Figure 4, below) between businesses that smartly use their
big data and those that do not.




Figure 4: Big data companies have a very real competitive advantage



Clearly, the biggest differences exist between online retailers who don’t use big data well, and
those that do. And today, online retailing is a business every company is in, whether they’re 100
percent web-based or not. Forrester Research predicts that by 2013, over half of all U.S. sales
                       9
will be online in nature (see Figure 5):




9
    McKinsey, p. 66.


© 2011 DataStax. All rights reserved.                                                       Page: 9
Figure 5: Online retail sales growth



The recognition of these market realities has led smart businesses to concentrate on effectively
utilizing their big data. This is reflected in a strong jobs growth trend, as illustrated in the chart
below from Indeed.com:




Figure: 6: Big data job postings


© 2011 DataStax. All rights reserved.                                                          Page: 10
Once data professionals are hired and put to work, there are many different data-driven projects
they can be assigned to. As to the types of processes that can benefit from big data efforts,
McKinsey found many different activities across all core internal corporate functions can provide
                                                         10
value to a modern business through the use of big data        (see in Figure 7):




Figure 7: Big data retail levers




Now What?
The facts about big data really do speak for themselves. The nonstop growth of data, new data
formats that must be managed, and the competitive advantages that come from managing big
data well all underscore why big data should matter to you.




10
     McKinsey, p. 67.


© 2011 DataStax. All rights reserved.                                                     Page: 11
But what should you do about it? If you’re an IT professional, you know how difficult it can be to
find a solution capable of handling a task like big data management that combines the following
benefits:

       •   Scalability
       •   Performance
       •   Ease of use
       •   Low total cost of ownership

Some database offerings may have two of these four features, but it’s rare to find one with all four
– and that delivers on them well. The good news is there is a solution available that confidently
provides checkmarks for the four criteria above.

	
  

DataStax Enterprise:
The Best Solution for Managing Big Data
DataStax is the leading provider of modern enterprise database software products and services
based on Apache Cassandra™. It supports businesses that need a progressive data
management system that can serve as a primary system of record/real-time datastore for critical
production applications, and also deliver built-in analytic capabilities for analyzing that data once
it’s in Cassandra.

DataStax Enterprise is tailor-made to manage big data effectively. The solution inherits all of
Cassandra’s powerful feature set for servicing modern real-time applications, and merges in a
fault-tolerant, analytics platform that provides Hadoop MapReduce, Hive, and Pig support for
business intelligence systems.

A key differentiator of DataStax Enterprise over other big data providers is that real-time and
analytic workloads are smartly separated across a distributed DataStax Enterprise database
cluster, so that no competition for underlying compute resources or data occurs.

Powered by Apache Cassandra
The foundation that enables DataStax Enterprise to tackle big data is Apache Cassandra™.
Cassandra enjoys an industry reputation for being the only NoSQL database solution able to truly
handle big data requirements. It’s a highly scalable and high-performance distributed database
management system that can handle real-time big data applications that drive key systems for
modern and successful businesses.


© 2011 DataStax. All rights reserved.                                                        Page: 12
Key technical differentiators of Cassandra versus its legacy RDBMS predecessors, as well as
other NoSQL offerings, include:

    •   A built-for-scale architecture that can handle petabytes of information and thousands of
        concurrent users/operations per second as easily as it can manage much smaller
        amounts of data and user traffic
    •   Peer-to-peer design that offers no single point of failure for any database process or
        function
    •   Online capacity additions that deliver linear performance gains for both read and write
        operations
    •   Read/write anywhere capabilities that equate to a true network-independent method of
        storing and accessing data
    •   Tunable data consistency that allows Cassandra to offer the data durability and
        protection like an RDBMS, but with the flexible choice of relaxing data consistency when
        application use cases allow
    •   Flexible/dynamic schema design that accommodates all formats of big data applications,
        including structured, semi-structured, and unstructured data; data is represented in
        Cassandra via column families that are dynamic in nature and accommodate all
        modifications online
    •   Simplified replication that provides data redundancy and is capable of being multi-data
        center and cloud in nature
    •   Data compression that reduces the footprint of raw big data by over 80 percent in some
        use cases
    •   An SQL-like language (CQL) that lessens the learning curve for developers and
        administrators coming from the RDBMS world
    •   Support for key developer languages (e.g., Java) and operating systems
    •   No requirement for any special equipment; Cassandra runs on commodity hardware

Cassandra is built with the assumption that failures can and will occur in a big data infrastructure.
Therefore, data redundancy to protect against hardware failure and other data loss scenarios is
built into and managed transparently by Cassandra. Furthermore, this capability can be
configured so that big data applications can use a single large database that is distributed across
multiple, geographically dispersed data centers, between different physical racks in a data center,
and between public cloud providers and on-premise managed data centers.




© 2011 DataStax. All rights reserved.                                                       Page: 13
Figure 8: Cassandra multi-data center capabilities



These and other capabilities make Cassandra and DataStax Enterprise the smart choice for
modern businesses whose big data management needs have outgrown their traditional RDBMS
software.

Big Data Analytics
A primary benefit that DataStax Enterprise provides to enterprises needing smart big data
management capabilities is its ability to service both real-time and analytic data operations in the
same database cluster without either load impacting the other. The key to making this possible is
the underlying architecture of Cassandra.

Built into DataStax Enterprise is an enhanced Hadoop distribution that utilizes Cassandra for
many of its core services. DataStax Enterprise provides integrated Hadoop MapReduce, Hive,
Pig, and job/task tracking capabilities, replacing Hadoop’s HDFS storage layer with Cassandra
(CassandraFS). The end product is a single integrated solution that provides increased reliability,
simpler deployment, and lower TCO than a traditional Hadoop solution. DataStax Enterprise also
is fully compatible with existing HDFS, Hadoop, and Hive tools and utilities.

Another benefit of DataStax Enterprise is that it eliminates the complexity and single points of
failure of the typical Hadoop HDFS layer. From an operational standpoint, there is no need to set
up a Hadoop name node, secondary name node, Zookeeper, and so on.




© 2011 DataStax. All rights reserved.                                                      Page: 14
Instead, DataStax Enterprise provides a single layer in which every node is a peer of the others
and automatically knows its position in the cluster. On startup, all DataStax Enterprise nodes
automatically start a Hadoop task tracker, and one of the nodes is elected to be the job tracker. If
the job tracker node fails, the job tracker is automatically restarted on a different node. DataStax
Enterprise utilizes full data locality awareness for Hadoop task assignment.




Figure 9: DataStax Enterprise – real-time and analytic data in one database



Another key benefit of DataStax Enterprise is the tight feedback loop it has between real-time
application and the analytics that naturally follow. Traditionally, users would be forced to move
data between systems via complex ETL processes, or perform both functions on the same
system with the risk of one impacting the other. In big data environments, this process can be
time-consuming and burdensome.

With DataStax Enterprise, both real-time and analytic big data operations take place in the same
distributed system, but users have the ability to dedicate certain nodes solely for analytics so that
workload doesn’t slow down real-time processing.

Users simply define one or more replica groups, and configure the role of each—one or more
Cassandra, Hadoop or HDFS (i.e., HDFS without job/task tracker) nodes. Writes are instantly
replicated between both real-time and analytic nodes.




© 2011 DataStax. All rights reserved.                                                       Page: 15
With DataStax Enterprise, users truly have the best of both worlds for big data management.
They have all the power of Cassandra serving their highest-volume and high-velocity real-time
applications, and the power of Hadoop, Hive, and Pig working directly against the same data for
analytics. The result is smart workload management for big data application, which is much
simpler to manage and more reliable than any of the alternatives.

Visual Database Management
DataStax Enterprise includes a visual, browser-based management solution named OpsCenter
Enterprise. OpsCenter Enterprise allows a developer or administrator to manage and monitor the
health of big data clusters from a centralized web console.




Figure 10: OpsCenter Enterprise database cluster ring view



OpsCenter Enterprise uses an agent-based architecture to monitor and carry out tasks on each
node in a DataStax Enterprise cluster. Through a graphical and intuitive point-and-click interface,
a user can understand the state of a cluster, which nodes are up and down, and what type of
performance users are experiencing. Key events are reported into a centralized dashboard
displayed along with other vital statistics.




© 2011 DataStax. All rights reserved.                                                     Page: 16
Figure 11: OpsCenter dashboard


Analytic operations also can be monitored and controlled from within OpsCenter Enterprise:




Figure 12: OpsCenter analytic operations monitoring




Enterprise Production Support and Services
Big data situations often require fast access to skilled expertise. DataStax Enterprise includes
experienced production support and consultative services from Cassandra experts. You can


© 2011 DataStax. All rights reserved.                                                      Page: 17
choose the right production support package for your business needs, including rapid response
SLAs and consultative help.

DataStax also provides certified quarterly service pack updates for DataStax Enterprise, as well
as other benefits such as emergency hot fixes (for production outages) and bug escalation
privileges.

Additionally, DataStax offers professional big data training on Cassandra and Hadoop, with
classes offered in many major cities as well as on-site for corporations that need many staff
members trained at once.




Conclusion
Big data isn’t just hype—and it’s much more than a buzz phrase. Today, companies across
industries are finding they not only need to manage increasingly large data volumes in their real-
time systems, but also analyze that information so they can make the right decisions, fast, to
compete effectively in the market.

Modern businesses looking for a solution to handle their big data easily and effectively should
consider DataStax Enterprise. Its scale-out architecture comfortably scales into the petabytes
data range, while offering high performance for reads and writes no matter the data volume size.

DataStax Enterprise also differs from competitors by providing a single integrated database
platform that smartly manages both real-time and analytic data. Additionally, DataStax Enterprise
does all of this at a fraction of the cost charged by traditional RDBMS vendors.

To find out more about DataStax Enterprise and obtain trial software, please visit
www.datastax.com or email info@datastax.com.




© 2011 DataStax. All rights reserved.                                                     Page: 18
About DataStax
DataStax is the developer of DataStax Enterprise, a distributed, scalable, and highly
available database platform that delivers optimal performance either on premise or in the cloud
for modern enterprise applications that manage both real-time and analytic workloads. The
company has over 100 customers, including leaders such as Netflix, Cisco, Rackspace and
Constant Contact, and spanning verticals including web, financial services, telecommunications,
logistics and government.

DataStax is backed by industry leading investors, including Lightspeed Venture Partners and
Crosslink Capital and is based in Burlingame, CA with offices in Austin, TX and Stamford, CT.

For more information, visit www.datastax.com.




© 2011 DataStax. All rights reserved.                                                   Page: 19

Recommandé

Responding to a Data Breach, Communications Guidelines for Merchants par
Responding to a Data Breach, Communications Guidelines for MerchantsResponding to a Data Breach, Communications Guidelines for Merchants
Responding to a Data Breach, Communications Guidelines for Merchants- Mark - Fullbright
897 vues16 diapositives
Looking Forward - Regulators and Data Incidents par
Looking Forward - Regulators and Data IncidentsLooking Forward - Regulators and Data Incidents
Looking Forward - Regulators and Data IncidentsResilient Systems
469 vues26 diapositives
Master Data in the Cloud: 5 Security Fundamentals par
Master Data in the Cloud: 5 Security FundamentalsMaster Data in the Cloud: 5 Security Fundamentals
Master Data in the Cloud: 5 Security FundamentalsSarah Fane
109 vues6 diapositives
Protect your confidential information while improving services par
Protect your confidential information while improving servicesProtect your confidential information while improving services
Protect your confidential information while improving servicesCloudMask inc.
97 vues5 diapositives
A data-centric program par
A data-centric program A data-centric program
A data-centric program at MicroFocus Italy ❖✔
414 vues8 diapositives
Threat Ready Data: Protect Data from the Inside and the Outside par
Threat Ready Data: Protect Data from the Inside and the OutsideThreat Ready Data: Protect Data from the Inside and the Outside
Threat Ready Data: Protect Data from the Inside and the OutsideDLT Solutions
748 vues33 diapositives

Contenu connexe

Tendances

Cashing in on the public cloud with total confidence par
Cashing in on the public cloud with total confidenceCashing in on the public cloud with total confidence
Cashing in on the public cloud with total confidenceCloudMask inc.
98 vues4 diapositives
The Rise of Data Ethics and Security - AIDI Webinar par
The Rise of Data Ethics and Security - AIDI WebinarThe Rise of Data Ethics and Security - AIDI Webinar
The Rise of Data Ethics and Security - AIDI WebinarEryk Budi Pratama
919 vues29 diapositives
The Art of Cloud Auditing - ISACA ID par
The Art of Cloud Auditing - ISACA IDThe Art of Cloud Auditing - ISACA ID
The Art of Cloud Auditing - ISACA IDEryk Budi Pratama
2.2K vues38 diapositives
DAMA Webinar: The Data Governance of Personal (PII) Data par
DAMA Webinar: The Data Governance of  Personal (PII) DataDAMA Webinar: The Data Governance of  Personal (PII) Data
DAMA Webinar: The Data Governance of Personal (PII) DataDATAVERSITY
4.3K vues31 diapositives
ebook.driving decision-making, security par
ebook.driving decision-making, securityebook.driving decision-making, security
ebook.driving decision-making, securityRoman Chanclor
112 vues25 diapositives
Data Breach Guide 2013 par
Data Breach Guide 2013Data Breach Guide 2013
Data Breach Guide 2013- Mark - Fullbright
4.6K vues33 diapositives

Tendances(19)

Cashing in on the public cloud with total confidence par CloudMask inc.
Cashing in on the public cloud with total confidenceCashing in on the public cloud with total confidence
Cashing in on the public cloud with total confidence
CloudMask inc.98 vues
The Rise of Data Ethics and Security - AIDI Webinar par Eryk Budi Pratama
The Rise of Data Ethics and Security - AIDI WebinarThe Rise of Data Ethics and Security - AIDI Webinar
The Rise of Data Ethics and Security - AIDI Webinar
DAMA Webinar: The Data Governance of Personal (PII) Data par DATAVERSITY
DAMA Webinar: The Data Governance of  Personal (PII) DataDAMA Webinar: The Data Governance of  Personal (PII) Data
DAMA Webinar: The Data Governance of Personal (PII) Data
DATAVERSITY4.3K vues
ebook.driving decision-making, security par Roman Chanclor
ebook.driving decision-making, securityebook.driving decision-making, security
ebook.driving decision-making, security
Roman Chanclor112 vues
Secure dataroom whitepaper_protecting_confidential_documents par e.law International
Secure dataroom whitepaper_protecting_confidential_documentsSecure dataroom whitepaper_protecting_confidential_documents
Secure dataroom whitepaper_protecting_confidential_documents
Cross border - off-shoring and outsourcing privacy sensitive data par Ulf Mattsson
Cross border - off-shoring and outsourcing privacy sensitive dataCross border - off-shoring and outsourcing privacy sensitive data
Cross border - off-shoring and outsourcing privacy sensitive data
Ulf Mattsson1.3K vues
Michael Josephs par daveGBE
Michael JosephsMichael Josephs
Michael Josephs
daveGBE320 vues
Opteamix_whitepaper_Data Masking Strategy.pdf par Opteamix LLC
Opteamix_whitepaper_Data Masking Strategy.pdfOpteamix_whitepaper_Data Masking Strategy.pdf
Opteamix_whitepaper_Data Masking Strategy.pdf
Opteamix LLC23 vues
Building the Information Governance Business Case Within Your Company par AIIM International
Building the Information Governance Business Case Within Your CompanyBuilding the Information Governance Business Case Within Your Company
Building the Information Governance Business Case Within Your Company
AIIM International2.4K vues
Achieving Regulatory Compliance The Devil Is In The Data Governance V2 par Ken O'Connor
Achieving Regulatory Compliance   The Devil Is In The Data Governance V2Achieving Regulatory Compliance   The Devil Is In The Data Governance V2
Achieving Regulatory Compliance The Devil Is In The Data Governance V2
Ken O'Connor869 vues
Managed Security For A Not So Secure World Wp090991 par Erik Ginalick
Managed Security For A Not So Secure World Wp090991Managed Security For A Not So Secure World Wp090991
Managed Security For A Not So Secure World Wp090991
Erik Ginalick334 vues
Novetta Entity Analytics par Novetta
Novetta Entity AnalyticsNovetta Entity Analytics
Novetta Entity Analytics
Novetta1.1K vues
ZoomLens - Loveland, Subramanian -Tackling Info Risk par John Loveland
ZoomLens - Loveland, Subramanian -Tackling Info RiskZoomLens - Loveland, Subramanian -Tackling Info Risk
ZoomLens - Loveland, Subramanian -Tackling Info Risk
John Loveland183 vues

En vedette

Mobile / Tablet Application Development - What are my options? par
Mobile / Tablet Application Development - What are my options?Mobile / Tablet Application Development - What are my options?
Mobile / Tablet Application Development - What are my options?DATA Inc.
545 vues17 diapositives
Future city par
Future cityFuture city
Future cityStanley Jia Cheng
1.9K vues36 diapositives
Sap commitment to_open_data_acces_strategy_for_bi_sept_2013 par
Sap commitment to_open_data_acces_strategy_for_bi_sept_2013Sap commitment to_open_data_acces_strategy_for_bi_sept_2013
Sap commitment to_open_data_acces_strategy_for_bi_sept_2013Atul Patel
1.5K vues23 diapositives
Big data for the rest of us par
Big data for the rest of usBig data for the rest of us
Big data for the rest of usSteven Francia
3.6K vues47 diapositives
Where data security and value of data meet in the cloud ulf mattsson par
Where data security and value of data meet in the cloud   ulf mattssonWhere data security and value of data meet in the cloud   ulf mattsson
Where data security and value of data meet in the cloud ulf mattssonUlf Mattsson
580 vues65 diapositives
Practical advice for cloud data protection ulf mattsson - jun 2014 par
Practical advice for cloud data protection   ulf mattsson - jun 2014Practical advice for cloud data protection   ulf mattsson - jun 2014
Practical advice for cloud data protection ulf mattsson - jun 2014Ulf Mattsson
768 vues102 diapositives

En vedette(8)

Mobile / Tablet Application Development - What are my options? par DATA Inc.
Mobile / Tablet Application Development - What are my options?Mobile / Tablet Application Development - What are my options?
Mobile / Tablet Application Development - What are my options?
DATA Inc. 545 vues
Sap commitment to_open_data_acces_strategy_for_bi_sept_2013 par Atul Patel
Sap commitment to_open_data_acces_strategy_for_bi_sept_2013Sap commitment to_open_data_acces_strategy_for_bi_sept_2013
Sap commitment to_open_data_acces_strategy_for_bi_sept_2013
Atul Patel1.5K vues
Where data security and value of data meet in the cloud ulf mattsson par Ulf Mattsson
Where data security and value of data meet in the cloud   ulf mattssonWhere data security and value of data meet in the cloud   ulf mattsson
Where data security and value of data meet in the cloud ulf mattsson
Ulf Mattsson580 vues
Practical advice for cloud data protection ulf mattsson - jun 2014 par Ulf Mattsson
Practical advice for cloud data protection   ulf mattsson - jun 2014Practical advice for cloud data protection   ulf mattsson - jun 2014
Practical advice for cloud data protection ulf mattsson - jun 2014
Ulf Mattsson768 vues
Public cloud data protection par Ulf Mattsson
Public cloud data protectionPublic cloud data protection
Public cloud data protection
Ulf Mattsson452 vues
Time to re think our security process par Ulf Mattsson
Time to re think our security processTime to re think our security process
Time to re think our security process
Ulf Mattsson444 vues

Similaire à Big Data: Beyond the Hype - Why Big Data Matters to You

Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen... par
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Dr. Cedric Alford
663 vues11 diapositives
Use of big data technologies in capital markets par
Use of big data technologies in capital marketsUse of big data technologies in capital markets
Use of big data technologies in capital marketsInfosys
2.5K vues8 diapositives
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANC... par
LEVERAGING CLOUD BASED BIG DATA ANALYTICS  IN KNOWLEDGE MANAGEMENT FOR ENHANC...LEVERAGING CLOUD BASED BIG DATA ANALYTICS  IN KNOWLEDGE MANAGEMENT FOR ENHANC...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANC...ijdpsjournal
5 vues13 diapositives
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... par
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...ijdpsjournal
51 vues13 diapositives
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... par
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...ijdpsjournal
20 vues13 diapositives
Big Data: Where is the Real Opportunity? par
Big Data: Where is the Real Opportunity?Big Data: Where is the Real Opportunity?
Big Data: Where is the Real Opportunity?Cartesian (formerly CSMG)
764 vues7 diapositives

Similaire à Big Data: Beyond the Hype - Why Big Data Matters to You(20)

Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen... par Dr. Cedric Alford
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Data Mining: The Top 3 Things You Need to Know to Achieve Business Improvemen...
Use of big data technologies in capital markets par Infosys
Use of big data technologies in capital marketsUse of big data technologies in capital markets
Use of big data technologies in capital markets
Infosys2.5K vues
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANC... par ijdpsjournal
LEVERAGING CLOUD BASED BIG DATA ANALYTICS  IN KNOWLEDGE MANAGEMENT FOR ENHANC...LEVERAGING CLOUD BASED BIG DATA ANALYTICS  IN KNOWLEDGE MANAGEMENT FOR ENHANC...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANC...
ijdpsjournal5 vues
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... par ijdpsjournal
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
ijdpsjournal51 vues
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE... par ijdpsjournal
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
LEVERAGING CLOUD BASED BIG DATA ANALYTICS IN KNOWLEDGE MANAGEMENT FOR ENHANCE...
ijdpsjournal20 vues
Ab cs of big data par Digimark
Ab cs of big dataAb cs of big data
Ab cs of big data
Digimark230 vues
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO par Robin Beregovska
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIOA REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO
A REVIEW STUDY ON BIG DATA ANALYSIS USING R STUDIO
Oea big-data-guide-1522052 par kavi172
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052
kavi1721.2K vues
Big Data Analytics Research Report par Ila Group
Big Data Analytics Research ReportBig Data Analytics Research Report
Big Data Analytics Research Report
Ila Group1K vues
Big Data is Here for Financial Services White Paper par Experian
Big Data is Here for Financial Services White PaperBig Data is Here for Financial Services White Paper
Big Data is Here for Financial Services White Paper
Experian579 vues
What's the Big Deal About Big Data? par Logi Analytics
What's the Big Deal About Big Data?What's the Big Deal About Big Data?
What's the Big Deal About Big Data?
Logi Analytics550 vues
Move It Don't Lose It: Is Your Big Data Collecting Dust? par Jennifer Walker
Move It Don't Lose It: Is Your Big Data Collecting Dust?Move It Don't Lose It: Is Your Big Data Collecting Dust?
Move It Don't Lose It: Is Your Big Data Collecting Dust?
Jennifer Walker324 vues

Plus de DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le... par
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
160 vues39 diapositives
Data at the Speed of Business with Data Mastering and Governance par
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
483 vues25 diapositives
Exploring Levels of Data Literacy par
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
446 vues50 diapositives
Building a Data Strategy – Practical Steps for Aligning with Business Goals par
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
1K vues63 diapositives
Make Data Work for You par
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
331 vues21 diapositives
Data Catalogs Are the Answer – What is the Question? par
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
507 vues50 diapositives

Plus de DATAVERSITY(20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le... par DATAVERSITY
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
DATAVERSITY160 vues
Data at the Speed of Business with Data Mastering and Governance par DATAVERSITY
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
DATAVERSITY483 vues
Exploring Levels of Data Literacy par DATAVERSITY
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
DATAVERSITY446 vues
Building a Data Strategy – Practical Steps for Aligning with Business Goals par DATAVERSITY
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
DATAVERSITY1K vues
Make Data Work for You par DATAVERSITY
Make Data Work for YouMake Data Work for You
Make Data Work for You
DATAVERSITY331 vues
Data Catalogs Are the Answer – What is the Question? par DATAVERSITY
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
DATAVERSITY507 vues
Data Catalogs Are the Answer – What Is the Question? par DATAVERSITY
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
DATAVERSITY107 vues
Data Modeling Fundamentals par DATAVERSITY
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
DATAVERSITY829 vues
Showing ROI for Your Analytic Project par DATAVERSITY
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
DATAVERSITY184 vues
How a Semantic Layer Makes Data Mesh Work at Scale par DATAVERSITY
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
DATAVERSITY795 vues
Is Enterprise Data Literacy Possible? par DATAVERSITY
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
DATAVERSITY471 vues
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re... par DATAVERSITY
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
DATAVERSITY315 vues
Emerging Trends in Data Architecture – What’s the Next Big Thing? par DATAVERSITY
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
DATAVERSITY545 vues
Data Governance Trends - A Look Backwards and Forwards par DATAVERSITY
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
DATAVERSITY469 vues
Data Governance Trends and Best Practices To Implement Today par DATAVERSITY
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
DATAVERSITY756 vues
2023 Trends in Enterprise Analytics par DATAVERSITY
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
DATAVERSITY333 vues
Who Should Own Data Governance – IT or Business? par DATAVERSITY
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
DATAVERSITY418 vues
Data Management Best Practices par DATAVERSITY
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
DATAVERSITY703 vues
MLOps – Applying DevOps to Competitive Advantage par DATAVERSITY
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
DATAVERSITY211 vues
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D... par DATAVERSITY
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
Keeping the Pulse of Your Data – Why You Need Data Observability to Improve D...
DATAVERSITY392 vues

Dernier

2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue par
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlueShapeBlue
75 vues23 diapositives
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT par
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITUpdates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITShapeBlue
138 vues8 diapositives
Future of AR - Facebook Presentation par
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook PresentationRob McCarty
54 vues27 diapositives
"Surviving highload with Node.js", Andrii Shumada par
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada Fwdays
49 vues29 diapositives
20231123_Camunda Meetup Vienna.pdf par
20231123_Camunda Meetup Vienna.pdf20231123_Camunda Meetup Vienna.pdf
20231123_Camunda Meetup Vienna.pdfPhactum Softwareentwicklung GmbH
49 vues73 diapositives
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... par
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...The Digital Insurer
40 vues52 diapositives

Dernier(20)

2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue par ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
2FA and OAuth2 in CloudStack - Andrija Panić - ShapeBlue
ShapeBlue75 vues
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT par ShapeBlue
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITUpdates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
ShapeBlue138 vues
Future of AR - Facebook Presentation par Rob McCarty
Future of AR - Facebook PresentationFuture of AR - Facebook Presentation
Future of AR - Facebook Presentation
Rob McCarty54 vues
"Surviving highload with Node.js", Andrii Shumada par Fwdays
"Surviving highload with Node.js", Andrii Shumada "Surviving highload with Node.js", Andrii Shumada
"Surviving highload with Node.js", Andrii Shumada
Fwdays49 vues
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading... par The Digital Insurer
Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...Webinar : Desperately Seeking Transformation - Part 2:  Insights from leading...
Webinar : Desperately Seeking Transformation - Part 2: Insights from leading...
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive par Network Automation Forum
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLiveAutomating a World-Class Technology Conference; Behind the Scenes of CiscoLive
Automating a World-Class Technology Conference; Behind the Scenes of CiscoLive
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti... par ShapeBlue
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
DRaaS using Snapshot copy and destination selection (DRaaS) - Alexandre Matti...
ShapeBlue69 vues
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue par ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue191 vues
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ... par ShapeBlue
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
How to Re-use Old Hardware with CloudStack. Saving Money and the Environment ...
ShapeBlue97 vues
State of the Union - Rohit Yadav - Apache CloudStack par ShapeBlue
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStack
ShapeBlue218 vues
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... par Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker50 vues
Why and How CloudStack at weSystems - Stephan Bienek - weSystems par ShapeBlue
Why and How CloudStack at weSystems - Stephan Bienek - weSystemsWhy and How CloudStack at weSystems - Stephan Bienek - weSystems
Why and How CloudStack at weSystems - Stephan Bienek - weSystems
ShapeBlue172 vues
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R... par ShapeBlue
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
ShapeBlue105 vues
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPool par ShapeBlue
Extending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPoolExtending KVM Host HA for Non-NFS Storage -  Alex Ivanov - StorPool
Extending KVM Host HA for Non-NFS Storage - Alex Ivanov - StorPool
ShapeBlue56 vues
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... par ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue48 vues
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava... par ShapeBlue
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
Centralized Logging Feature in CloudStack using ELK and Grafana - Kiran Chava...
ShapeBlue74 vues

Big Data: Beyond the Hype - Why Big Data Matters to You

  • 1. Big Data: Beyond the Hype Why Big Data Matters to You W H I T E PA PER
  • 2. Big Data: Beyond the Hype Why Big Data Matters to You By DataStax Corporation October 2011
  • 3. Table of Contents Introduction......................................................................................................................................4   Big Data and You ............................................................................................................................6   Big Data Is More Prevalent than You Think ................................................................................6   Big Data Formats ........................................................................................................................8   Competitive Advantages Gained Through Big Data ...................................................................9   Now What?................................................................................................................................11   DataStax Enterprise: The Best Solution for Managing Big Data ...................................................12   Powered by Apache Cassandra................................................................................................12   Big Data Analytics .....................................................................................................................14   Visual Database Management ..................................................................................................16   Enterprise Production Support and Services ............................................................................17   Conclusion.....................................................................................................................................18   About DataStax .............................................................................................................................19   © 2011 DataStax. All rights reserved. Page: 3
  • 4. Introduction “Big data” is a big buzz phrase in the IT and business world right now – and there are a dizzying array of opinions on just what these two simple words really mean. Technology vendors in the legacy database or data warehouse spaces say “big data” simply refers to a traditional data warehousing scenario involving data volumes in either the single or multi-terabyte range. Others disagree: They say “big data” isn’t limited to traditional data warehouse situations, but includes real-time or operational data stores used as the primary data foundation for online applications that power key external or internal business systems. It used to be that these transactional/real-time databases were typically “pruned” so they could be manageable from a data volume standpoint. Their most recent or “hot” data stayed in the database, and older information was archived to a data warehouse via extract-transform-load (ETL) routines. But big data has changed dramatically. The evolution of the web has redefined: • The speed at which information flows into these primary online systems • The number of customers a company must deal with • The acceptable interval between the time that data first enters a system, and its transformation into information that can be analyzed to make key business decisions Some analysts such as Forrester Research and others have attempted to categorize these changes by what is being called “the four V’s” of big data: 1. Volume – TB’s to PB’s of data 2. Velocity – how fast the data is coming in 3. Variety – all types are now being captured (structured, semi-structured, unstructured) 4. Value – mining the valuable pieces of data from among data that does not matter Because of these changes, new definitions for big data have been proposed, with a focus on technologies to handle such data. Analyst firms such as IDC say legacy RDBMSs designed to run moderately sized data volumes on single machines do not offer sufficiently powerful engines for the big data scenarios with which modern businesses are wrestling. © 2011 DataStax. All rights reserved. Page: 4
  • 5. Here’s how IDC defines “big data”: Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high- 1 velocity capture, discovery, and/or analysis. This definition incorporates all types of data (e.g., real-time, analytic) managed by next- generation systems that must scale to handle constantly increasing user workloads and data volume. David Kellogg, meanwhile, simply defines big data as being “too big to be reasonably handled by 2 current/traditional technologies.” Consulting and research firm McKinsey & Company agrees with Kellogg’s concept of big data and defines it as “datasets whose size is beyond the ability of 3 typical database software tools to capture, store, manage, and analyze.” This paper examines the growing prevalence of big data across nearly every industry; explains why being good at using and understanding big data is critical for firms that want to compete in their chosen market; and details how businesses can use DataStax Enterprise—a solution specifically designed to manage big data easily and effectively—to exploit the benefits derived from handling big data smartly. 1 Extracting Value from Chaos, IDC: http://idcdocserv.com/1142 2 “Big data has jumped the shark,” DBMS 2: www.dbms2.com/2011/09/11/big-data-has-jumped-the-shark/ 3 Big Data: The next frontier for innovation, competition, and productivity, McKinsey Global Institute, 2011, p. 11: www.mckinsey.com/mgi/publications/big_data/pdfs/MGI_big_data_full_report.pdf © 2011 DataStax. All rights reserved. Page: 5
  • 6. Big Data and You How much should you care about effectively managing big data? A lot – in fact you’re likely already dealing with big data without even knowing it. Big Data Is More Prevalent than You Think Many businesses believe big data is something only companies like Facebook and Google deal with. However, a 2011 McKinsey Global Institute study says otherwise. For instance, the McKinsey report found that the average investment firm with fewer than 1,000 employees has 3.8 petabytes of data stored, experiences a data growth rate of 40 percent per 4 year, and stores structured, semi-structured, and unstructured data. Overall, McKinsey found that 15 out of 17 industry sectors in the United States have more data stored per company than the U.S. Library of Congress (which had 235 terabytes of information at the time of McKinsey’s 5 6 study) and that companies in all sectors have at least 100 terabytes stored , as Figure 1 shows: 4 McKinsey, p. 19. 5 McKinsey, p. vi. © 2011 DataStax. All rights reserved. Page: 6
  • 7. Figure 1: Stored data by industry sector These data volumes are not confined to enterprise data warehouses that assist only internal decision makers, but instead exist in the real-time database systems that serve external-facing customers. And that data continues to grow and expand as the underlying business becomes more successful. In 2010, the United States alone stored more than 3,500 petabytes of new 7 information, as shown in Figure 2: Figure 2: Data stores by geography 6 McKinsey, p. 19. 7 McKinsey, p. 103. © 2011 DataStax. All rights reserved. Page: 7
  • 8. Big Data Formats Part of the need for new technologies for big data (versus older, legacy RDBMSs) has to do with the format of the data coming in from online applications. A more dynamic, flexible database schema format is needed to handle the structured, semi-structured, and unstructured data that 8 comprises today’s big data (see Figure 3): Figure 3: Data stores by sector 8 McKinsey, p. 20. © 2011 DataStax. All rights reserved. Page: 8
  • 9. Competitive Advantages Gained Through Big Data Today, no one today argues with the fact that a company’s data is its most strategic impersonal asset. If you don’t use it as a competitive weapon against your market competitors, it’s guaranteed that you will be at a disadvantage. In a presentation given at the Strata New York conference in September 2011, McKinsey & Company showed the eye opening, 10-year category growth rate differences (see Figure 4, below) between businesses that smartly use their big data and those that do not. Figure 4: Big data companies have a very real competitive advantage Clearly, the biggest differences exist between online retailers who don’t use big data well, and those that do. And today, online retailing is a business every company is in, whether they’re 100 percent web-based or not. Forrester Research predicts that by 2013, over half of all U.S. sales 9 will be online in nature (see Figure 5): 9 McKinsey, p. 66. © 2011 DataStax. All rights reserved. Page: 9
  • 10. Figure 5: Online retail sales growth The recognition of these market realities has led smart businesses to concentrate on effectively utilizing their big data. This is reflected in a strong jobs growth trend, as illustrated in the chart below from Indeed.com: Figure: 6: Big data job postings © 2011 DataStax. All rights reserved. Page: 10
  • 11. Once data professionals are hired and put to work, there are many different data-driven projects they can be assigned to. As to the types of processes that can benefit from big data efforts, McKinsey found many different activities across all core internal corporate functions can provide 10 value to a modern business through the use of big data (see in Figure 7): Figure 7: Big data retail levers Now What? The facts about big data really do speak for themselves. The nonstop growth of data, new data formats that must be managed, and the competitive advantages that come from managing big data well all underscore why big data should matter to you. 10 McKinsey, p. 67. © 2011 DataStax. All rights reserved. Page: 11
  • 12. But what should you do about it? If you’re an IT professional, you know how difficult it can be to find a solution capable of handling a task like big data management that combines the following benefits: • Scalability • Performance • Ease of use • Low total cost of ownership Some database offerings may have two of these four features, but it’s rare to find one with all four – and that delivers on them well. The good news is there is a solution available that confidently provides checkmarks for the four criteria above.   DataStax Enterprise: The Best Solution for Managing Big Data DataStax is the leading provider of modern enterprise database software products and services based on Apache Cassandra™. It supports businesses that need a progressive data management system that can serve as a primary system of record/real-time datastore for critical production applications, and also deliver built-in analytic capabilities for analyzing that data once it’s in Cassandra. DataStax Enterprise is tailor-made to manage big data effectively. The solution inherits all of Cassandra’s powerful feature set for servicing modern real-time applications, and merges in a fault-tolerant, analytics platform that provides Hadoop MapReduce, Hive, and Pig support for business intelligence systems. A key differentiator of DataStax Enterprise over other big data providers is that real-time and analytic workloads are smartly separated across a distributed DataStax Enterprise database cluster, so that no competition for underlying compute resources or data occurs. Powered by Apache Cassandra The foundation that enables DataStax Enterprise to tackle big data is Apache Cassandra™. Cassandra enjoys an industry reputation for being the only NoSQL database solution able to truly handle big data requirements. It’s a highly scalable and high-performance distributed database management system that can handle real-time big data applications that drive key systems for modern and successful businesses. © 2011 DataStax. All rights reserved. Page: 12
  • 13. Key technical differentiators of Cassandra versus its legacy RDBMS predecessors, as well as other NoSQL offerings, include: • A built-for-scale architecture that can handle petabytes of information and thousands of concurrent users/operations per second as easily as it can manage much smaller amounts of data and user traffic • Peer-to-peer design that offers no single point of failure for any database process or function • Online capacity additions that deliver linear performance gains for both read and write operations • Read/write anywhere capabilities that equate to a true network-independent method of storing and accessing data • Tunable data consistency that allows Cassandra to offer the data durability and protection like an RDBMS, but with the flexible choice of relaxing data consistency when application use cases allow • Flexible/dynamic schema design that accommodates all formats of big data applications, including structured, semi-structured, and unstructured data; data is represented in Cassandra via column families that are dynamic in nature and accommodate all modifications online • Simplified replication that provides data redundancy and is capable of being multi-data center and cloud in nature • Data compression that reduces the footprint of raw big data by over 80 percent in some use cases • An SQL-like language (CQL) that lessens the learning curve for developers and administrators coming from the RDBMS world • Support for key developer languages (e.g., Java) and operating systems • No requirement for any special equipment; Cassandra runs on commodity hardware Cassandra is built with the assumption that failures can and will occur in a big data infrastructure. Therefore, data redundancy to protect against hardware failure and other data loss scenarios is built into and managed transparently by Cassandra. Furthermore, this capability can be configured so that big data applications can use a single large database that is distributed across multiple, geographically dispersed data centers, between different physical racks in a data center, and between public cloud providers and on-premise managed data centers. © 2011 DataStax. All rights reserved. Page: 13
  • 14. Figure 8: Cassandra multi-data center capabilities These and other capabilities make Cassandra and DataStax Enterprise the smart choice for modern businesses whose big data management needs have outgrown their traditional RDBMS software. Big Data Analytics A primary benefit that DataStax Enterprise provides to enterprises needing smart big data management capabilities is its ability to service both real-time and analytic data operations in the same database cluster without either load impacting the other. The key to making this possible is the underlying architecture of Cassandra. Built into DataStax Enterprise is an enhanced Hadoop distribution that utilizes Cassandra for many of its core services. DataStax Enterprise provides integrated Hadoop MapReduce, Hive, Pig, and job/task tracking capabilities, replacing Hadoop’s HDFS storage layer with Cassandra (CassandraFS). The end product is a single integrated solution that provides increased reliability, simpler deployment, and lower TCO than a traditional Hadoop solution. DataStax Enterprise also is fully compatible with existing HDFS, Hadoop, and Hive tools and utilities. Another benefit of DataStax Enterprise is that it eliminates the complexity and single points of failure of the typical Hadoop HDFS layer. From an operational standpoint, there is no need to set up a Hadoop name node, secondary name node, Zookeeper, and so on. © 2011 DataStax. All rights reserved. Page: 14
  • 15. Instead, DataStax Enterprise provides a single layer in which every node is a peer of the others and automatically knows its position in the cluster. On startup, all DataStax Enterprise nodes automatically start a Hadoop task tracker, and one of the nodes is elected to be the job tracker. If the job tracker node fails, the job tracker is automatically restarted on a different node. DataStax Enterprise utilizes full data locality awareness for Hadoop task assignment. Figure 9: DataStax Enterprise – real-time and analytic data in one database Another key benefit of DataStax Enterprise is the tight feedback loop it has between real-time application and the analytics that naturally follow. Traditionally, users would be forced to move data between systems via complex ETL processes, or perform both functions on the same system with the risk of one impacting the other. In big data environments, this process can be time-consuming and burdensome. With DataStax Enterprise, both real-time and analytic big data operations take place in the same distributed system, but users have the ability to dedicate certain nodes solely for analytics so that workload doesn’t slow down real-time processing. Users simply define one or more replica groups, and configure the role of each—one or more Cassandra, Hadoop or HDFS (i.e., HDFS without job/task tracker) nodes. Writes are instantly replicated between both real-time and analytic nodes. © 2011 DataStax. All rights reserved. Page: 15
  • 16. With DataStax Enterprise, users truly have the best of both worlds for big data management. They have all the power of Cassandra serving their highest-volume and high-velocity real-time applications, and the power of Hadoop, Hive, and Pig working directly against the same data for analytics. The result is smart workload management for big data application, which is much simpler to manage and more reliable than any of the alternatives. Visual Database Management DataStax Enterprise includes a visual, browser-based management solution named OpsCenter Enterprise. OpsCenter Enterprise allows a developer or administrator to manage and monitor the health of big data clusters from a centralized web console. Figure 10: OpsCenter Enterprise database cluster ring view OpsCenter Enterprise uses an agent-based architecture to monitor and carry out tasks on each node in a DataStax Enterprise cluster. Through a graphical and intuitive point-and-click interface, a user can understand the state of a cluster, which nodes are up and down, and what type of performance users are experiencing. Key events are reported into a centralized dashboard displayed along with other vital statistics. © 2011 DataStax. All rights reserved. Page: 16
  • 17. Figure 11: OpsCenter dashboard Analytic operations also can be monitored and controlled from within OpsCenter Enterprise: Figure 12: OpsCenter analytic operations monitoring Enterprise Production Support and Services Big data situations often require fast access to skilled expertise. DataStax Enterprise includes experienced production support and consultative services from Cassandra experts. You can © 2011 DataStax. All rights reserved. Page: 17
  • 18. choose the right production support package for your business needs, including rapid response SLAs and consultative help. DataStax also provides certified quarterly service pack updates for DataStax Enterprise, as well as other benefits such as emergency hot fixes (for production outages) and bug escalation privileges. Additionally, DataStax offers professional big data training on Cassandra and Hadoop, with classes offered in many major cities as well as on-site for corporations that need many staff members trained at once. Conclusion Big data isn’t just hype—and it’s much more than a buzz phrase. Today, companies across industries are finding they not only need to manage increasingly large data volumes in their real- time systems, but also analyze that information so they can make the right decisions, fast, to compete effectively in the market. Modern businesses looking for a solution to handle their big data easily and effectively should consider DataStax Enterprise. Its scale-out architecture comfortably scales into the petabytes data range, while offering high performance for reads and writes no matter the data volume size. DataStax Enterprise also differs from competitors by providing a single integrated database platform that smartly manages both real-time and analytic data. Additionally, DataStax Enterprise does all of this at a fraction of the cost charged by traditional RDBMS vendors. To find out more about DataStax Enterprise and obtain trial software, please visit www.datastax.com or email info@datastax.com. © 2011 DataStax. All rights reserved. Page: 18
  • 19. About DataStax DataStax is the developer of DataStax Enterprise, a distributed, scalable, and highly available database platform that delivers optimal performance either on premise or in the cloud for modern enterprise applications that manage both real-time and analytic workloads. The company has over 100 customers, including leaders such as Netflix, Cisco, Rackspace and Constant Contact, and spanning verticals including web, financial services, telecommunications, logistics and government. DataStax is backed by industry leading investors, including Lightspeed Venture Partners and Crosslink Capital and is based in Burlingame, CA with offices in Austin, TX and Stamford, CT. For more information, visit www.datastax.com. © 2011 DataStax. All rights reserved. Page: 19