SlideShare a Scribd company logo
1 of 8
Download to read offline
But there is an emerging realization across public and private
sectors that there must be more to “Big Data” than just data and
platform. CIOs must transform these Big Data platforms and the
data they house from cost-centers to data-monetization engines.
Forrester likens this very transformation to “refining oil”, and
Pivotal believes data science is at the heart of the new oil rush.
Big Data emphasizes volume, diversification, low latencies, and
ubiquity, whereas data science introduces new terms including,
predictive modeling, machine learning, parallelized and in-
database algorithms, Map Reduce, and
model operationalization. Don’t worry
— I am not going to get bogged down
here by the debate on the definition
of a data scientist.1
Instead, I want to emphasize a more
important point regarding this new
vernacular: It infers an evolution
beyond the traditional rigid output of
aggregated data: business intelligence.
It is a use-case-driven, iterative, and
agile exploration of granular data,
with the intent to derive insights and
operationalize these insights into
down-stream applications.
Examples of data science in action are
plentiful and also well documented,
both in a recent Harvard Business Review article, and these
Pivotal presentations:
Click on following titles to access full information
•	 Harvard Business Review article
•	 Pivotal presentations
Considering the growing number of use cases relevant to your
enterprise, what has changed is that these are no longer confined
to legacy data-driven functions such as marketing or finance.
White paper
Transforming Your Company into a
Data Science-Driven Enterprise
goPivotal.com
Introduction
Big Data is the latest technology wave impacting C-Level executives across all areas of business, but amid
the hype, there remains confusion about what it all means. The name emphasizes the exponential growth
of data volumes worldwide (collectively, 2.5 Exabytes/ day in the latest estimate I saw from IDC), but more
nuanced definitions of Big Data incorporate the following key tenets: diversification, low latency, and ubiquity.
In the current developmental-phase of Big Data, CIOs are investing in platforms to “manage” Big Data.
by Annika Jimenez
1 	
This is a well-blogged topic. We have our definition. It more or less aligns with other earlier
definitions, and emphasizes the combination of programming skills and statistical knowledge.
I’ll save this debate for a different post.
Disruptive Data Science Series
White paper Disruptive Data Science
2
Instead, they can and should involve nearly every functional
organization in the enterprise. Given the numerous opportunities
data science offers for virtually every functional organization
in every sector, it is now no longer enough to be a “data-driven
enterprise.” Instead, you must build a data science-driven
enterprise, a.k.a. the predictive enterprise.
As I build out Pivotal’s Data Science Services team, I’m in a very
privileged position to witness, first-hand, the organizational
transformations underway across nearly all sectors, private and
public. Based on these observations as well as my own personal
experience at Yahoo!, I believe that any enterprise initiative
to move towards more pervasive utilization of data science
is disruptive on multiple levels. Any CIO, CTO, or even CEO
considering this strategic shift must place the transformation
front and center at the C-level table, or risk significant erosion
of comparative advantage to the first movers who are getting
it right.
Why a data science-powered
transformation is disruptive
With the very initiation of a data science-powered transformation,
the endeavor and whoever is driving it are acknowledging that the
status quo for analytics utilization does not deliver against the
believed potential value for the given business (however defined).
As a result, any individual (wherever they are on the totem pole),
technology, and even organization overly associated with legacy
or the status quo will find themselves exposed to some degree of
uncertainty and possibly even
vulnerability. What transpires
when an enterprise kicks off
such an initiative ranges within
two extremes. On one side
— the “bad outcome” — the
effort yields a hot mess of
organizational wrangling over
concepts like “data ownership”
and “where analytics should
live”, shortsighted technology
investments or the digging-
in-of-heels around legacy
platforms, and analytical project
work-to-nowhere — all with
the enterprise’s competitive
advantage wavering on
a precipice.
On the other end of the spectrum — the “good outcome” —
the effort can yield a complete rebirth or transformation of
the company built upon data-derived innovation, resulting in
data science-generated intellectual property and competitive
advantage. On this end of the spectrum, I often see an executive
alignment on prioritization for data initiatives, a thriving data
science culture, a drumbeat focus on smart data instrumentation
and data quality processes, and modeling efforts with clearly
defined paths to operationalization for top- or bottom-line
impacting actions.
I am frequently asked by our Analytics Labs customers which
levers they should control to drive towards the “good outcome”
as they embrace data science. The levers are numerous, and each
is integral to the success of the effort. I’ve mentally cultivated my
list over an extended period of time, based on my team’s data
science work with our customers and prospects, my observations
of the travails and successes of various Pivotal customers, and my
pre-Pivotal days at Yahoo!, where I ran central Insights Services
and led globalized data solutions during the company’s data
“glory days”.2
Click on following title to access full information
•	 Analytics Labs customers
Transformation catalysts
In this post, I will provide some high-level color on these levers,
or “transformation catalysts” as I call them. In subsequent posts I
will cycle back to those warranting a deeper dive.
Vision
The shift to a data science-driven enterprise will not necessarily
fail without clear established vision, but the presence of a
thoughtfully crafted and shared vision can dramatically accelerate
2
Referencing the time when Yahoo! prioritized data as its core strategic play, hired one of the first “Chief Data Officers”, Usama Fayyad, and built a soup-to-nuts
central data organization, “Strategic Data Solutions” (2004–2008). The organization was disbanded in 2008 during Yahoo!’s executive revolving door, and the
resulting “SDS diaspora” has driven data-savvy executives to Intuit, Salesforce.com, Facebook, Google, and innumerable data start-ups. Many of us are also at Pivotal.
White paper Disruptive Data Science
3
the path to value generation. By “vision” I’m specifically
referencing the visualization of a path from current-state of
data utilization to a clear understanding of the promise of data
science (in a Big Data context,) when applied against key business
problems or goals the company is facing. It implies, at least at
the outset, a conviction in the financial impact (top- or bottom-
line) that data science initiatives can bring, and it usually entails
a passionate communication and sharing of these ideas with key
influencers (decision-makers and owners of the data science
value chain).
With some buy-in established, the vision should lead to strategy
definition via the programmatic gathering of ideas across
stakeholder communities, and aligning technology and human
capital investments with the platform capabilities required
to support these. Central to “getting vision” is that it must
be elevated beyond any grassroots origination. Many times, a
company’s data-savvy thinkers who have vision (or at least a
passionately held belief in the transformative nature of data
science when applied to key business needs) are too buried in the
organization. They aren’t sufficiently empowered with a mandate
to translate this vision to concrete, programmatic strategy. This,
then, brings me to the next topic:
Organization, Organizational
Alignment, and Executive Sponsorship
My colleague, Josh Klahr, VP of Products, wrote a nice blog
recently about building a data organization, leveraging the
lessons learned during our mutual time at Yahoo! As it relates
to fully harnessing the promise of data science, my observations
on organization extend beyond the functional components he
details, to consider fundamental organizational structure.
Within any one company, I’ve found there is a constant
pendulum swinging between the two organizational structures:
decentralized, where data scientists and other analytics
professionals sit organizationally with the line of business
(“LOB”), and centralized, where data scientists organizationally
sit with a central data organization. A given single company
can frequently swing between the decentralized model and
the centralized model. This pendulum pivots along arguments
advocating LOB-level domain expertise and LOB-control of
project prioritization, versus considerations of the central need
to pool and retain scarce talent, encourage cross-LOB knowledge
sharing, and establish standards and efficiencies. I am well known
to be a “centralist” and have a strong belief that any data science
capabilities should sit in the same organization owning the data
platform. That organization is usually but not always, IT.
There are numerous reasons why I support this model. At a very
pragmatic level, there is an essential collaboration that must
exist between the data scientist and all the other owners of the
“value chain” of operationalized predictive and machine-learning
models.
These owners mostly, but not entirely, sit organizationally
together, and the successful productionalization of models is
dependent upon strong collaboration across the value-chain
points. As I mentioned in my talk at Strata, I believe the data
scientist’s span of influence is expanding into all these critical
points along the chain. To adequately yield the necessary degree
of influence to align roadmaps, priorities, and resource allocation,
the Data Scientist(s) must sit organizationally in the same world.
Click on following title to access full information
•	 Strata
White paper Disruptive Data Science
4
Once a company’s data capabilities have evolved to the point
organizationally that they have an emerging centralized data
team, this typically brings with it more senior-level oversight
and sponsorship, which is exactly what will be needed to push
key data science initiatives from conceptualization to execution.
Remember, the point here is to evolve from current-state,
typically highly operationalized data processes supporting BI
and other mission critical applications, to initiating proofs-of-
concepts (“POCs”) around predictive modeling and machine
learning. These POCs will start introducing pressure on the value-
chain’s existing commitments and resourcing obligations, just
like it does on the data itself. Driving commitments, adherence
to the team’s own committed timelines, and keeping the interest
and enthusiasm of the executive table all the while, will require
the sponsorship and oversight of a peer at said table. Without
the right person at the top, willing to go to bat for the collective
team and constantly socializing progress, and without the right
relationships with key enablers in the wider enterprise, the entire
transition itself can fall apart.
Of course there is even more color to the topic of Organization
than I have touched upon here. I’ll go into more depth in a later
post.
Data Availability
The whole dialog begins with data: its assumed availability in
some form to down-stream consumers, and in more granular,
“raw” forms, to data scientists. Of course, realizing this core
requirement is no trivial task. It carries the assumption that
valuable data assets have been identified (or newly created),
captured, stored in a common store or warehouse, and made
available with the right adherence to company privacy and
security policies.
Our team typically starts a conversation with a prospect that has:
•	 	Siloed data assets, with valuable data stored in many locations
across the prospect’s IT footprint
•	 	Immature data policies which either permit improper use of
data, or instead overly restrict ease of data access
•	 A strong belief that there is immense value in applying more
sophisticated analytics to raw data.
Getting the uncleansed, scattered assets pulled together and into
one place, with usage rights adherent to proper privacy/security
policies, are often the first set of tactical hurdles in enabling data
science.
In some cases, despite a valuable pre-existing base of data, the
actual data being targeted for inclusion in a key project doesn’t
yet exist. This is where a strong understanding of, appreciation
for, and process-definition around data instrumentation become
critical. The non-existence of data shouldn’t stop efforts in their
tracks. Instead, with the right process and the right assessment
of value, this lack of data can kick-start an effort to instrument
the target capture-vehicle (web logs, IT logs, sensor, transactional
systems, etc.) for tracking. With a clear vision translated to
strategy and then to specific projects, an understanding of the
required data will emerge. For any one company to succeed in
transforming itself into a predictive enterprise, coordination
between the owners of the data capture — frequently Product
Management — and the analytical leadership must be baked into
the organization’s planning culture.
Whether captured data is pushed to Hadoop, massively parallel
processing (MPP) databases, or both, the term “availability”
also alludes to an access layer that doesn’t overly restrict self-
provisioning. SQL, MapReduce, Pig, Hive, etc are all tools enabling
access. However, the hybridization of distributed compute
paradigms introduces a requirement to reduce the complexity at
the access layer so any one data scientist or analyst doesn’t have
to master so many languages.
Pivotal solves for this issue a variety of ways, most prominently
via Chorus, our open source collaboration platform for data
science which enables data discovery and self-provisioning. And
because we recognize data availability is one of the fundamental
enablers of the predictive enterprise, we have additional
technology innovations brewing on this front. Stay tuned for
upcoming announcements on this.
Choosing the Right Distributed
Computing Platform
This is Pivotal’s raison d’être. Since there are many people
blogging about our products and their advantages/disadvantages
in supporting Big Data initiatives, I won’t go into deep detail on
our Hadoop-MPP hybrid stack here. I will simply comment that
we see the variety and volume of our customers’ data drive their
evaluations, and then, either a tepid or passionate embrace of
Hadoop. Despite our roots in the massively parallel processing
(MPP) PostgreSQL relational database, we firmly believe that
Hadoop will be a cornerstone technology for the predictive
enterprise.
However, for the iterative, exploratory work typical of model-
building, my team continues to push more advanced analytical
functions into the Pivotal MPP Database. In most cases, we find
this to be the most performant solution across the very large
data volumes we are working on, and given the complexity of the
functions that are being applied to the data. Our Unified Analytics
Platform (UAP) supports this agile utilization of both compute
architectures. Along with Chorus and MADLib, UAP delivers a
White paper Disruptive Data Science
5
complete paradigm shift for your data science practitioners
that will transform their effectiveness in the organization.
The integrated stack dramatically shortens time-to-insight,
drives efficiencies in data science, and allows the building and
maintenance of much more sophisticated models that rely on
increasingly broad sources and structures of data. These benefits
are what make “believers” within my data science team, and we
are unabashed advocates of the Pivotal UAP.
Click on following title to access full information
•	 MADLib
All of this said, and beyond the Pivotal UAP pitch, a more nuanced
take-away from my list is that the intelligent selection of the right
distributed computing platform is not the only critical catalyst to
transformation. It is clearly a key enabler given the paradigm shift
it yields, but it does not ipso facto yield a predictive enterprise,
given the need to focus on all the other catalysts I discuss here.
Next-Generation Toolkits for
Advanced Analytics
In the world of predictive analytics, my team distinguishes
between “Big Data” toolkits and “Small Data” toolkits. The key
distinction is whether the analytical functions for descriptive
statistics, classification, regression, clustering, and other common
machine learning techniques, are written to be applied against
very large volumes of data while leveraging some form of
parallelism or distribution of compute power. With “Small Data”
toolkits, the application of the function or algorithm often still
happens at the desktop, against heavily sampled data, which
implies limited computational power and performance, and lots
of data movement (typically touching at least one other person
beyond the data scientist). Tools with origins in this category
include: R, Stata, Matlab/Octave, SAS, SPSS, etc.
Many of the purveyors of these toolkits are actively developing
“Big Data” versions, which would better leverage the
computational horsepower of distributed computing platforms.
Big Data analytical toolkits, then, might also include R (i.e.,
Revolution R, Radoop, and even R when rendered through a
Procedural Language,) and SAS (High-Performance Analytics),
in addition to those which were born with a focus on parallel
algorithms: Mahout (open source algorithm library for Hadoop),
MADLib (open source algorithm library for MPP and PosgreSQL),
and GUI-based Big Data predictive modeling packages, such as
Alpine Data Labs.
Click on following titles to access full information
•	 MADLib
•	 R when rendered through a Procedural Language
•	 Mahout
•	 Alpine Data Labs
These Big Data toolkits are cornerstone-enablers of the
predictive enterprise, delivering on the promised paradigm
shift I mentioned in the previous section. Investing in a Big Data
platform that continues to rely on “Small Data” modeling toolkits
is not delivering the power of Big Data innovations to those
“monetization architects” (aka Data Scientists) most able to
create value.
My team actively works in an agile fashion across many Big Data
toolkits. We may, within a single project, work with PL/R, MADLib,
and Alpine — choosing algorithms and functions depending on
how they were written, their performance within Pivotal’s UAP,
and the quality of the output. Or, we may, if the right algorithm
isn’t currently available, author a new version for MADlib. I
believe this sub-toolkit agility is the future of advanced analytics,
where data scientists will assess the same algorithm from many
toolkits and select the most suitable one depending on the data,
the use-case, and the underlying platform and performance
requirements. Tools like Pivotal’s Chorus and open-source
algorithm libraries and the efforts of our partners such as Kaggle,
will enable this algorithm-level evaluation and selection. All of this
will put additional pressure on the analytical walled gardens that
we have in the market today.
In this scenario, where data scientists are evaluating functions
and algorithms below the toolkit level, the HR implications are
clear. Any organization working to become a predictive enterprise
will need to cultivate or aggressively recruit the right skills to
enable this sophisticated tool selection (not to mention drive the
design and development of the actual model itself). This then,
leads me to:
Data Science Skills & Training Existing
Team Members
Beyond the core question of what levers control the transition
to a predictive enterprise, the next most frequent questions I get
from customers and prospects are “What is a data scientist?” and
“How do I build a data science team?” I’ll reserve the answers to
these questions for a separate blog post, but I will say now that I
am one of those people who think slick UIs and visualization tools
will never “toolify” away the need for the data scientist’s skills.
The blood and guts of predictive modeling must remain firmly in
the hands of well-prepared data science practitioners.
As an initial source of talent, you will want to devise a continuing
education plan for your pre-existing employees who might
be ready to deepen their analytical skills. Pivotal was first-to-
market to respond to this need with a data science certification
White paper Disruptive Data Science
6
curriculum. However, to rapidly build-out a team of solid data
scientists (the number depending on the projects envisioned
and knowledge required,) and to establish the right analytical
leadership, you will likely need to recruit new skills into your
organization. Your ability to lure strong data science talent in this
white-hot market for data scientists depends entirely on your
ability to build the most attractive analytics playground for your
candidates.
Click on following titles to access full information
•	 Data science certification curriculum
•	 White-hot market for data scientists
The best way to do this is to not just offer strong market-
based compensation packages, but also offer the best possible
environment for the top talent to practice their craft. This
includes:
•	 Cultivating the richest data in your sector
•	 Maintaining an executive spotlight on their work and a
constant drum-beat of executive sponsorship for data science
initiatives
•	 	Providing state-of-the-art data platforms and analytical tools
•	 Creating opportunities to share what they are doing with
broader audiences
•	 Forging strong paths for model operationalization to deliver
the real monetary impact
If you pull these things together, as WalMart has done, and its
competitors Target and Sears are now also doing in the retail
space, you will become a contender in the recruiting battles
against well-known competitors such as Google and Facebook.
Lastly on this point: your data scientists will become increasingly
valuable with every project. Your job, Mr. or Ms. Creator of the
Predictive Enterprise, will increasingly be less about recruiting
strong talent (once you’ve achieved the target team size that
embodies the right analytical knowledge), and increasingly more
about retaining your data science talent.
Defined paths to operationalization
At the end of the day, any strategic shift to predictive analytics
will need to prove significant impact to business-specific
performance indicators, whatever these are – e.g., revenues,
costs, new customers, transaction values, customer engagement,
hospital readmissions, intruders detected, etc. The pressure to
deliver “wins” to the business will begin almost immediately once
a concerted effort gets underway.
I have repeatedly found that all sorts of analytics resources
can be thrown at a business question only to have the results
bubbled up to a few key insights that languish without any
meaningful action-enablement. When I see this happening, it’s
often a yellow flag that the company hasn’t moved beyond its
earlier BI-centricity, and is still defining the goal of any one effort
as “informing decision-making”. Informing decision-making is
good, in the sense that it hopes to arm executives with the right
data and insights to help them maybe make the right decisions,
but it clearly doesn’t scale. Anyone on the hook to prove his/
her value in decision-support knows that “informing decision-
making” is not measurable. On the other hand, having a clearly
defined and resourced path to operationalize predictive models
into production environments for down-stream application
integration ensures both automated “actioning” against events of
interest, and the ability to measure
the impact on the target KPI.
In our engagements with customers,
the “path to operationalization”,
which I consider the “last mile of
data science,” includes the following
elements:
•	 Production, likely low-latency,
data scoring against the resulting
model,
•	 Storage and delivery of
scored output to downstream
applications,
•	 Definition of response actions
and development of these within
the application environment,
or the development of a new
application to enable automated
actioning,
White paper Disruptive Data Science
7
•	 	Development of reporting layer to track impact, and
•	 	(More rarely) creation of the closed-loop, channeling action-
response data back into the model to power self-learning
capabilities, or adaptive models.
This work can be easy or hard depending on your pre-existing
investments in enterprise systems and in-house development
capabilities. The key to a successful transformation to the
predictive enterprise is that all of these elements and their
respective requirements are actively understood, levels-of-effort
are determined, and resourcing is established. Without definition
of the path to operationalization and committed resourcing for
execution, there will be no value delivered back to the business.
Process & Change Management
The requirements around defining the path to operationalization
for any one model immediately make me think about process
definition and change management. If there is one silent killer
of data science initiatives, it’s the lack of defined process and
program management to drive project conceptualization to
committed resourcing and then to execution. Because the
value-chain around data science is so extended across so many
disparate owners (aka “influencers of success”), the lack of
defined process will make coordination and commitments, and
ultimately the delivery of “wins” exceedingly difficult.
As I wrote earlier, it is common to lack the right data to support
a defined data science project. In many cases the data is there for
the pickin’ (likely in some upstream product), but the product
owners haven’t instrumented the product for the proper data
capture. The seemingly basic process of:
•	 	getting in front of these product managers to drive
commitments to define and then deploy the right form of
instrumentation, and then
•	 initiating data generation, capturing, and storing of the new
data for use in a predictive modeling exercise, can be extended
and timely depending on the data requirements, product-
engineering’s existing roadmaps and resource commitments,
and the agility of the data platform to absorb new data.
Imagine having to repeat this basic process repeatedly across
many product owners, re-defining a new process each time,
and you can see how delay-prone key data initiatives can
become simply due to poor process definition.
Beyond instrumentation, other critical planning junctions
frequently driven by data science and involving multi-team
coordination are:
•	 Analytics roadmapping
•	 SLA changes to drive lower latencies on data delivery
•	 ETL changes
•	 Data quality initiatives
•	 Data policy changes
•	 	Model operationalization
Without a defined process for expediting coordination along the
key influencers of the data science value chain, there are many
opportunities for projects to break down.
Given the importance of this area, the creator of the Data
Science Team must also think about new roles beyond the
data scientist. These include positions like project managers
and engagement managers to coordinate planning, gather
requirements, socialize plans, drive execution against committed
timelines, and communicate progress to internal stakeholders.
These are all critical tasks that should not be owned by the data
scientist.
(As a side note, and related to the earlier topic on Organization,
Organizational Alignment, and Executive Sponsorship, this is also
very frequently why many organizationally complex businesses
move to new C-level roles like “Chief Data Officer” or “Chief
Analytics Officer”. An executive champion, with the right
relationships with his/her peers, will not-so-surprisingly “grease
the wheels” to drive momentum, coordination, prioritization, and
resourcing commitments)
Wrapping Up
So there you have them; my “transformation catalysts” to yield a
predictive enterprise. When taken in full, you should appreciate
why I call this transformation “disruptive”. I’m sure many of you
White paper Disruptive Data Science
GoPivotal, Pivotal, and the Pivotal logo are registered trademarks or trademarks of GoPivotal, Inc. in the United States and other countries. All other trademarks used herein are the property of their respective owners.
© Copyright 2013 Go Pivotal, Inc. All rights reserved. Published in the USA. PVTL-WP-211-09/13
Pivotal, committed to open source and open standards, is a leading provider of application and data infrastructure software, agile development services, and data science consulting.
Pivotal’s revolutionary Enterprise PaaS product, powered by Cloud Foundry, will be available in Q4 2013.
Pivotal CF is a complete, next generation Enterprise Platform-as-a-Service that makes it possible, for the first time, for the employees of the enterprise to rapidly create modern
applications. To create powerful experiences that serve their customers in the context of who they are, where they are, and what they are doing in the moment. To store, manage and
deliver value from fast, massive data sets. To build, deploy and scale at an unprecedented pace.
Uniting selected technology, people and programs from EMC and VMware, the following products and services are now part of Pivotal: Greenplum®
, Cloud Foundry, Spring, Cetas,
Pivotal Labs®
, GemFire®
and other products from the VMware vFabricTM
Suite.
Pivotal 1900 S Norfolk Street San Mateo CA 94403 goPivotal.com
will have strong opinions on what’s missing or overly emphasized,
and I welcome any debate or constructive input. After all, given
the newness of the transformations underway at this moment,
this is an ongoing learning process for us all. Over the course of
the next few months, I will review a few of these catalysts in a bit
more depth than what I have room for here today. Those topics
warranting more discussion include Vision, Organization, and
Data Science Skills.
And, lastly, you will start hearing more directly from my data
science team on the innovative work we are doing across sectors,
the tools we are building and using, performance benchmarks
of the varying analytical toolkits, and more. Stay tuned here for
more disruptive data science from Pivotal!
About Annika Jimenez
Annika is a seasoned leader of analytics initiatives, coming to
Pivotal after over six years in data leadership roles at Yahoo!
At Pivotal since April 2011, she has built the “Data Science
Dream Team” – an industry-leading group of Data Scientists –
representing a rich combination of vertical domain and horizontal
analytical expertise – who are facilitating Data Science-driven
transformations for Pivotal customers. During her time at
Yahoo!, she led Audience and International data solutions for
Yahoo!’s central data organization, Strategic Data Solutions, and
led Insights Services – comprised of a team of 40 researchers
covering Web analytics, satisfaction/brand health metrics, and
audience/ad measurement. Annika is a recognized evangelist for
“applied data” and well known for her acute focus on action-
enablement.
About Pivotal
Pivotal is building a new platform for a new era, setting the
standard for Enterprise Platform- as-a-Service (PaaS). The
company’s mission is to enable customers to build a new class of
applications, leveraging big and fast data, doing all of this with the
power of cloud independence.
Uniting selected technology, people and programs from EMC
and VMware, the following products and services are now part of
Pivotal: Greenplum®
, Cloud Foundry, Spring, Cetas, Pivotal Labs®
,
GemFire®
and other products from the VMware vFabric™
Suite.
Powered by new data fabrics, Pivotal One is a complete, next
generation Enterprise Platform-as-a-Service that makes it
possible, for the first time, for the employees of the enterprise to
rapidly create consumer-grade applications. To create powerful
experiences that serve a consumer in the context of who they
are, where they are, and what they are doing in the moment. To
store, manage and deliver value from fast, massive data sets. To
build, deploy and scale at an unprecedented pace.
Pivotal One integrates these sophisticated new data fabrics with
modern programming frameworks and cloud independence in a
single, united platform.
Learn More
To learn more about our products, services and solutions, visit us
at goPivotal.com.

More Related Content

What's hot

Reaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analyticsReaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analyticsThe Marketing Distillery
 
Augmented Data Management
Augmented Data ManagementAugmented Data Management
Augmented Data ManagementFORMCEPT
 
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...Happiest Minds Technologies
 
Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds
 Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds
Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Mindshappiestmindstech
 
"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015yann le gigan
 
Colab 2019 Making Sense of the Data That Matters
Colab 2019 Making Sense of the Data That MattersColab 2019 Making Sense of the Data That Matters
Colab 2019 Making Sense of the Data That MattersIan Gibbs
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052Gilbert Rozario
 
From Big Data to Business Value
From Big Data to Business ValueFrom Big Data to Business Value
From Big Data to Business ValueGib Bassett
 
Whitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in EnterpriseWhitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in EnterpriseBRIDGEi2i Analytics Solutions
 
Pov Big Data And Bmi V F
Pov   Big Data And Bmi V FPov   Big Data And Bmi V F
Pov Big Data And Bmi V FAbigail Howe
 
Business Analytics Insights
Business Analytics InsightsBusiness Analytics Insights
Business Analytics InsightsChrisHoogendoorn
 
Applied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatApplied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatCharlie Hecht
 
Ruthless Execution - Oracle's Journey to Customer Centricity
Ruthless Execution - Oracle's Journey to Customer CentricityRuthless Execution - Oracle's Journey to Customer Centricity
Ruthless Execution - Oracle's Journey to Customer CentricityMainstay
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paperJohn Enoch
 
Big-Data-The-Case-for-Customer-Experience
Big-Data-The-Case-for-Customer-ExperienceBig-Data-The-Case-for-Customer-Experience
Big-Data-The-Case-for-Customer-ExperienceAndrew Smith
 
Integrating Analytics into the Operational Fabric of Your Business
Integrating Analytics into the Operational Fabric of Your BusinessIntegrating Analytics into the Operational Fabric of Your Business
Integrating Analytics into the Operational Fabric of Your BusinessIBM India Smarter Computing
 

What's hot (20)

Reaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analyticsReaping the benefits of Big Data and real time analytics
Reaping the benefits of Big Data and real time analytics
 
Augmented Data Management
Augmented Data ManagementAugmented Data Management
Augmented Data Management
 
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
Whitepaper: Big Data 101 - Creating Real Value from the Data Lifecycle - Happ...
 
Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds
 Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds
Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds
 
Report: CIOs & Big Data
Report: CIOs & Big DataReport: CIOs & Big Data
Report: CIOs & Big Data
 
Big Data strategy components
Big Data strategy componentsBig Data strategy components
Big Data strategy components
 
"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015"Big data in western europe today" Forrester / Xerox 2015
"Big data in western europe today" Forrester / Xerox 2015
 
Mighty Guides- Data Disruption
Mighty Guides- Data DisruptionMighty Guides- Data Disruption
Mighty Guides- Data Disruption
 
Colab 2019 Making Sense of the Data That Matters
Colab 2019 Making Sense of the Data That MattersColab 2019 Making Sense of the Data That Matters
Colab 2019 Making Sense of the Data That Matters
 
Oea big-data-guide-1522052
Oea big-data-guide-1522052Oea big-data-guide-1522052
Oea big-data-guide-1522052
 
From Big Data to Business Value
From Big Data to Business ValueFrom Big Data to Business Value
From Big Data to Business Value
 
Whitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in EnterpriseWhitepaper - Simplifying Analytics Adoption in Enterprise
Whitepaper - Simplifying Analytics Adoption in Enterprise
 
Pov Big Data And Bmi V F
Pov   Big Data And Bmi V FPov   Big Data And Bmi V F
Pov Big Data And Bmi V F
 
Business Analytics Insights
Business Analytics InsightsBusiness Analytics Insights
Business Analytics Insights
 
Winning with a data-driven strategy
Winning with a data-driven strategyWinning with a data-driven strategy
Winning with a data-driven strategy
 
Applied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_YhatApplied_Data_Science_Presented_by_Yhat
Applied_Data_Science_Presented_by_Yhat
 
Ruthless Execution - Oracle's Journey to Customer Centricity
Ruthless Execution - Oracle's Journey to Customer CentricityRuthless Execution - Oracle's Journey to Customer Centricity
Ruthless Execution - Oracle's Journey to Customer Centricity
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
 
Big-Data-The-Case-for-Customer-Experience
Big-Data-The-Case-for-Customer-ExperienceBig-Data-The-Case-for-Customer-Experience
Big-Data-The-Case-for-Customer-Experience
 
Integrating Analytics into the Operational Fabric of Your Business
Integrating Analytics into the Operational Fabric of Your BusinessIntegrating Analytics into the Operational Fabric of Your Business
Integrating Analytics into the Operational Fabric of Your Business
 

Viewers also liked

vCloud Air Network Has Arrived
vCloud Air Network Has ArrivedvCloud Air Network Has Arrived
vCloud Air Network Has ArrivedEMC
 
Sonderheft big data ebook_englisch
Sonderheft big data ebook_englischSonderheft big data ebook_englisch
Sonderheft big data ebook_englischEMC
 
Storage networking fcf_co_eiscsivsn_technology
Storage networking fcf_co_eiscsivsn_technologyStorage networking fcf_co_eiscsivsn_technology
Storage networking fcf_co_eiscsivsn_technologyEMC
 
Block renaissanceart
Block renaissanceartBlock renaissanceart
Block renaissanceartTravis Klein
 
Up in china_fortune_media
Up in china_fortune_mediaUp in china_fortune_media
Up in china_fortune_mediaShari Monnes
 
Informe criterio identificacion cliente
Informe criterio identificacion clienteInforme criterio identificacion cliente
Informe criterio identificacion clienteNathalia Sanchez
 
V mware sddc-micro-segmentation-white-paper
V mware sddc-micro-segmentation-white-paperV mware sddc-micro-segmentation-white-paper
V mware sddc-micro-segmentation-white-paperEMC
 
The Selling To The Point System
The Selling To The Point SystemThe Selling To The Point System
The Selling To The Point SystemJeffrey Lipsius
 
06 trade and value
06 trade and value06 trade and value
06 trade and valueTravis Klein
 
Mon timeline of important events
Mon timeline of important eventsMon timeline of important events
Mon timeline of important eventsTravis Klein
 
TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...
TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...
TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...EMC
 
Metabolic syndrome and dementia
Metabolic syndrome and dementiaMetabolic syndrome and dementia
Metabolic syndrome and dementiaRavi Soni
 
04 diminishing marginal returns
04 diminishing marginal returns04 diminishing marginal returns
04 diminishing marginal returnsTravis Klein
 
The Digital Universe in 2020 - United States
The Digital Universe in 2020 - United StatesThe Digital Universe in 2020 - United States
The Digital Universe in 2020 - United StatesEMC
 
Automatic Annotation in UniProtKB
Automatic Annotation in UniProtKBAutomatic Annotation in UniProtKB
Automatic Annotation in UniProtKBEBI
 

Viewers also liked (20)

Office 365 busting the myths
Office 365 busting the mythsOffice 365 busting the myths
Office 365 busting the myths
 
vCloud Air Network Has Arrived
vCloud Air Network Has ArrivedvCloud Air Network Has Arrived
vCloud Air Network Has Arrived
 
Day 3
Day 3Day 3
Day 3
 
Sonderheft big data ebook_englisch
Sonderheft big data ebook_englischSonderheft big data ebook_englisch
Sonderheft big data ebook_englisch
 
Storage networking fcf_co_eiscsivsn_technology
Storage networking fcf_co_eiscsivsn_technologyStorage networking fcf_co_eiscsivsn_technology
Storage networking fcf_co_eiscsivsn_technology
 
Block renaissanceart
Block renaissanceartBlock renaissanceart
Block renaissanceart
 
Up in china_fortune_media
Up in china_fortune_mediaUp in china_fortune_media
Up in china_fortune_media
 
Informe criterio identificacion cliente
Informe criterio identificacion clienteInforme criterio identificacion cliente
Informe criterio identificacion cliente
 
V mware sddc-micro-segmentation-white-paper
V mware sddc-micro-segmentation-white-paperV mware sddc-micro-segmentation-white-paper
V mware sddc-micro-segmentation-white-paper
 
Doc2
Doc2Doc2
Doc2
 
The Selling To The Point System
The Selling To The Point SystemThe Selling To The Point System
The Selling To The Point System
 
06 trade and value
06 trade and value06 trade and value
06 trade and value
 
Mon timeline of important events
Mon timeline of important eventsMon timeline of important events
Mon timeline of important events
 
TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...
TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...
TechBook: Mainframe EMC Symmetrix Remote Data Facility (SRDF) Four-Site Migra...
 
Metabolic syndrome and dementia
Metabolic syndrome and dementiaMetabolic syndrome and dementia
Metabolic syndrome and dementia
 
04 diminishing marginal returns
04 diminishing marginal returns04 diminishing marginal returns
04 diminishing marginal returns
 
The Digital Universe in 2020 - United States
The Digital Universe in 2020 - United StatesThe Digital Universe in 2020 - United States
The Digital Universe in 2020 - United States
 
Mobile mini trends
Mobile mini trendsMobile mini trends
Mobile mini trends
 
Finland
FinlandFinland
Finland
 
Automatic Annotation in UniProtKB
Automatic Annotation in UniProtKBAutomatic Annotation in UniProtKB
Automatic Annotation in UniProtKB
 

Similar to Transforming Enterprises with Data Science Insights

How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management Abhishek Sood
 
Cost & benefits of business analytics marshall sponder
Cost & benefits of business analytics marshall sponderCost & benefits of business analytics marshall sponder
Cost & benefits of business analytics marshall sponderMarshall Sponder
 
Driving Value Through Data Analytics: The Path from Raw Data to Informational...
Driving Value Through Data Analytics: The Path from Raw Data to Informational...Driving Value Through Data Analytics: The Path from Raw Data to Informational...
Driving Value Through Data Analytics: The Path from Raw Data to Informational...Cognizant
 
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...Tommy Toy
 
Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...
Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...
Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...Cognizant
 
Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Lora Cecere
 
Big dataplatform operationalstrategy
Big dataplatform operationalstrategyBig dataplatform operationalstrategy
Big dataplatform operationalstrategyHimanshu Bari
 
LS_WhitePaper_NextGenAnalyticsMay2016
LS_WhitePaper_NextGenAnalyticsMay2016LS_WhitePaper_NextGenAnalyticsMay2016
LS_WhitePaper_NextGenAnalyticsMay2016Anjan Roy, PMP
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analyticsThe Marketing Distillery
 
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...Data Science Council of America
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Jennifer Walker
 
enterprise-data-everywhere
enterprise-data-everywhereenterprise-data-everywhere
enterprise-data-everywhereBill Peer
 
iabsg_dataroundtable
iabsg_dataroundtableiabsg_dataroundtable
iabsg_dataroundtablePeter Hubert
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Mark Hewitt
 
If the CIO is to be valued as a strategic actor, how can he bring .docx
If the CIO is to be valued as a strategic actor, how can he bring .docxIf the CIO is to be valued as a strategic actor, how can he bring .docx
If the CIO is to be valued as a strategic actor, how can he bring .docxsheronlewthwaite
 

Similar to Transforming Enterprises with Data Science Insights (20)

How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management How 3 trends are shaping analytics and data management
How 3 trends are shaping analytics and data management
 
Big Data : a 360° Overview
Big Data : a 360° Overview Big Data : a 360° Overview
Big Data : a 360° Overview
 
Cost & benefits of business analytics marshall sponder
Cost & benefits of business analytics marshall sponderCost & benefits of business analytics marshall sponder
Cost & benefits of business analytics marshall sponder
 
Making sense of BI
Making sense of BIMaking sense of BI
Making sense of BI
 
Driving Value Through Data Analytics: The Path from Raw Data to Informational...
Driving Value Through Data Analytics: The Path from Raw Data to Informational...Driving Value Through Data Analytics: The Path from Raw Data to Informational...
Driving Value Through Data Analytics: The Path from Raw Data to Informational...
 
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
Data Science And Analytics Outsourcing – Vendors, Models, Steps by Ravi Kalak...
 
Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...
Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...
Traveling the Big Data Super Highway: Realizing Enterprise-wide Adoption and ...
 
Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013Big Data Handbook - 8 Juy 2013
Big Data Handbook - 8 Juy 2013
 
Big dataplatform operationalstrategy
Big dataplatform operationalstrategyBig dataplatform operationalstrategy
Big dataplatform operationalstrategy
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
LS_WhitePaper_NextGenAnalyticsMay2016
LS_WhitePaper_NextGenAnalyticsMay2016LS_WhitePaper_NextGenAnalyticsMay2016
LS_WhitePaper_NextGenAnalyticsMay2016
 
Getting down to business on Big Data analytics
Getting down to business on Big Data analyticsGetting down to business on Big Data analytics
Getting down to business on Big Data analytics
 
Big data web
Big data webBig data web
Big data web
 
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
How To Transform Your Analytics Maturity Model Levels, Technologies, and Appl...
 
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
Hadoop: Data Storage Locker or Agile Analytics Platform? It’s Up to You.
 
Bidata
BidataBidata
Bidata
 
enterprise-data-everywhere
enterprise-data-everywhereenterprise-data-everywhere
enterprise-data-everywhere
 
iabsg_dataroundtable
iabsg_dataroundtableiabsg_dataroundtable
iabsg_dataroundtable
 
Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...Building an Effective Data & Analytics Operating Model A Data Modernization G...
Building an Effective Data & Analytics Operating Model A Data Modernization G...
 
If the CIO is to be valued as a strategic actor, how can he bring .docx
If the CIO is to be valued as a strategic actor, how can he bring .docxIf the CIO is to be valued as a strategic actor, how can he bring .docx
If the CIO is to be valued as a strategic actor, how can he bring .docx
 

More from EMC

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDEMC
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote EMC
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOEMC
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremioEMC
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lakeEMC
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereEMC
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History EMC
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewEMC
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeEMC
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic EMC
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityEMC
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeEMC
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015EMC
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesEMC
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsEMC
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookEMC
 

More from EMC (20)

INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUDINDUSTRY-LEADING  TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
INDUSTRY-LEADING TECHNOLOGY FOR LONG TERM RETENTION OF BACKUPS IN THE CLOUD
 
Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote Cloud Foundry Summit Berlin Keynote
Cloud Foundry Summit Berlin Keynote
 
EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX EMC GLOBAL DATA PROTECTION INDEX
EMC GLOBAL DATA PROTECTION INDEX
 
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIOTransforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
Transforming Desktop Virtualization with Citrix XenDesktop and EMC XtremIO
 
Citrix ready-webinar-xtremio
Citrix ready-webinar-xtremioCitrix ready-webinar-xtremio
Citrix ready-webinar-xtremio
 
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
EMC FORUM RESEARCH GLOBAL RESULTS - 10,451 RESPONSES ACROSS 33 COUNTRIES
 
EMC with Mirantis Openstack
EMC with Mirantis OpenstackEMC with Mirantis Openstack
EMC with Mirantis Openstack
 
Modern infrastructure for business data lake
Modern infrastructure for business data lakeModern infrastructure for business data lake
Modern infrastructure for business data lake
 
Force Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop ElsewhereForce Cyber Criminals to Shop Elsewhere
Force Cyber Criminals to Shop Elsewhere
 
Pivotal : Moments in Container History
Pivotal : Moments in Container History Pivotal : Moments in Container History
Pivotal : Moments in Container History
 
Data Lake Protection - A Technical Review
Data Lake Protection - A Technical ReviewData Lake Protection - A Technical Review
Data Lake Protection - A Technical Review
 
Mobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or FoeMobile E-commerce: Friend or Foe
Mobile E-commerce: Friend or Foe
 
Virtualization Myths Infographic
Virtualization Myths Infographic Virtualization Myths Infographic
Virtualization Myths Infographic
 
Intelligence-Driven GRC for Security
Intelligence-Driven GRC for SecurityIntelligence-Driven GRC for Security
Intelligence-Driven GRC for Security
 
The Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure AgeThe Trust Paradox: Access Management and Trust in an Insecure Age
The Trust Paradox: Access Management and Trust in an Insecure Age
 
EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015EMC Technology Day - SRM University 2015
EMC Technology Day - SRM University 2015
 
EMC Academic Summit 2015
EMC Academic Summit 2015EMC Academic Summit 2015
EMC Academic Summit 2015
 
Data Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education ServicesData Science and Big Data Analytics Book from EMC Education Services
Data Science and Big Data Analytics Book from EMC Education Services
 
Using EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere EnvironmentsUsing EMC Symmetrix Storage in VMware vSphere Environments
Using EMC Symmetrix Storage in VMware vSphere Environments
 
Using EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBookUsing EMC VNX storage with VMware vSphereTechBook
Using EMC VNX storage with VMware vSphereTechBook
 

Recently uploaded

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 

Recently uploaded (20)

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

Transforming Enterprises with Data Science Insights

  • 1. But there is an emerging realization across public and private sectors that there must be more to “Big Data” than just data and platform. CIOs must transform these Big Data platforms and the data they house from cost-centers to data-monetization engines. Forrester likens this very transformation to “refining oil”, and Pivotal believes data science is at the heart of the new oil rush. Big Data emphasizes volume, diversification, low latencies, and ubiquity, whereas data science introduces new terms including, predictive modeling, machine learning, parallelized and in- database algorithms, Map Reduce, and model operationalization. Don’t worry — I am not going to get bogged down here by the debate on the definition of a data scientist.1 Instead, I want to emphasize a more important point regarding this new vernacular: It infers an evolution beyond the traditional rigid output of aggregated data: business intelligence. It is a use-case-driven, iterative, and agile exploration of granular data, with the intent to derive insights and operationalize these insights into down-stream applications. Examples of data science in action are plentiful and also well documented, both in a recent Harvard Business Review article, and these Pivotal presentations: Click on following titles to access full information • Harvard Business Review article • Pivotal presentations Considering the growing number of use cases relevant to your enterprise, what has changed is that these are no longer confined to legacy data-driven functions such as marketing or finance. White paper Transforming Your Company into a Data Science-Driven Enterprise goPivotal.com Introduction Big Data is the latest technology wave impacting C-Level executives across all areas of business, but amid the hype, there remains confusion about what it all means. The name emphasizes the exponential growth of data volumes worldwide (collectively, 2.5 Exabytes/ day in the latest estimate I saw from IDC), but more nuanced definitions of Big Data incorporate the following key tenets: diversification, low latency, and ubiquity. In the current developmental-phase of Big Data, CIOs are investing in platforms to “manage” Big Data. by Annika Jimenez 1 This is a well-blogged topic. We have our definition. It more or less aligns with other earlier definitions, and emphasizes the combination of programming skills and statistical knowledge. I’ll save this debate for a different post. Disruptive Data Science Series
  • 2. White paper Disruptive Data Science 2 Instead, they can and should involve nearly every functional organization in the enterprise. Given the numerous opportunities data science offers for virtually every functional organization in every sector, it is now no longer enough to be a “data-driven enterprise.” Instead, you must build a data science-driven enterprise, a.k.a. the predictive enterprise. As I build out Pivotal’s Data Science Services team, I’m in a very privileged position to witness, first-hand, the organizational transformations underway across nearly all sectors, private and public. Based on these observations as well as my own personal experience at Yahoo!, I believe that any enterprise initiative to move towards more pervasive utilization of data science is disruptive on multiple levels. Any CIO, CTO, or even CEO considering this strategic shift must place the transformation front and center at the C-level table, or risk significant erosion of comparative advantage to the first movers who are getting it right. Why a data science-powered transformation is disruptive With the very initiation of a data science-powered transformation, the endeavor and whoever is driving it are acknowledging that the status quo for analytics utilization does not deliver against the believed potential value for the given business (however defined). As a result, any individual (wherever they are on the totem pole), technology, and even organization overly associated with legacy or the status quo will find themselves exposed to some degree of uncertainty and possibly even vulnerability. What transpires when an enterprise kicks off such an initiative ranges within two extremes. On one side — the “bad outcome” — the effort yields a hot mess of organizational wrangling over concepts like “data ownership” and “where analytics should live”, shortsighted technology investments or the digging- in-of-heels around legacy platforms, and analytical project work-to-nowhere — all with the enterprise’s competitive advantage wavering on a precipice. On the other end of the spectrum — the “good outcome” — the effort can yield a complete rebirth or transformation of the company built upon data-derived innovation, resulting in data science-generated intellectual property and competitive advantage. On this end of the spectrum, I often see an executive alignment on prioritization for data initiatives, a thriving data science culture, a drumbeat focus on smart data instrumentation and data quality processes, and modeling efforts with clearly defined paths to operationalization for top- or bottom-line impacting actions. I am frequently asked by our Analytics Labs customers which levers they should control to drive towards the “good outcome” as they embrace data science. The levers are numerous, and each is integral to the success of the effort. I’ve mentally cultivated my list over an extended period of time, based on my team’s data science work with our customers and prospects, my observations of the travails and successes of various Pivotal customers, and my pre-Pivotal days at Yahoo!, where I ran central Insights Services and led globalized data solutions during the company’s data “glory days”.2 Click on following title to access full information • Analytics Labs customers Transformation catalysts In this post, I will provide some high-level color on these levers, or “transformation catalysts” as I call them. In subsequent posts I will cycle back to those warranting a deeper dive. Vision The shift to a data science-driven enterprise will not necessarily fail without clear established vision, but the presence of a thoughtfully crafted and shared vision can dramatically accelerate 2 Referencing the time when Yahoo! prioritized data as its core strategic play, hired one of the first “Chief Data Officers”, Usama Fayyad, and built a soup-to-nuts central data organization, “Strategic Data Solutions” (2004–2008). The organization was disbanded in 2008 during Yahoo!’s executive revolving door, and the resulting “SDS diaspora” has driven data-savvy executives to Intuit, Salesforce.com, Facebook, Google, and innumerable data start-ups. Many of us are also at Pivotal.
  • 3. White paper Disruptive Data Science 3 the path to value generation. By “vision” I’m specifically referencing the visualization of a path from current-state of data utilization to a clear understanding of the promise of data science (in a Big Data context,) when applied against key business problems or goals the company is facing. It implies, at least at the outset, a conviction in the financial impact (top- or bottom- line) that data science initiatives can bring, and it usually entails a passionate communication and sharing of these ideas with key influencers (decision-makers and owners of the data science value chain). With some buy-in established, the vision should lead to strategy definition via the programmatic gathering of ideas across stakeholder communities, and aligning technology and human capital investments with the platform capabilities required to support these. Central to “getting vision” is that it must be elevated beyond any grassroots origination. Many times, a company’s data-savvy thinkers who have vision (or at least a passionately held belief in the transformative nature of data science when applied to key business needs) are too buried in the organization. They aren’t sufficiently empowered with a mandate to translate this vision to concrete, programmatic strategy. This, then, brings me to the next topic: Organization, Organizational Alignment, and Executive Sponsorship My colleague, Josh Klahr, VP of Products, wrote a nice blog recently about building a data organization, leveraging the lessons learned during our mutual time at Yahoo! As it relates to fully harnessing the promise of data science, my observations on organization extend beyond the functional components he details, to consider fundamental organizational structure. Within any one company, I’ve found there is a constant pendulum swinging between the two organizational structures: decentralized, where data scientists and other analytics professionals sit organizationally with the line of business (“LOB”), and centralized, where data scientists organizationally sit with a central data organization. A given single company can frequently swing between the decentralized model and the centralized model. This pendulum pivots along arguments advocating LOB-level domain expertise and LOB-control of project prioritization, versus considerations of the central need to pool and retain scarce talent, encourage cross-LOB knowledge sharing, and establish standards and efficiencies. I am well known to be a “centralist” and have a strong belief that any data science capabilities should sit in the same organization owning the data platform. That organization is usually but not always, IT. There are numerous reasons why I support this model. At a very pragmatic level, there is an essential collaboration that must exist between the data scientist and all the other owners of the “value chain” of operationalized predictive and machine-learning models. These owners mostly, but not entirely, sit organizationally together, and the successful productionalization of models is dependent upon strong collaboration across the value-chain points. As I mentioned in my talk at Strata, I believe the data scientist’s span of influence is expanding into all these critical points along the chain. To adequately yield the necessary degree of influence to align roadmaps, priorities, and resource allocation, the Data Scientist(s) must sit organizationally in the same world. Click on following title to access full information • Strata
  • 4. White paper Disruptive Data Science 4 Once a company’s data capabilities have evolved to the point organizationally that they have an emerging centralized data team, this typically brings with it more senior-level oversight and sponsorship, which is exactly what will be needed to push key data science initiatives from conceptualization to execution. Remember, the point here is to evolve from current-state, typically highly operationalized data processes supporting BI and other mission critical applications, to initiating proofs-of- concepts (“POCs”) around predictive modeling and machine learning. These POCs will start introducing pressure on the value- chain’s existing commitments and resourcing obligations, just like it does on the data itself. Driving commitments, adherence to the team’s own committed timelines, and keeping the interest and enthusiasm of the executive table all the while, will require the sponsorship and oversight of a peer at said table. Without the right person at the top, willing to go to bat for the collective team and constantly socializing progress, and without the right relationships with key enablers in the wider enterprise, the entire transition itself can fall apart. Of course there is even more color to the topic of Organization than I have touched upon here. I’ll go into more depth in a later post. Data Availability The whole dialog begins with data: its assumed availability in some form to down-stream consumers, and in more granular, “raw” forms, to data scientists. Of course, realizing this core requirement is no trivial task. It carries the assumption that valuable data assets have been identified (or newly created), captured, stored in a common store or warehouse, and made available with the right adherence to company privacy and security policies. Our team typically starts a conversation with a prospect that has: • Siloed data assets, with valuable data stored in many locations across the prospect’s IT footprint • Immature data policies which either permit improper use of data, or instead overly restrict ease of data access • A strong belief that there is immense value in applying more sophisticated analytics to raw data. Getting the uncleansed, scattered assets pulled together and into one place, with usage rights adherent to proper privacy/security policies, are often the first set of tactical hurdles in enabling data science. In some cases, despite a valuable pre-existing base of data, the actual data being targeted for inclusion in a key project doesn’t yet exist. This is where a strong understanding of, appreciation for, and process-definition around data instrumentation become critical. The non-existence of data shouldn’t stop efforts in their tracks. Instead, with the right process and the right assessment of value, this lack of data can kick-start an effort to instrument the target capture-vehicle (web logs, IT logs, sensor, transactional systems, etc.) for tracking. With a clear vision translated to strategy and then to specific projects, an understanding of the required data will emerge. For any one company to succeed in transforming itself into a predictive enterprise, coordination between the owners of the data capture — frequently Product Management — and the analytical leadership must be baked into the organization’s planning culture. Whether captured data is pushed to Hadoop, massively parallel processing (MPP) databases, or both, the term “availability” also alludes to an access layer that doesn’t overly restrict self- provisioning. SQL, MapReduce, Pig, Hive, etc are all tools enabling access. However, the hybridization of distributed compute paradigms introduces a requirement to reduce the complexity at the access layer so any one data scientist or analyst doesn’t have to master so many languages. Pivotal solves for this issue a variety of ways, most prominently via Chorus, our open source collaboration platform for data science which enables data discovery and self-provisioning. And because we recognize data availability is one of the fundamental enablers of the predictive enterprise, we have additional technology innovations brewing on this front. Stay tuned for upcoming announcements on this. Choosing the Right Distributed Computing Platform This is Pivotal’s raison d’être. Since there are many people blogging about our products and their advantages/disadvantages in supporting Big Data initiatives, I won’t go into deep detail on our Hadoop-MPP hybrid stack here. I will simply comment that we see the variety and volume of our customers’ data drive their evaluations, and then, either a tepid or passionate embrace of Hadoop. Despite our roots in the massively parallel processing (MPP) PostgreSQL relational database, we firmly believe that Hadoop will be a cornerstone technology for the predictive enterprise. However, for the iterative, exploratory work typical of model- building, my team continues to push more advanced analytical functions into the Pivotal MPP Database. In most cases, we find this to be the most performant solution across the very large data volumes we are working on, and given the complexity of the functions that are being applied to the data. Our Unified Analytics Platform (UAP) supports this agile utilization of both compute architectures. Along with Chorus and MADLib, UAP delivers a
  • 5. White paper Disruptive Data Science 5 complete paradigm shift for your data science practitioners that will transform their effectiveness in the organization. The integrated stack dramatically shortens time-to-insight, drives efficiencies in data science, and allows the building and maintenance of much more sophisticated models that rely on increasingly broad sources and structures of data. These benefits are what make “believers” within my data science team, and we are unabashed advocates of the Pivotal UAP. Click on following title to access full information • MADLib All of this said, and beyond the Pivotal UAP pitch, a more nuanced take-away from my list is that the intelligent selection of the right distributed computing platform is not the only critical catalyst to transformation. It is clearly a key enabler given the paradigm shift it yields, but it does not ipso facto yield a predictive enterprise, given the need to focus on all the other catalysts I discuss here. Next-Generation Toolkits for Advanced Analytics In the world of predictive analytics, my team distinguishes between “Big Data” toolkits and “Small Data” toolkits. The key distinction is whether the analytical functions for descriptive statistics, classification, regression, clustering, and other common machine learning techniques, are written to be applied against very large volumes of data while leveraging some form of parallelism or distribution of compute power. With “Small Data” toolkits, the application of the function or algorithm often still happens at the desktop, against heavily sampled data, which implies limited computational power and performance, and lots of data movement (typically touching at least one other person beyond the data scientist). Tools with origins in this category include: R, Stata, Matlab/Octave, SAS, SPSS, etc. Many of the purveyors of these toolkits are actively developing “Big Data” versions, which would better leverage the computational horsepower of distributed computing platforms. Big Data analytical toolkits, then, might also include R (i.e., Revolution R, Radoop, and even R when rendered through a Procedural Language,) and SAS (High-Performance Analytics), in addition to those which were born with a focus on parallel algorithms: Mahout (open source algorithm library for Hadoop), MADLib (open source algorithm library for MPP and PosgreSQL), and GUI-based Big Data predictive modeling packages, such as Alpine Data Labs. Click on following titles to access full information • MADLib • R when rendered through a Procedural Language • Mahout • Alpine Data Labs These Big Data toolkits are cornerstone-enablers of the predictive enterprise, delivering on the promised paradigm shift I mentioned in the previous section. Investing in a Big Data platform that continues to rely on “Small Data” modeling toolkits is not delivering the power of Big Data innovations to those “monetization architects” (aka Data Scientists) most able to create value. My team actively works in an agile fashion across many Big Data toolkits. We may, within a single project, work with PL/R, MADLib, and Alpine — choosing algorithms and functions depending on how they were written, their performance within Pivotal’s UAP, and the quality of the output. Or, we may, if the right algorithm isn’t currently available, author a new version for MADlib. I believe this sub-toolkit agility is the future of advanced analytics, where data scientists will assess the same algorithm from many toolkits and select the most suitable one depending on the data, the use-case, and the underlying platform and performance requirements. Tools like Pivotal’s Chorus and open-source algorithm libraries and the efforts of our partners such as Kaggle, will enable this algorithm-level evaluation and selection. All of this will put additional pressure on the analytical walled gardens that we have in the market today. In this scenario, where data scientists are evaluating functions and algorithms below the toolkit level, the HR implications are clear. Any organization working to become a predictive enterprise will need to cultivate or aggressively recruit the right skills to enable this sophisticated tool selection (not to mention drive the design and development of the actual model itself). This then, leads me to: Data Science Skills & Training Existing Team Members Beyond the core question of what levers control the transition to a predictive enterprise, the next most frequent questions I get from customers and prospects are “What is a data scientist?” and “How do I build a data science team?” I’ll reserve the answers to these questions for a separate blog post, but I will say now that I am one of those people who think slick UIs and visualization tools will never “toolify” away the need for the data scientist’s skills. The blood and guts of predictive modeling must remain firmly in the hands of well-prepared data science practitioners. As an initial source of talent, you will want to devise a continuing education plan for your pre-existing employees who might be ready to deepen their analytical skills. Pivotal was first-to- market to respond to this need with a data science certification
  • 6. White paper Disruptive Data Science 6 curriculum. However, to rapidly build-out a team of solid data scientists (the number depending on the projects envisioned and knowledge required,) and to establish the right analytical leadership, you will likely need to recruit new skills into your organization. Your ability to lure strong data science talent in this white-hot market for data scientists depends entirely on your ability to build the most attractive analytics playground for your candidates. Click on following titles to access full information • Data science certification curriculum • White-hot market for data scientists The best way to do this is to not just offer strong market- based compensation packages, but also offer the best possible environment for the top talent to practice their craft. This includes: • Cultivating the richest data in your sector • Maintaining an executive spotlight on their work and a constant drum-beat of executive sponsorship for data science initiatives • Providing state-of-the-art data platforms and analytical tools • Creating opportunities to share what they are doing with broader audiences • Forging strong paths for model operationalization to deliver the real monetary impact If you pull these things together, as WalMart has done, and its competitors Target and Sears are now also doing in the retail space, you will become a contender in the recruiting battles against well-known competitors such as Google and Facebook. Lastly on this point: your data scientists will become increasingly valuable with every project. Your job, Mr. or Ms. Creator of the Predictive Enterprise, will increasingly be less about recruiting strong talent (once you’ve achieved the target team size that embodies the right analytical knowledge), and increasingly more about retaining your data science talent. Defined paths to operationalization At the end of the day, any strategic shift to predictive analytics will need to prove significant impact to business-specific performance indicators, whatever these are – e.g., revenues, costs, new customers, transaction values, customer engagement, hospital readmissions, intruders detected, etc. The pressure to deliver “wins” to the business will begin almost immediately once a concerted effort gets underway. I have repeatedly found that all sorts of analytics resources can be thrown at a business question only to have the results bubbled up to a few key insights that languish without any meaningful action-enablement. When I see this happening, it’s often a yellow flag that the company hasn’t moved beyond its earlier BI-centricity, and is still defining the goal of any one effort as “informing decision-making”. Informing decision-making is good, in the sense that it hopes to arm executives with the right data and insights to help them maybe make the right decisions, but it clearly doesn’t scale. Anyone on the hook to prove his/ her value in decision-support knows that “informing decision- making” is not measurable. On the other hand, having a clearly defined and resourced path to operationalize predictive models into production environments for down-stream application integration ensures both automated “actioning” against events of interest, and the ability to measure the impact on the target KPI. In our engagements with customers, the “path to operationalization”, which I consider the “last mile of data science,” includes the following elements: • Production, likely low-latency, data scoring against the resulting model, • Storage and delivery of scored output to downstream applications, • Definition of response actions and development of these within the application environment, or the development of a new application to enable automated actioning,
  • 7. White paper Disruptive Data Science 7 • Development of reporting layer to track impact, and • (More rarely) creation of the closed-loop, channeling action- response data back into the model to power self-learning capabilities, or adaptive models. This work can be easy or hard depending on your pre-existing investments in enterprise systems and in-house development capabilities. The key to a successful transformation to the predictive enterprise is that all of these elements and their respective requirements are actively understood, levels-of-effort are determined, and resourcing is established. Without definition of the path to operationalization and committed resourcing for execution, there will be no value delivered back to the business. Process & Change Management The requirements around defining the path to operationalization for any one model immediately make me think about process definition and change management. If there is one silent killer of data science initiatives, it’s the lack of defined process and program management to drive project conceptualization to committed resourcing and then to execution. Because the value-chain around data science is so extended across so many disparate owners (aka “influencers of success”), the lack of defined process will make coordination and commitments, and ultimately the delivery of “wins” exceedingly difficult. As I wrote earlier, it is common to lack the right data to support a defined data science project. In many cases the data is there for the pickin’ (likely in some upstream product), but the product owners haven’t instrumented the product for the proper data capture. The seemingly basic process of: • getting in front of these product managers to drive commitments to define and then deploy the right form of instrumentation, and then • initiating data generation, capturing, and storing of the new data for use in a predictive modeling exercise, can be extended and timely depending on the data requirements, product- engineering’s existing roadmaps and resource commitments, and the agility of the data platform to absorb new data. Imagine having to repeat this basic process repeatedly across many product owners, re-defining a new process each time, and you can see how delay-prone key data initiatives can become simply due to poor process definition. Beyond instrumentation, other critical planning junctions frequently driven by data science and involving multi-team coordination are: • Analytics roadmapping • SLA changes to drive lower latencies on data delivery • ETL changes • Data quality initiatives • Data policy changes • Model operationalization Without a defined process for expediting coordination along the key influencers of the data science value chain, there are many opportunities for projects to break down. Given the importance of this area, the creator of the Data Science Team must also think about new roles beyond the data scientist. These include positions like project managers and engagement managers to coordinate planning, gather requirements, socialize plans, drive execution against committed timelines, and communicate progress to internal stakeholders. These are all critical tasks that should not be owned by the data scientist. (As a side note, and related to the earlier topic on Organization, Organizational Alignment, and Executive Sponsorship, this is also very frequently why many organizationally complex businesses move to new C-level roles like “Chief Data Officer” or “Chief Analytics Officer”. An executive champion, with the right relationships with his/her peers, will not-so-surprisingly “grease the wheels” to drive momentum, coordination, prioritization, and resourcing commitments) Wrapping Up So there you have them; my “transformation catalysts” to yield a predictive enterprise. When taken in full, you should appreciate why I call this transformation “disruptive”. I’m sure many of you
  • 8. White paper Disruptive Data Science GoPivotal, Pivotal, and the Pivotal logo are registered trademarks or trademarks of GoPivotal, Inc. in the United States and other countries. All other trademarks used herein are the property of their respective owners. © Copyright 2013 Go Pivotal, Inc. All rights reserved. Published in the USA. PVTL-WP-211-09/13 Pivotal, committed to open source and open standards, is a leading provider of application and data infrastructure software, agile development services, and data science consulting. Pivotal’s revolutionary Enterprise PaaS product, powered by Cloud Foundry, will be available in Q4 2013. Pivotal CF is a complete, next generation Enterprise Platform-as-a-Service that makes it possible, for the first time, for the employees of the enterprise to rapidly create modern applications. To create powerful experiences that serve their customers in the context of who they are, where they are, and what they are doing in the moment. To store, manage and deliver value from fast, massive data sets. To build, deploy and scale at an unprecedented pace. Uniting selected technology, people and programs from EMC and VMware, the following products and services are now part of Pivotal: Greenplum® , Cloud Foundry, Spring, Cetas, Pivotal Labs® , GemFire® and other products from the VMware vFabricTM Suite. Pivotal 1900 S Norfolk Street San Mateo CA 94403 goPivotal.com will have strong opinions on what’s missing or overly emphasized, and I welcome any debate or constructive input. After all, given the newness of the transformations underway at this moment, this is an ongoing learning process for us all. Over the course of the next few months, I will review a few of these catalysts in a bit more depth than what I have room for here today. Those topics warranting more discussion include Vision, Organization, and Data Science Skills. And, lastly, you will start hearing more directly from my data science team on the innovative work we are doing across sectors, the tools we are building and using, performance benchmarks of the varying analytical toolkits, and more. Stay tuned here for more disruptive data science from Pivotal! About Annika Jimenez Annika is a seasoned leader of analytics initiatives, coming to Pivotal after over six years in data leadership roles at Yahoo! At Pivotal since April 2011, she has built the “Data Science Dream Team” – an industry-leading group of Data Scientists – representing a rich combination of vertical domain and horizontal analytical expertise – who are facilitating Data Science-driven transformations for Pivotal customers. During her time at Yahoo!, she led Audience and International data solutions for Yahoo!’s central data organization, Strategic Data Solutions, and led Insights Services – comprised of a team of 40 researchers covering Web analytics, satisfaction/brand health metrics, and audience/ad measurement. Annika is a recognized evangelist for “applied data” and well known for her acute focus on action- enablement. About Pivotal Pivotal is building a new platform for a new era, setting the standard for Enterprise Platform- as-a-Service (PaaS). The company’s mission is to enable customers to build a new class of applications, leveraging big and fast data, doing all of this with the power of cloud independence. Uniting selected technology, people and programs from EMC and VMware, the following products and services are now part of Pivotal: Greenplum® , Cloud Foundry, Spring, Cetas, Pivotal Labs® , GemFire® and other products from the VMware vFabric™ Suite. Powered by new data fabrics, Pivotal One is a complete, next generation Enterprise Platform-as-a-Service that makes it possible, for the first time, for the employees of the enterprise to rapidly create consumer-grade applications. To create powerful experiences that serve a consumer in the context of who they are, where they are, and what they are doing in the moment. To store, manage and deliver value from fast, massive data sets. To build, deploy and scale at an unprecedented pace. Pivotal One integrates these sophisticated new data fabrics with modern programming frameworks and cloud independence in a single, united platform. Learn More To learn more about our products, services and solutions, visit us at goPivotal.com.