We are entering an age in which technology is a strong enabler of business success, and in which business strategy and technology strategy are tightly intertwined. You often cannot discuss business strategy without data and its related technologies being a big part of it, and business leaders are increasingly turning to IT to compete more effectively in the market. As IT management, it falls upon you to ensure that your data technology architecture (software and hardware) can handle the business demands of today and into the future. In this session, we discuss the major big data technology architectures and associated tools, and what role each should play in your data environment. We also give real-life examples of how others are using these technologies. Build a better data architecture to unlock the power of all your data.
Are You Prepared For The Future Of Data Technologies?
1. MT56: Are you prepared for the future of data technologies?
Shawn Rogers, Chief Research Officer, Dell Big Data
Anthony Dina, Director of Big Data Enterprise Technologists
2. Organizations actively using data grow 50% faster than laggards.
The number of organizations who understand the benefits of big data grew slightly, from 39% (2014) to 42% (2015).
Source: 2015 Dell Global Technology Adoption Index
4. Are you prepared?
• New technologies: decentralized EDW and open source frameworks make data management more cost-effective.
• Maturing users: organizations are moving to self-service data access and from reporting to predictive analytics.
• New challengers: HBase, Cassandra, Couchbase, and MongoDB are all mission-critical for today's enterprise.
• IoT: 50% of organizations see IoT as an important or essential part of business.
• Database sprawl: 25% of organizations are managing more than 500 databases.
• Big data adoption: 44% of organizations don't know how to approach big data.
5. Only 16% of projects are championed by IT.
• Finance: 17%
• Marketing: 16%
• C-Suite: 16%
• IT: 16%
• LoB: 16%
• Sales: 16%
Source: 2015 Dell Global Technology Adoption Index
6. The unstoppable trends that drive big data
User (maturing user community):
• Workloads and demands
• Reporting to advanced analytics
• Democratization
• Self-service
• Demand for agile/fast deployment
• and more
Economics (economic impact):
• Relatively inexpensive enterprise hardware
• Open source frameworks
• Recession resistant
• and more
Technology (disruptive technology innovation):
• New technologies
• Enhanced power and performance
• Greater scale
• New platforms
• Decentralized EDW
• and more
Data (new and valuable data):
• New alternatives
• Sophisticated analysis
• Big data
• Internet of Things (IoT)
• Sensor, machine and social data
• and more
7. Data management, strategy and stakeholder issues are the top barriers to implementation
• Poor data management: 22%
• Strategy issues: 21.5%
• Stakeholder issues: 21.1%
• Complexity of multiple data platforms: 19.5%
• Lack of skills to manage multi-structured data platforms: 12.7%
• Shortage of application management features: 12.5%
• Other: 0.6%
Source: Enterprise Management Associates (EMA) Insights Across the Hybrid Enterprise Big Data, 2015
8. The rise of the Hybrid Data Environment
Authored by Shawn Rogers and John Myers at Enterprise Management Associates (EMA)
9. Multiple platforms power innovation.
Number of data platforms in use: one, 32.1%; two, 28.2%; three, 27.8%; four, 4.3%; five, 3.5%; eight, 2.3%.
Source: Enterprise Management Associates (EMA), Operationalizing the Buzz: Big Data 2013
12. Hadoop
Hadoop turned 10 in 2015! Inspired by Google's MapReduce and GFS papers and developed at Yahoo!.
Overview
• A distributed file system (HDFS) with computational capabilities (MapReduce).
• Processes data as key-value pairs.
• Example: Omneo uses Hadoop to analyze its supply chain data.
Pros
• Best used for very large data sets manipulated in batches.
• Mature ecosystem of tools.
• Hive and Pig add SQL-like interfaces to access Hadoop data.
Cons
• Minimum latency for jobs can be >1 min.
• Doesn't scale well for streaming/real-time data.
• Using MapReduce to manipulate data can be very complex.
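To make the "key-value pairs" point concrete, here is a minimal, single-machine sketch of the MapReduce pattern Hadoop runs at cluster scale: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums each group. The function names and sample documents are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit (key, value) pairs -- here, (word, 1) per occurrence."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle step: group values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce step: aggregate each key's values -- here, sum the counts."""
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data needs big tools", "data drives decisions"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
print(counts["big"], counts["data"])  # 2 2
```

In a real cluster the map and reduce steps run in parallel across many nodes, which is why per-job latency is high but throughput on very large batches is excellent.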
13. Spark
Spark was developed in 2009 at UC Berkeley and reached its 1.0 release in 2014.
Overview
• Designed as an alternative to MapReduce.
• Cluster-computing based; runs in-memory or on disk (10x-100x faster than MapReduce).
• Example: prevent fraud by monitoring credit card transactions in real time.
Pros
• Best for streaming real-time data; can be used for batch as well.
• Runs standalone, on Hadoop, or in the cloud.
• Accesses data from HDFS, Cassandra, or S3.
• Offers SQL-like tools.
Cons
• Relatively new; use cases are still being defined.
• SQL tools need more work.
• Ecosystem of tools is still being developed.
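The fraud example above is the kind of sliding-window computation a Spark Streaming job would run over a live transaction feed. The sketch below shows only the core windowing logic in plain Python; the card IDs, window size, and threshold are invented for illustration, not taken from any real system.

```python
from collections import deque
from datetime import datetime, timedelta

def make_monitor(window=timedelta(minutes=5), threshold=3):
    """Flag a card that makes more than `threshold` transactions inside the
    sliding `window` -- a toy stand-in for a streaming fraud-detection job."""
    history = {}  # card_id -> deque of transaction timestamps

    def check(card_id, ts):
        q = history.setdefault(card_id, deque())
        q.append(ts)
        while q and ts - q[0] > window:  # evict events older than the window
            q.popleft()
        return len(q) > threshold  # True means a suspicious burst

    return check

check = make_monitor()
base = datetime(2016, 1, 1, 12, 0)
# Five transactions ten seconds apart on the same card:
flags = [check("card42", base + timedelta(seconds=10 * i)) for i in range(5)]
print(flags)  # the fourth and fifth exceed the threshold
```

Spark's contribution is running this logic in-memory across a cluster, so millions of cards can be tracked with sub-second latency instead of batch-job delays.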
14. NoSQL
NoSQL databases have been around for a long time, but became popular when Web 2.0 companies came on the scene.
Overview
• A category that covers several database types: key-value, graph, document, etc.
• Popular NoSQL stores: MongoDB, Redis, Neo4j.
• Example: a large media house uses MongoDB to store social-sharing data.
Pros
• Allows the database to scale out (distributed).
• Handles high transaction volumes.
• No need for proprietary, expensive purpose-built infrastructure.
Cons
• Tool ecosystem is relatively immature.
• Limited native capabilities for ad-hoc query and analysis.
• Talent availability can sometimes be a challenge.
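Two ideas on this slide are worth making concrete: documents need not share a schema, and keys can be hashed across shards so the store scales out horizontally. The toy class below illustrates both under those assumptions; it is a teaching sketch, not how MongoDB is actually implemented.

```python
import hashlib
import json

class TinyDocumentStore:
    """Toy document store: schema-less JSON documents, with keys hashed
    across shards to mimic MongoDB-style horizontal scale-out."""

    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so documents spread evenly across shards.
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, document):
        # Documents need not share a schema -- any JSON-serializable dict works.
        self._shard_for(key)[key] = json.dumps(document)

    def get(self, key):
        raw = self._shard_for(key).get(key)
        return json.loads(raw) if raw is not None else None

store = TinyDocumentStore()
store.put("post:1", {"author": "ana", "shares": 120})
store.put("post:2", {"author": "raj", "tags": ["media"]})  # different shape, still fine
print(store.get("post:1")["shares"])  # 120
```

In a production store each shard would be a separate server, which is exactly how high transaction volumes are absorbed without purpose-built hardware.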
15. SAP HANA
SAP HANA was first introduced in 2010 to tackle real-time database needs.
Overview
• In-memory computing platform that integrates different acquisitions made by SAP.
• Purpose-built for real-time analytics and applications.
• Example: Grupo Estrella can make business decisions faster.
Pros
• Industry-leading real-time capabilities.
• Strong integration with SAP Business Suite.
• Offered both on premises and in the cloud.
Cons
• You will get the most out of it if you use SAP Business Suite.
• Works best for very large enterprises.
• You get locked into the SAP ecosystem.
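A key reason in-memory platforms like HANA deliver real-time analytics is columnar storage: an aggregate scans only the columns it needs rather than whole records. The sketch below contrasts the two layouts with invented sample data; it illustrates the general column-store idea, not HANA's internal design.

```python
# Row layout: one dict per record -- convenient for transactional writes.
rows = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.0},
    {"region": "north", "revenue": 50.0},
]

# Column layout: one array per field -- what in-memory column stores use,
# so an aggregate reads two contiguous arrays instead of every record.
columns = {
    "region": [r["region"] for r in rows],
    "revenue": [r["revenue"] for r in rows],
}

def total_revenue_by(region):
    """Aggregate touches just the region and revenue columns."""
    return sum(
        rev
        for reg, rev in zip(columns["region"], columns["revenue"])
        if reg == region
    )

print(total_revenue_by("north"))  # 170.0
```

At scale the columnar arrays also compress well and stay cache-friendly, which is where most of the real-time speedup comes from.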
16. Reducing surgical site infections
Business need
Surgeons at the University of Iowa Hospitals and Clinics needed to know whether patients were susceptible to infections in order to make critical treatment decisions in the operating room. Reducing the infection rate has major implications for overall patient health and cost savings.
Solution and results
• Predicted infection likelihood by merging historical and real-time patient data
• 58% reduction in occurrences of surgical site infections
• Personalized healthcare
• Reduced cost of patient care
Data required for analysis
• Demographic data
• Preoperative data such as Apgar score, blood loss, wound class
• Real-time data: operating room, number of procedures, category, duration, open vs. minimally invasive
17. Creating better customer experiences
Business need
Merkle is a marketing agency that analyzes customer data on behalf of its clients. It needed a scalable, cost-effective way to capture and analyze large amounts of structured and unstructured consumer data for use in developing better marketing campaigns for clients.
Solution and results
• Uses Hadoop to organize data to be analyzed.
• Continued using different data platforms to collect and store data.
• 7-10x faster processing performance
• More data means better results: now using 3-4x more data.
Data required for analysis
• Demographic data
• Online behavior such as websites visited
• Offline behavior such as TV and direct mail
• Purchase data
• Social media data
18. Role-play
Scenario: the largest grocer in Central Texas is feeling the heat from an internet retail company that plans to offer same-day grocery delivery in three key markets. The first of these is Austin, which has a tech-savvy population.
Follow the dialog between the GM and CIO of the grocer as they move quickly to counteract this competitive threat.
20. 3 things to do next
Get inspired. Talk to other Dell customers while here,
learn what they’re doing, and translate that into relevant
use cases for your organization.
Get going. Schedule time with a Dell big data enterprise
technologist to assess your unique environment and
discuss data project fundamentals.
Get prepared. Partner with the business to address the
challenges in the “wheel” one by one or holistically to
build a future-ready data architecture.