Over the past decade, big data implementations have grown more sophisticated, particularly for organizations operationalizing machine learning, analytics, and data engineering. The pressures of data-driven cultures and of multi-workload applications such as customer care, fraud management, and cross-platform marketing are changing the game. Mixing machine learning with business processes and operationalizing analytics with data engineering practices places burdens on IT teams. Making these advanced data environments work together is an ongoing challenge.
While you can still “swipe and go” to implement data management environments in the cloud, that easy path is often littered with additional costs and higher maintenance and synchronization overhead. Data-savvy organizations are taking a more measured, coordinated approach to their machine learning, analytics, and data engineering infrastructures. These proactive approaches speed adoption among business stakeholders and reduce administration and governance burdens for technologists.
Join John L. Myers, managing research director at leading IT analyst firm Enterprise Management Associates (EMA), and Nik Rouda, director of product marketing at Cloudera, to discover how the world of cloud implementations has changed for the better, and what an enterprise-grade cloud environment built on the right resources could look like for your organization.
Attend this webinar to learn about:
Drivers for implementing machine learning, analytics and data engineering with a proactive approach
Pitfalls associated with “immediate gratification” implementations
How business stakeholders benefit from proactive approaches
How proactive implementations improve the workloads of technologists
Examples of real world customer implementations
IT & DATA MANAGEMENT RESEARCH, INDUSTRY ANALYSIS & CONSULTING
Looking Before You Leap Into the Cloud:
Taking a proactive approach to machine learning, analytics, and data
engineering in the cloud
John L. Myers
Managing Research Director
EMA
Nik Rouda
Director of Product Marketing
Cloudera
Topic #1:
Drivers for implementing machine learning, analytics,
and data engineering with a proactive approach
65.5% of next-generation data management implementations, such as Cloudera CDH, use some form of cloud deployment.
There are some pros and cons to cloud environments in the context of analytics workloads and data pipelines.
The benefits on the left are pretty well-known; cloud service providers have been pushing these for some time now.
The disadvantages may be lessons you learn the hard way; we’d like to save you some pain. Cloud is easy for an individual to get into, but very hard to optimize for an enterprise. These are very real problems, and they are actually exacerbated by the multitude of distinct services available in the cloud. In a nutshell, most organizations accidentally end up recreating the data silos they had on-premises, along with all the extra effort and risk that comes with silos.
[ASK: how important is it for you to solve the problems on the right?]
All of this makes the choices tougher because traditional applications use just one kind of data and a single analytic approach. Delivering catalog, security, and governance for even that single system is a challenge in bare-metal environments, but it becomes particularly tough in the cloud, where metadata and policies don’t persist when an elastic workload is dropped.
[ASK: do fragmented silos make it hard for you to manage and guarantee security/compliance/etc.? Do you end up often recreating the context, definitions, and permissions of the same data?]
Top 5 Advanced Analytics objectives
Graph analytics (e.g., influencer analysis)
Regression algorithms to predict information based on independent variables
Decision tree (recursive partitioning) algorithms
Feature selection algorithms (e.g., PCA, PLS)
Time Series Forecasting and Smoothing
When these objectives are linked with daily or weekly change frequencies, data engineering departments quickly fall behind in their implementations.
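To make the first item above concrete, here is a minimal sketch of a regression that predicts a dependent value from a single independent variable, written in pure Python. The data and variable names are invented for illustration; production work would use a library such as scikit-learn.

```python
# Illustrative only: ordinary least squares on one independent variable.
# The ad-spend/sales numbers below are made up for the example.

def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Predict weekly sales (dependent) from ad spend (independent variable).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.1, 5.0, 6.9, 9.0]
slope, intercept = fit_line(spend, sales)
predicted = slope * 5.0 + intercept  # forecast at spend = 5.0
```

The same shape of problem, extended to many variables and much more data, is what makes these workloads cluster-sized.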
Cloudera supports four major workloads, and each one addresses different analytics functions. Each stands alone as an industry-leading, open-source approach. Together, they handle your complete data pipeline. We’ve found again and again that the most high-value analytics applications combine these on the same platform with the same data, all managed logically in one place.
[ASK: What tools are you using for these today? Are they well integrated from the same vendor? Or do you handle each one separately? At what cost?]
So now the architecture changes in the cloud. We already talked about why there are separate clusters. Now, let’s talk about how they fit together and how they’re different.
Some clusters are going to be persistent, or running 24x7.
Others are going to be transient, so spin up for a few hours, run a job, and shut down.
Others are going to be clusters with both characteristics, so maybe a persistent cluster that is always up but bursts on occasion and then scales down.
They all have different characteristics.
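A back-of-the-envelope model shows why the persistent/transient distinction matters economically. The hourly rate, node counts, and job durations below are hypothetical, not Cloudera or Azure pricing.

```python
# Hypothetical cost comparison: always-on cluster vs. transient job cluster.

HOURLY_RATE = 0.50  # invented cost per node-hour

def persistent_cost(nodes, hours=24 * 30):
    """A persistent (24x7) cluster is billed around the clock for a month."""
    return nodes * hours * HOURLY_RATE

def transient_cost(nodes, runs_per_month, hours_per_run):
    """A transient cluster is billed only while a job is running."""
    return nodes * runs_per_month * hours_per_run * HOURLY_RATE

# 10-node cluster: always-on vs. a nightly 2-hour ETL job.
always_on = persistent_cost(10)          # 10 nodes * 720 hours * 0.50
nightly_etl = transient_cost(10, 30, 2)  # 10 nodes * 60 hours * 0.50
```

Under these made-up numbers the transient pattern costs roughly a twelfth of the always-on pattern, which is why matching cluster lifecycle to workload shape is worth the design effort.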
Let’s say you have a use case where you are analyzing purchases in real time to help determine when you might be out of stock.
The clusters ingesting the data, running Kafka and Spark Streaming, are probably running 24x7 because you would be getting data at all times throughout the day. You probably want high availability (HA), disaster recovery (DR), and the ability to upgrade the cluster.
After the data is ingested, you’re going to need to process it so that your analysts can use it: spin up a cluster, run an ETL job, and then shut the cluster down. You don’t need HA because if you lose a NameNode, you can just spin up a new cluster. Security matters less since it’s a single-user cluster.
Next, the data is probably going to be analyzed. This might be a BI tool and you’re probably going to keep that up 24x7 since people might connect to it at all hours and you want to maintain the metadata. But it’s going to get heavy usage during work hours, so you probably want to spin up additional nodes to support all those users.
Finally, maybe you have an application that is using a NoSQL backend to keep track and notify folks responsible for supply chain that they need to restock items. Again, that’s going to be a persistent cluster since that’s an application that will always be running.
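The restocking logic at the heart of this use case can be sketched in a few lines. This is plain Python standing in for the Kafka/Spark Streaming pipeline described above; the SKUs and threshold are invented for the example.

```python
# Sketch of the out-of-stock detection only. In the real pipeline, purchase
# events arrive via Kafka and are processed by Spark Streaming; here a plain
# loop stands in, with made-up inventory data.

STOCK = {"sku-1": 5, "sku-2": 2}
REORDER_THRESHOLD = 3  # hypothetical reorder point

def process_purchase(sku, qty, restock_alerts):
    """Decrement stock and flag SKUs that fall below the reorder threshold."""
    STOCK[sku] -= qty
    if STOCK[sku] < REORDER_THRESHOLD:
        restock_alerts.append(sku)

alerts = []
for sku, qty in [("sku-1", 1), ("sku-2", 1), ("sku-1", 2)]:
    process_purchase(sku, qty, alerts)
# sku-2 falls to 1 and then sku-1 falls to 2, so both trigger alerts.
```

The alerts list is what the NoSQL-backed notification application at the end of the pipeline would consume.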
Fundamentally, Cloudera leverages Azure Virtual Machines (from D, G, and L series) to provision nodes in a customer’s Azure environment to provide elastic scale. Azure Storage (Premium and Standard) is also used to independently scale out cluster storage capacity on demand. Azure ExpressRoute is used to accommodate customers who need a fast, private network from an on-premises or colocation facility to transfer data to Cloudera in Azure. Power BI integration provides visual analytics capability for end users.
Cloudera has also recently released the integration to Azure Data Lake Store (ADLS) to enable greater performance and scalability, leveraging the cloud object store technology built for big data in Azure.
Cloudera is also available in the Azure Marketplace (since 2015) to enable fast, one-click deployment of Cloudera Enterprise Data Hub to Azure customers. What used to take weeks or more on-premises can now be accomplished in under an hour.
Underlying everything is our SDX, which has the shared metadata catalog that facilitates consistent data management and operations everywhere and anywhere. SDX also includes comprehensive, granular security to protect against threats and unified governance for the audit and search capabilities that the modern world demands, especially with standards like PCI-DSS and GDPR.
For IT, that means you can set policies once and enforce them everywhere. For analysts, data scientists, and others, SDX enables self-service and increases productivity. For the business, it means understanding customers better, connecting products and services, and protecting the business with confidence.
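“Set policies once, enforce them everywhere” can be illustrated with a tiny sketch: one shared catalog entry drives the same access decision for every workload. The catalog structure and names are invented for this example; SDX’s actual services are far more involved.

```python
# Hypothetical illustration of a shared policy catalog. The dataset name,
# roles, and schema here are invented, not SDX's real data model.

SHARED_CATALOG = {
    "customers": {"pii": True, "allowed_roles": {"analyst", "data_scientist"}},
}

def can_read(dataset, role):
    """Every workload consults the same shared policy, defined once."""
    policy = SHARED_CATALOG[dataset]
    return role in policy["allowed_roles"]

# The identical check applies whether the caller is a BI query,
# a Spark job, or a data science notebook.
decisions = {w: can_read("customers", "analyst") for w in ("bi", "spark", "notebook")}
blocked = can_read("customers", "intern")
```

Because the policy lives in one place, changing it once changes the answer for all three workloads at the same time, which is the point of a shared data experience.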
Cloudera Altus is our platform-as-a-service offering, delivering ETL, machine learning, and data processing on Amazon Web Services and Microsoft Azure. In the not-too-distant future, you’ll see us move beyond data engineering to analytic and data science workloads, delivered via any underlying cloud platform, including Amazon, Microsoft, and Google.
The first Altus experience we’re delivering is data engineering as a service.
Think about ETL for machine learning and analytics. Altus is available on AWS today, and we are planning to release on Azure in the future.
Altus runs on cloud-native infrastructure, so it’s easy to spin up transient clusters that have large-scale compute, process the data, and write your output back to a cloud object store like Amazon S3.
Altus supports our standard CDH distribution, which includes Hive, Spark, and Hive on Spark.
You can see the Altus portal here to the right of the text on the screen. You can access Altus with a simple login, and then work within the portal or through a CLI if you want to submit jobs programmatically.
Jobs are considered first-class objects on Altus. You can submit, clone, troubleshoot, and sort jobs. Many of you are running upward of 100 workloads in a day, so you may want to view a history of those jobs, find and troubleshoot the failed ones, and run them again.
Because Altus is a PaaS, you don’t need to deal with installing software, worrying about cluster configuration, resource management, or patching.
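The jobs-as-first-class-objects workflow (keep a history, find failures, resubmit them) can be sketched as follows. This is an invented stand-in in plain Python, not the actual Altus API or CLI.

```python
# Hypothetical sketch of a job-history workflow: list failed jobs from a
# day's history and clone them as fresh submissions. The job records and
# field names are invented for illustration.

jobs = [
    {"id": "j-001", "name": "nightly-etl", "status": "SUCCEEDED"},
    {"id": "j-002", "name": "sessionize",  "status": "FAILED"},
    {"id": "j-003", "name": "dedupe",      "status": "FAILED"},
]

def failed_jobs(history):
    """Filter the history down to failed runs."""
    return [j for j in history if j["status"] == "FAILED"]

def resubmit(job):
    """Clone a failed job as a fresh submission."""
    return {**job, "id": job["id"] + "-retry", "status": "SUBMITTED"}

retries = [resubmit(j) for j in failed_jobs(jobs)]
```

At 100+ workloads a day, exactly this filter-and-resubmit loop is what a managed job history saves you from scripting by hand.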
The usual issue is data movement. Customers need to have a story figured out for that; otherwise, it becomes a painful conversation. What are some of the patterns we have seen people use successfully? If they already back up data to S3, that works.
Here we see it all together: 4+ analytics workloads, 4 deployment models, and 1 shared data experience. Again, no one else offers this choice and common controls all together.
ADECCO: Adecco uses Cloudera Enterprise on Azure to power its Search and Match solution, connecting qualified candidates to job vacancies, with a reported 30% reduction in time-to-fill and a 20% reduction in job board spend in its first 90 days.
JOY GLOBAL: Cloudera on Azure makes it easy for Joy Global teams in the field to analyze equipment data from their own and third-party PLC-based equipment to get a systems view of machine operation.
WORLDWIDE FINANCIAL INSTITUTION (BLINDED): Better detects fraud (money laundering) and complies with federal regulations and authorities.
---- DETAIL/SPECIFICS ----
Adecco:
Search Technologies Helps Adecco Group Significantly Improve Recruiter Efficiency
http://www.prweb.com/releases/2015/11/prweb13100660.htm
(PRWeb: Search Technologies press release, 12/2/2015.) Additional excerpts:
“Search and Match Application Based on Cloudera and Solr Improves Recruiter Response Times and Fill Rates”
“Adecco was recently short-listed for the prestigious Cloudera Business Impact Award at Hadoop World 2015”
Joy Global:
Joy Global is a world leader in making heavy-duty mining equipment for both surface and underground excavation. The company had a legacy IoT predictive maintenance system built in 2008 and had challenges meeting the scale and performance demands of its business. As the company grew, monitored more and more equipment, and served an increasing user base, it started to feel pressure points on the architecture that made it difficult to scale and support the global user base. Joy Global has a wide variety of data types collected from mining machines: machine pressure, temperature, currents, voltages, and a range of other sensor data, all sampled at high frequencies and increasing at an exponential rate. A single machine could have 800 data points, generating roughly 30,000 to 50,000 unique time-stamped records in a one-minute file.
Cloudera on Azure makes it easy for Joy Global teams in the field to analyze data that they pull in from Joy Global equipment (such as longwall systems, shovels, wheel loaders, continuous miners, and others), and also from third party PLC-based equipment to get a systems view of machine operation.
This expanded capability allowed one of Joy Global’s longwall mining operator customers to acquire data not just from the Joy longwall system, but also from ancillary equipment. Using Impala on HDFS in Azure and an HBase store for time-series data, the team is also able to provide access to this data through self-service visualization reports. The ability to create custom reports and ad hoc analysis from a common set of data enabled regional engineers to answer customers' questions faster. One outcome from this engagement was production optimization and a doubling of weekly cutting hours from the Joy Global longwall system.
Joy Global has realized significant cost savings on its cloud infrastructure by moving to Azure. The company is able to deliver all of the data for its customers with much less compute than in the previous system, while handling far more data and intelligence. Because its reputation for quality demands a 24x7 monitoring operation, Joy Global relies on Cloudera and Microsoft Azure to maintain that quality.
Worldwide Financial Institution:
The Worldwide Financial Institution needed visibility into and access to data in order to better understand what is happening with its products and business at all levels of the organization. In addition to providing insightful information to executives, the solution gives the business insight into critical information so it can revise the way it does business today. The existing data mart sits within the PCI zone, making access and self-service challenging. Information needs to be accessible and accurate, which requires a framework that is integrated, repeatable, and scalable enough to accommodate future reporting needs. The new solution allowed the institution to detect fraud (money laundering) and comply with federal authorities.
We allow you to run anywhere and deploy any way you choose, giving you a simple, unified enterprise experience. We simplify your operations so you can work with familiar tools and focus on your job without worrying about cloud infrastructure management. “Unified” means you can have a similar experience across any workload, whether in a hybrid or multi-cloud environment, and whether in a PaaS or infrastructure-as-a-service deployment. Lastly, everything we do at Cloudera is built to be enterprise-grade, proven at great scale with a trusted security model, and delivered with consistent governance and workload management.