The Unbearable Lightness of Ephemeral Processing

Diego Baez

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
• Framework
• Computing Profiles
• Ephemeral Clusters
• Practical Recommendations
• Advanced Topics
• Recap

What is Ephemeral?
“lasting a very short time; short-lived; transitory. To be discarded once
they served their intended purpose”

This is NOT a talk about Snapchat!

It takes an average of 6 months to get a new server
ready for application deployment
The larger the organization, the longer it takes, in a process usually spanning multiple
departments, approval processes and implementation teams.
Business Requirement
IT Project Management
Infrastructure
Datacenter Operations
Purchasing
…

What if we approach Computing Power as a Utility?
1. On-demand Computing Power
2. Pay only for the resources you use
3. Short Need-to-Processing cycle
4. Always available
5. Suitable for my needs
Think Water or Electricity

Benefits
⬢ Respond fast to business needs
⬢ Cost-Effective
⬢ Easily scalable
⬢ Maximum utilization of available infrastructure
We could…

What would we need
1. Endless Supply of Computing Power:
– “Inexhaustible” supply of hardware available to us on demand
– Always-On Operational Environment to run Hardware
– Pay only for the consumed resources
– On demand additional computing power
2. Tailor-made environment for each unit-of-work we want to execute:
– Should be able to provision anything from simple to very complex compute power for compute- or data-intensive units of work
– Customized environments for specific needs
– Deploy my own environment “recipes”
3. Operational infrastructure to “personalize” the environment, retrieve results, and clean up the environment:
– Easy deployment, elastic scaling, and destroying after unit-of-work completes
Three basic building blocks

Computing Profiles

Three Broad Categories
1. “Firefly” Routines
– Live for a short time
– Stateless
– Contain all information to complete unit-of-work
– Initial use case was web page requests, then came microservices, then IoT, …
2. Data-Intensive “Thunder”
– Very large quantity of data
– Complexity of data
– Weather Analysis
3. Compute-intensive “Lightning”
– CPU cycles are the bottleneck
– Risk Calculations
– Analytics
unit-of-work Scale

1. Firefly Routines
⬢ Stateless light/short unit-of-work initially focused on e-commerce
⬢ Lifetime: short-lived
⬢ Often idempotent: making multiple identical requests has the same effect as making a single request
⬢ FaaS (Function as a Service) :
– AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, IBM OpenWhisk/Watson
– But they have limits on size, memory, disk, concurrency and running time
– Each is implementation-specific and not easily portable
– Unclear operational model
FaaS (Function as a Service)
AWS Lambda Azure Functions
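
The idempotency property mentioned above can be illustrated with a minimal sketch. It does not use any FaaS provider's API; `handle_request`, the event fields and the in-memory `store` are hypothetical names chosen for illustration only.

```python
def handle_request(event, store):
    """Stateless, idempotent handler: the event carries everything it needs."""
    # Writing under a key taken from the request makes retries harmless:
    # a duplicate delivery overwrites the same entry with the same value.
    order_id = event["order_id"]
    store[order_id] = {"item": event["item"], "quantity": event["quantity"]}
    return {"status": "ok", "order_id": order_id}


store = {}  # stands in for an external key-value store (assumption)
event = {"order_id": "A-1001", "item": "widget", "quantity": 3}

first = handle_request(event, store)
second = handle_request(event, store)  # duplicate delivery of the same request
```

Because the handler keeps no state of its own between calls, any instance can serve any request, which is what allows the platform to start and discard instances freely.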

2. Data-Intensive applications
⬢ For data-intensive workloads, clusters are the ideal solution.
– Leverage large numbers of distributed data nodes
– Parallel disk I/O across many CPU-IO units (nodes)
– Storage aware
– Redundancy and fault tolerance
– Specific stacks for specific data-centric purposes: Hive, HBase, HDFS
– Custom applications
⬢ Some applications are:
– Machine Learning
– Weather Analysis
– Genetics
– Clustering and Classification
Clusters

3. Compute-Intensive
⬢ For heavy computational units of work, clusters are the ideal solution.
– Parallel Processing
– Parallel Disk I/O across many CPU-IO units (nodes)
– Storage aware computing
– Complementary technologies together in a cluster: HDFS, Hive, Spark, HBase
– Higher degree of control
– Custom applications
⬢ Some applications are:
– Risk Calculations
– Analytics
– Machine Learning
Clusters

Compute-Intensive & Data-Intensive applications
⬢ Dedicated Multi-Tenancy Clusters
– Primarily On-Premise
– Cloud Dedicated Infrastructure
– General Purpose
– Simpler once cluster is set up
– But not optimized for any specific unit-of-work
⬢ But multi-tenancy is a double-edged sword
– Leverages multi-use of cluster
– Lowers cost
– But…
– High overhead
– Job isolation is not complete
– Needs to pre-provision capacity
– Issues, reconfiguration and maintenance affect everyone
The General Purpose Cluster

Multi-tenancy Musings
⬢ Affinity - how the requests of different users of a tenant are bound to processing nodes. Location-awareness
optimization can differ for each application
⬢ Performance Isolation - tenants working within their quota should have their SLAs fulfilled, even if some
other tenants have a high workload. One solution is resource isolation (CPU, RAM, time)
⬢ QoS Differentiation – Differences in service quality and SLA.
⬢ Customization – Ability to handle different configurations, requirements and SLAs
Additional Design Considerations

Enter the Ephemeral Cluster
⬢ Full power cluster
⬢ Need processing power available on-demand
⬢ Tailor-made “instances” for specific processing needs
⬢ Zero initial-state
⬢ Process-and-forget
⬢ Zero end-state
⬢ “Discarded” after my use
⬢ Can be long running
⬢ Can be stateful during their operation
A cluster that launches, processes a set of data, and terminates

The Single-Purpose Ephemeral Cluster
The life of the cluster is the duration of a specific unit-of-work: each unit-of-work has its own dedicated cluster for as long as it runs.
Clusters are managed as a set of independent, self-contained clusters, each coming alive for a specific unit-of-work and disappearing after the results are delivered.
⬢ Pros:
– Affinity: custom built cluster for this specific unit-of-work
– Dedicated QoS: Each unit-of-work has its own dedicated cluster, with concurrency of one.
– Performance Isolation built-in: Extremely simple resource management - cluster is fully dedicated to one unit-of-work
– No contention issues
– Multiple clusters can be run in parallel
– Scaling is virtually limited only by the cloud environment
– Clear audit trail, clear per-unit-of-work resource allocation, transparent per-unit-of-work accounting and contention-free unit-of-work execution.
– Customization: Easy to experiment with and tweak different unit-of-work and component configurations
⬢ Cons:
– Pay overhead of preparing the environment every launch
– Harder to monitor many concurrent clusters
– No simple “environment-wide” administration

Ephemeral clusters

What infrastructure do I need for Ephemeral Clusters?
1. On-Demand elastic Computing Environment
2. Customized cluster Recipes for specific needs
3. Operational Infrastructure to Launch/Adjust/Scale/Clean-up
The operational platform should be independent from a particular Cloud provider
1. Single interface for many cloud providers
2. Ability to optimize computing-price sensitivity
3. Pick the best of breed
4. Fail-over across cloud providers
Three building blocks

1. On-Demand elastic Computing Environment
⬢ The cloud is the Computing Utility!
⬢ On-demand Computing Power
⬢ Pay only for the resources you use
⬢ Short Need-to-Processing cycle
⬢ Always available
⬢ Scalable
⬢ But each provider has its own technology
Cloud

2. Tailor-made Recipes for specific needs
⬢ Blueprints define a unique recipe for a cluster instantiation
⬢ Blueprints can be generated from a running cluster with the desired configuration, or manually via a JSON file
⬢ Ability to provision an Apache Hadoop cluster without requiring user interaction
⬢ Blueprints contain knowledge around service component layout for a particular Stack definition
Ambari Blueprints
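
To make the "recipe" idea concrete, here is a minimal sketch that builds a blueprint-shaped JSON document. The overall layout (a `Blueprints` header plus a list of `host_groups`, each with a cardinality and components) follows the Ambari Blueprint format as commonly documented, but the blueprint name, stack version, component selection and the `make_blueprint` helper are assumptions for illustration; check the Ambari documentation before using any of it.

```python
import json


def make_blueprint(name, stack_name, stack_version, host_groups):
    """Build a blueprint-shaped dict.

    host_groups maps a group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": stack_name,
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


blueprint = make_blueprint(
    name="ephemeral-batch",   # hypothetical blueprint name
    stack_name="HDP",
    stack_version="2.5",      # assumed version, for illustration only
    host_groups={
        "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
        "worker": (3, ["DATANODE", "NODEMANAGER"]),
    },
)
blueprint_json = json.dumps(blueprint, indent=2)
```

Keeping the recipe as data rather than as a sequence of manual steps is what makes the same cluster reproducible on every launch.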

3. Launching/Adjusting/Operations Infrastructure
1. Pick a Blueprint: Cloudbreak uses Ambari Blueprints for
declarative Hadoop cluster definitions. Blueprints can be
designed for specialized applications and workloads (such as
Data Science or IoT Apps). Cloudbreak includes a few default
Blueprints for common cluster configurations but you can
always upload your own Blueprint to build the cluster just the
way you like it.
2. Choose a Cloud: Cloudbreak is configured to work with cloud
infrastructure resources (such as servers, network setup and
security options). Choose the cloud infrastructure you want to
use for the cluster.
3. Launch HDP: In this step, Cloudbreak obtains the chosen cloud
infrastructure platform, installs Apache Ambari and applies the
desired Blueprint. The result: your cluster is launched and ready
to go!
Cloudbreak

Cloudbreak
Single View · Enterprise Ready · Elastic · Flexible
⬢ Enables provisioning of a cluster with an arbitrary number of nodes
⬢ Enables (de)commissioning nodes from the cluster
⬢ Policy- and time-based scaling of the cluster
⬢ Declarative and flexible Hadoop cluster creation using blueprints
⬢ Provision to multiple public cloud providers or an OpenStack-based private cloud using the same common API
⬢ Access all of this functionality through a rich UI, a secured REST API or an automatable shell
⬢ Supports basic, token-based and OAuth2 authentication models
⬢ The cluster is provisioned in a logically isolated network
⬢ Tracking of usage and cluster metrics

How would we launch an Ephemeral unit-of-work?
1. Specify Cluster Type
2. Provision & Launch Cluster
3. Load my Data
4. Run Compute unit-of-work
5. Retrieve Results
6. Clean-up Environment
[Flow diagram, actors User and Cloudbreak: Pick a Blueprint → Launch Cluster → Load my Data → Run unit-of-work → Retrieve Results → Clean-up Environment]
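
The six steps on this slide can be sketched as a launch-process-terminate wrapper. This is a minimal sketch, not Cloudbreak's API: `FakeProvisioner`, `ephemeral_cluster` and `run_unit_of_work` are hypothetical names, and the provisioner only records calls instead of talking to a cloud provider.

```python
from contextlib import contextmanager


class FakeProvisioner:
    """Hypothetical stand-in for a provisioning service; it only records calls."""

    def __init__(self):
        self.log = []

    def launch(self, blueprint):
        self.log.append(f"launch:{blueprint}")
        return {"blueprint": blueprint, "data": None}

    def terminate(self, cluster):
        self.log.append("terminate")


@contextmanager
def ephemeral_cluster(provisioner, blueprint):
    cluster = provisioner.launch(blueprint)      # steps 1-2: specify and launch
    try:
        yield cluster
    finally:
        provisioner.terminate(cluster)           # step 6: clean up, even on failure


def run_unit_of_work(provisioner, blueprint, data, job):
    with ephemeral_cluster(provisioner, blueprint) as cluster:
        cluster["data"] = list(data)             # step 3: load my data
        return job(cluster["data"])              # steps 4-5: run and retrieve results


provisioner = FakeProvisioner()
result = run_unit_of_work(provisioner, "spark-analytics", [1, 2, 3], sum)
```

Putting the clean-up in a `finally` block reflects the "zero end-state" goal: the cluster is discarded whether the unit-of-work succeeds or fails.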

EASE OF USE: Manage all of your ephemeral
workloads from a convenient and easy to use
dashboard.

EASE OF USE: Choose from a set of pre-tuned
and pre-configured cluster types.

EASE OF USE: Prescriptive customization points
enable the operator to further tune the
infrastructure and cluster as required.

REDUCED OPERATIONAL EFFORT: A simplified
choice of cluster topologies enables automatic
cluster repair, reducing the burden on the
operator.

CONTROL COSTS: Opportunistically leverage
Spot and Reserved Instances to control costs.

INTEGRATED NETWORK SECURITY: A built-in
Protected Gateway, along with advanced network
options, minimizes the network access points.

REDUCED OPERATIONAL EFFORT: Auto-
scaling enables the cluster to dynamically adjust
to the workload without operator input.

REDUCED OPERATIONAL EFFORT: An
integrated and powerful Command Line Interface
(CLI) enables automating cluster creation and
management.

REDUCED OPERATIONAL EFFORT: Simplified
cluster controls and easy access to cluster
resources.

SHARED METASTORE SERVICE: Reusable
shared metadata services provide consistent
schema across and in-between ephemeral
workloads.

Practical Recommendations

Startup Time
⬢ Cluster startup with Cloudbreak takes about 8 minutes:
–Connect to Cloud Provider
–Setup VPC
–Provision Server Instances
–Install OS
–Install Cluster
–Configure Blueprint
–Start all services
–READY TO PROCESS
⬢ So not suitable for units of work that require an immediate response on invocation, but
it can work when fast responses are only needed once the cluster is up.
8-minute prelude

Elastic Provisioning
⬢ All Ephemeral clusters should be configured for Auto-Scaling, unless the scope of
execution is extremely well known.
⬢ Have multiple Cloud Providers
–Optimal Provider for my task
–Fail Over
–Peak Demand
–Location Suitability (which region can best serve my client base)
Elastic Provisioning

Cluster Overhead
⬢ Clusters in general have overhead inherent in coordinating multiple nodes,
preparing an optimal execution path, and managing resources.
⬢ They can be slower to start processing, but usually more than make up for it in total
speed through extensive use of parallelism and linear scaling
⬢ The more compute-intensive or Data-intensive the unit-of-work, the more benefit
we get from the cluster
Minimum Unit

Storage
⬢ Even the fastest cluster will run very slowly if storage access is inefficient, if there are I/O
bottlenecks, or if storage access has high latency
⬢ Some strategies:
– Fetch-while-u-wait: Fetch data in parallel while the cluster is instantiating, so that all data is available when the cluster
is ready to begin processing
– Storage-Warming: One common strategy is to have multiple types of storage to balance speed of access against
storage cost in the cloud: hot, warm and cold storage, such as attached storage vs. S3 vs. Glacier in AWS. As you
instantiate the cluster, move the data that needs to be accessed to hot storage for cluster execution.
– Cache-Loading: Load data into cache when the cluster is instantiated to maximize speed of execution.
Particularly useful for analytics running on Spark.
– Extreme-Parallelism: Make sure the cluster layout is matched to maximize concurrent processing with concurrent
I/O access. This usually means a ratio of one CPU per physical storage device, so that we can fully utilize
concurrent processing with concurrent disk I/O.
I/O Latency Awareness
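
The Fetch-while-u-wait strategy above can be sketched with a thread pool that stages data while the cluster starts. This is a minimal sketch under stated assumptions: `provision_cluster` and `fetch_dataset` are hypothetical placeholders whose `sleep` calls stand in for the real, much longer operations.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision_cluster():
    time.sleep(0.2)  # stands in for the minutes-long cluster startup
    return "cluster-ready"


def fetch_dataset(name):
    time.sleep(0.1)  # stands in for copying a dataset to fast storage
    return f"{name}:staged"


def launch_with_prefetch(dataset_names):
    """Start provisioning and data staging together, then wait for both."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        cluster_future = pool.submit(provision_cluster)
        data_futures = [pool.submit(fetch_dataset, n) for n in dataset_names]
        staged = [future.result() for future in data_futures]
        return cluster_future.result(), staged


cluster, staged = launch_with_prefetch(["sales", "weather"])
```

With this arrangement the staging time is hidden behind the startup time whenever the data transfer finishes before the cluster does.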

Cluster Instantiation
⬢ What triggers the cluster launch?
–Manual
–Event Driven
–Time Schedule
–Capacity Triggers
–Special Purpose
–Isolation
Cluster Start-up

Advanced Topics

1. Spot-Pricing
2. Auto-scaling
3. Obfuscation

1. Spot-Pricing
⬢ Bid in real-time for available computing power
⬢ No guarantees the supply we want will be available
⬢ Can be outbid
⬢ Over 70% cheaper!
Recommendations:
⬢ Over-provision to make sure you have what you require
⬢ But less than the price differential
⬢ So if spot pricing is currently 70% cheaper and we need one hour of compute power => over-provision by 25%
x = regular compute price per minute
Regular price = 60*x
Spot price with over-provisioning = 60*1.25*(x*(1-0.7)) = 22.5*x => 62.5% cheaper!
Real-time pricing of cloud computing
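
The over-provisioning arithmetic above generalizes to any discount and any over-provisioning level. The function names below are hypothetical; the break-even formula follows from setting the relative spot cost (1 + over_provision) * (1 - discount) equal to 1.

```python
def spot_cost_ratio(discount, over_provision):
    """Cost of over-provisioned spot capacity relative to the regular price."""
    return (1 + over_provision) * (1 - discount)


def spot_savings(discount, over_provision):
    """Fraction saved versus paying the regular price for the needed capacity."""
    return 1 - spot_cost_ratio(discount, over_provision)


def break_even_over_provision(discount):
    """Over-provisioning level at which spot costs the same as regular."""
    return discount / (1 - discount)


# The slide's example: 70% discount, 25% over-provisioning.
savings = spot_savings(discount=0.70, over_provision=0.25)
limit = break_even_over_provision(0.70)
```

For the slide's numbers, `savings` evaluates to about 0.625, matching the 62.5% above, and `limit` is about 2.33: over-provisioning stays worthwhile until it exceeds roughly 233%.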

2. Auto-scaling
⬢ Even though a cluster will be dedicated to one unit-of-work during its lifetime, we could still run out of resources.
Recommendations:
⬢ Best way to solve is enabling the cluster to grow based on need
⬢ In Cloudbreak, this is achieved via Auto-Scaling:
– Alerts: Create metric or time-based alerts for cluster scaling
– Policies: Scaling policies adjust cluster size based on activity and workload alerts
– General Configurations: Boundaries and cooldown period
Cluster Elasticity

Auto-Scaling Time-Based Alert

Auto-Scaling Metric-Based Alert

Auto-Scaling Policies
⬢ Define the Scale Adjustment (Node Count, Percentage, Exact)
⬢ Select the Host Group (to Scale)
⬢ Select Alert (which when fired, executes the Policy)

Auto-Scaling General Configurations
⬢ Cooldown Period (between scaling actions)
⬢ Minimum and Maximum Cluster size (boundaries)
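
How an alert, a policy and the general configuration interact can be sketched as a single decision function. This is not Cloudbreak's implementation: `ScalingPolicy`, `next_cluster_size` and the exact meaning given to each adjustment type (in particular, percentage as a share of the current node count) are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    adjustment_type: str   # "node_count", "percentage" or "exact"
    adjustment: int
    min_nodes: int
    max_nodes: int
    cooldown_seconds: int


def next_cluster_size(policy, current_nodes, now, last_scaled_at):
    """Node count after an alert fires, honouring the cooldown and the size boundaries."""
    if last_scaled_at is not None and now - last_scaled_at < policy.cooldown_seconds:
        return current_nodes  # still cooling down: ignore the alert
    if policy.adjustment_type == "node_count":
        target = current_nodes + policy.adjustment
    elif policy.adjustment_type == "percentage":
        target = current_nodes + round(current_nodes * policy.adjustment / 100)
    elif policy.adjustment_type == "exact":
        target = policy.adjustment
    else:
        raise ValueError(f"unknown adjustment type: {policy.adjustment_type}")
    # Clamp to the configured minimum and maximum cluster size.
    return max(policy.min_nodes, min(policy.max_nodes, target))


grow_by_half = ScalingPolicy("percentage", 50, min_nodes=3, max_nodes=20, cooldown_seconds=600)
new_size = next_cluster_size(grow_by_half, current_nodes=10, now=1000, last_scaled_at=None)
```

The cooldown prevents a burst of alerts from triggering repeated scaling actions before the previous one has taken effect, and the boundaries cap both cost and shrinkage.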

3. Obfuscation
⬢ Many clients want to leverage the power of elastic computing in the cloud, but are concerned about
possible security breaches
⬢ Permanent solutions exist, such as private, secure, permanent connections to our own secure cloud
environment
⬢ Another more generic and portable solution is to scramble only the pieces of sensitive data sent for
processing, keep a key securely on-premise, and unscramble results when they return => Obfuscate the
Data.
⬢ Example: “John Doe, 1/24/84, 319-392-3429, 12, blue, …” becomes “J@*@ (#(*@), xxxxxxx, xxx-xx-3429, blue, …”
Recommendations:
⬢ Use Apache Ranger
Protecting Sensitive Data
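
The scramble-and-restore idea above can be sketched with a token vault that never leaves the premises. This is a minimal sketch and does not use Apache Ranger: `OnPremVault`, the helper functions and the field names given to the example record are all hypothetical.

```python
import secrets


class OnPremVault:
    """Hypothetical on-premise token vault; the mapping stays on-premise."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def obfuscate(self, value):
        # The same value always maps to the same token, so grouping and
        # joining on obfuscated fields still works in the cloud.
        if value not in self._value_to_token:
            token = "tok_" + secrets.token_hex(8)
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def restore(self, token):
        return self._token_to_value.get(token, token)


def obfuscate_record(record, sensitive_fields, vault):
    return {k: vault.obfuscate(v) if k in sensitive_fields else v for k, v in record.items()}


def restore_record(record, sensitive_fields, vault):
    return {k: vault.restore(v) if k in sensitive_fields else v for k, v in record.items()}


vault = OnPremVault()
sensitive = {"name", "dob", "phone"}  # assumed field names
record = {"name": "John Doe", "dob": "1/24/84", "phone": "319-392-3429", "colour": "blue"}

outbound = obfuscate_record(record, sensitive, vault)   # sent to the cloud
restored = restore_record(outbound, sensitive, vault)   # applied to returned results
```

Only the non-sensitive fields and opaque tokens ever reach the ephemeral cluster; without the vault, the tokens reveal nothing about the original values.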

Dynamic Masking and Row Level Filtering (Roadmap)

Source table:
Dept  SSN        CC No             Name      DOB        MRN         Policy ID
01    232323233  4539067047629850  John Doe  9/12/1969  8233054331  nj23j424
02    333287465  5391304868205600  Jane Doe  9/13/1969  3736885376  cadsd984

After Ranger policy enforcement, the Marketing group sees CC and SSN as masked values and MRN is nullified:
Dept  SSN        CC No                MRN   Name
01    xxxxx3233  4539 xxxx xxxx xxxx  null  John Doe
02    xxxxx7465  5391 xxxx xxxx xxxx  null  Jane Doe

A department employee only sees data specific to that department:
Dept  SSN        Name      Data1
01    232323233  John Doe  sdsd
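
The two views on this slide can be reproduced in plain Python to show what masking and row-level filtering do to the data. This is only an illustration of the effect, not Ranger policy syntax; the function names are hypothetical and the masking rules (keep the last four SSN digits, keep the first four card digits, nullify MRN) are read off the example rows.

```python
def mask_ssn(ssn):
    """Keep only the last four digits visible."""
    return "x" * (len(ssn) - 4) + ssn[-4:]


def mask_card(number):
    """Keep only the first four digits visible."""
    return number[:4] + " xxxx xxxx xxxx"


def marketing_view(rows):
    """Column masking: SSN and CC No masked, MRN nullified."""
    return [
        {
            "Dept": row["Dept"],
            "SSN": mask_ssn(row["SSN"]),
            "CC No": mask_card(row["CC No"]),
            "MRN": None,
            "Name": row["Name"],
        }
        for row in rows
    ]


def department_view(rows, dept):
    """Row-level filtering: only rows belonging to the caller's department."""
    return [row for row in rows if row["Dept"] == dept]


rows = [
    {"Dept": "01", "SSN": "232323233", "CC No": "4539067047629850",
     "Name": "John Doe", "MRN": "8233054331"},
    {"Dept": "02", "SSN": "333287465", "CC No": "5391304868205600",
     "Name": "Jane Doe", "MRN": "3736885376"},
]
masked = marketing_view(rows)
dept_01 = department_view(rows, "01")
```

In a policy engine these rules are attached to users and groups and applied at query time, so the stored data itself is never modified.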

Tag-based Access Policy Requirements
• Basic Tag policy – PII example. Access and entitlements
must be tag-based (ABAC) and scalable in implementation.
• Geo-based policy – Policy based on IP address; proxy IP
substitution may be required. The rule enforcement must be
geo-aware.
• Time-based policy – Timer for data access, decoupled
from deletion of data.
• Prohibitions – Prevention of combinations of Hive
tables/columns that may pose a risk together.

Recap

Robust Ephemeral Clusters Are Possible today!
⬢ Ephemeral clusters can be launched quickly (minutes), are pre-configured for a specific
processing purpose, and can be brought down quickly as soon as their usefulness has expired
⬢ Organizations can leverage Ephemeral Clusters for parallel, compute-intensive applications
which require bursts of power
⬢ Being able to launch bespoke clusters for specific compute needs in a repeatable fashion and
within a shared infrastructure provides flexibility for special-purpose processing needs
⬢ The velocity and elasticity of fast cluster deployment enables seamless peak-demand
provisioning, enables cost optimization by leveraging significantly lower cloud spot pricing, and
maximizes utilization of existing compute capacity
Review

Thank You
Diego Baez
dbaez@hortonworks.com
