The Unbearable Lightness Of
Ephemeral Processing
Diego Baez
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
• Framework
• Computing Profiles
• Ephemeral Clusters
• Practical Recommendations
• Advanced Topics
• Recap
What is Ephemeral?
“lasting a very short time; short-lived; transitory. To be discarded once
they served their intended purpose”
This is NOT a talk about Snapchat!
It takes an average of 6 months to get a new server
ready for application deployment
The larger the organization, the longer it takes, in a process usually spanning multiple
departments, approval processes and implementation teams.
Business Requirement
IT Project Management
Infrastructure
Datacenter Operations
Purchasing
…
What if we approach Computing Power as a Utility?
1. On-demand Computing Power
2. Pay only for the resources you use
3. Short Need-to-Processing cycle
4. Always available
5. Suitable for my needs
Think Water or Electricity
Benefits
⬢ Respond fast to business needs
⬢ Cost-Effective
⬢ Easily scalable
⬢ Maximum utilization of available infrastructure
We could…
What would we need
1. Endless Supply of Computing Power:
– “Inexhaustible” supply of hardware available to us on demand
– Always-On Operational Environment to run Hardware
– Pay only for the consumed resources
– On demand additional computing power
2. Tailor-made environment for each unit-of-work we want to execute:
– Should be able to provision anything from simple to very complex compute power for compute- or data-intensive units of work
– Customized environments for specific needs
– Deploy my own environment “recipes”
3. Operational infrastructure to “personalize” the environment, retrieve results, and clean up the environment:
– Easy deployment, elastic scaling, and teardown after the unit-of-work completes
Three basic building blocks
Computing Profiles
Three Broad Categories
1. “Firefly” Routines
– Live for a short time
– Stateless
– Contain all information to complete unit-of-work
– Initial use case was web page requests; then came microservices, then IoT, …
2. Data-Intensive “Thunder”
– Very large quantity of data
– Complexity of data
– Weather Analysis
3. Compute-intensive “Lightning”
– CPU cycles are the bottleneck
– Risk Calculations
– Analytics
unit-of-work Scale
1. Firefly Routines
⬢ Stateless light/short unit-of-work initially focused on e-commerce
⬢ Lifetime: short-lived
⬢ Often idempotent: making multiple identical requests has the same effect as making a single request
⬢ FaaS (Function as a Service):
– AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, IBM OpenWhisk/Watson
– But they have limits on size, memory, disk, concurrency and running time
– Each is implementation-specific and not easily portable
– Unclear operational model
FaaS (Function as a Service)
AWS Lambda Azure Functions
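A minimal sketch of what a stateless, idempotent "firefly" routine might look like. The handler name, the event layout and the request-id scheme are assumptions made for illustration; they are not tied to any particular FaaS provider's API.

```python
import hashlib
import json


def handle_request(event: dict) -> dict:
    """Stateless, idempotent unit of work: the result depends only on the event."""
    # Everything needed to complete the work travels inside the event itself.
    items = event["items"]
    total = sum(item["price"] * item["quantity"] for item in items)
    # A deterministic request id lets a caller or a downstream store
    # recognize a retry of the same request (hypothetical scheme).
    canonical = json.dumps(event, sort_keys=True).encode("utf-8")
    request_id = hashlib.sha256(canonical).hexdigest()[:16]
    return {"request_id": request_id, "total": total}


event = {"items": [{"price": 5, "quantity": 2}, {"price": 3, "quantity": 1}]}
first = handle_request(event)
second = handle_request(event)
print(first == second)  # the same request yields the same result
```

Because the function keeps no state between calls, it can be retried or run on any instance without coordination, which is what makes this profile a good fit for short-lived execution.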
2. Data-Intensive applications
⬢ For Data-Intensive, clusters are the ideal solution.
– Leverage large numbers of distributed data nodes
– Parallel Disk I/O across many CPU-IO units (nodes)
– Storage aware
– Redundancy and fault tolerance
– Specific stacks for specific data-centric purposes: Hive, HBase, HDFS
– Custom applications
⬢ Some applications are:
– Machine Learning
– Weather Analysis
– Genetics
– Clustering and Classification
Clusters
3. Compute-Intensive
⬢ For heavy computational units of work, clusters are the ideal solution.
– Parallel Processing
– Parallel Disk I/O across many CPU-IO units (nodes)
– Storage aware computing
– Complementary technologies together in a cluster: HDFS, Hive, Spark, HBase
– Higher degree of control
– Custom applications
⬢ Some applications are:
– Risk Calculations
– Analytics
– Machine Learning
Clusters
Compute-Intensive & Data-Intensive applications
⬢ Dedicated Multi-Tenancy Clusters
– Primarily On-Premise
– Cloud Dedicated Infrastructure
– General Purpose
– Simpler once cluster is set up
– But not optimized for any specific unit-of-work
⬢ But multi-tenancy is a double-edged sword
– Leverages multi-use of cluster
– Lowers cost
– But…
– High overhead
– Job isolation is not complete
– Capacity needs to be pre-provisioned
– Issues, reconfiguration and maintenance affect everyone
The General Purpose Cluster
Multi-tenancy Musings
⬢ Affinity – how the requests of a tenant's different users are bound to processing nodes. Location-awareness optimization can differ for each application
⬢ Performance Isolation – tenants working within their quota should have their SLAs fulfilled, even if some other tenants have a high workload. One solution is resource isolation: CPU, RAM, time
⬢ QoS Differentiation – Differences in service quality and SLA.
⬢ Customization – Ability to handle different configurations, requirements and SLAs
Additional Design Considerations
Enter the Ephemeral Cluster
⬢ Full power cluster
⬢ Processing power available on demand
⬢ Tailor-made “instances” for specific processing needs
⬢ Zero initial-state
⬢ Process-and-forget
⬢ Zero end-state
⬢ “Discarded” after my use
⬢ Can be long running
⬢ Can be stateful during their operation
A cluster that launches, processes a set of data, and terminates
The Single-Purpose Ephemeral Cluster
The life of the cluster is the duration of a specific unit-of-work: each unit-of-work has its own dedicated cluster for as long as it runs.
The environment is managed as a set of independent, self-contained clusters, each coming alive for a specific unit-of-work and disappearing after the results are delivered.
⬢ Pros:
– Affinity: custom-built cluster for this specific unit-of-work
– Dedicated QoS: each unit-of-work has its own dedicated cluster, with a concurrency of one
– Performance isolation built in: extremely simple resource management, since the cluster is fully dedicated to one unit-of-work
– No contention issues
– Multiple clusters can be run in parallel
– Scaling is limited virtually only by the cloud environment
– Clear audit trail, clear per-unit-of-work resource allocation, transparent per-unit-of-work accounting and contention-free unit-of-work execution
– Customization: easy to experiment with different unit-of-work and component configurations
⬢ Cons:
– Pay the overhead of preparing the environment on every launch
– Harder to monitor many concurrent clusters
– No simple “environment-wide” administration
Ephemeral clusters
What infrastructure do I need for Ephemeral Clusters?
1. On-Demand elastic Computing Environment
2. Customized cluster Recipes for specific needs
3. Operational Infrastructure to Launch/Adjust/Scale/Clean-up
The operational platform should be independent from a particular Cloud provider
1. Single interface for many cloud providers
2. Ability to optimize computing-price sensitivity
3. Pick the best of breed
4. Fail-over across cloud providers
Three building blocks
1. On-Demand elastic Computing Environment
⬢ The cloud is the Computing Utility!
⬢ On-demand Computing Power
⬢ Pay only for the resources you use
⬢ Short Need-to-Processing cycle
⬢ Always available
⬢ Scalable
⬢ But each provider has its own technology
Cloud
2. Tailor-made Recipes for specific needs
⬢ Blueprints define a unique recipe for a cluster instantiation
⬢ Blueprints can be generated from a running cluster with the desired configuration, or manually via a JSON file
⬢ Ability to provision an Apache Hadoop cluster without requiring user interaction
⬢ Blueprints contain knowledge around service component layout for a particular Stack definition
Ambari Blueprints
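To make the "recipe" idea concrete, here is a hedged sketch of a small blueprint built as a Python dictionary. The top-level keys (`Blueprints`, `host_groups`, `components`, `cardinality`) follow the Ambari Blueprint JSON layout; the blueprint name, stack version and component selection are illustrative assumptions and are deliberately incomplete, so a real cluster would need more components and configuration. The `check_blueprint` helper is hypothetical and only performs basic structural checks, not Ambari's own validation.

```python
import json

# Hypothetical minimal blueprint; names and component layout are illustrative.
blueprint = {
    "Blueprints": {
        "blueprint_name": "ephemeral-spark-small",
        "stack_name": "HDP",
        "stack_version": "2.5",
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        },
        {
            "name": "worker",
            "cardinality": "3",
            "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}],
        },
    ],
}


def check_blueprint(bp: dict) -> list:
    """Return a list of problems found by a few basic structural checks."""
    problems = []
    for key in ("blueprint_name", "stack_name", "stack_version"):
        if key not in bp.get("Blueprints", {}):
            problems.append(f"missing Blueprints.{key}")
    for group in bp.get("host_groups", []):
        if not group.get("components"):
            problems.append(f"host group {group.get('name')!r} has no components")
    if not bp.get("host_groups"):
        problems.append("no host_groups defined")
    return problems


print(check_blueprint(blueprint))
print(json.dumps(blueprint, indent=2))
```

Keeping the recipe as plain data is what allows the same cluster definition to be versioned, reviewed and reused for every launch.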
3. Launching/Adjusting/Operations Infrastructure
1. Pick a Blueprint: Cloudbreak uses Ambari Blueprints for a
declarative Hadoop cluster definition. Blueprints can be
designed for specialized applications and workloads (such as
Data Science or IoT Apps). Cloudbreak includes a few default
Blueprints for common cluster configurations but you can
always upload your own Blueprint to build the cluster just the
way you like it.
2. Choose a Cloud: Cloudbreak is configured to work with cloud
infrastructure resources (such as servers, network setup and
security options). Choose the cloud infrastructure you want to
use for the cluster.
3. Launch HDP: In this step, Cloudbreak provisions resources on the chosen cloud
infrastructure platform, installs Apache Ambari and applies the
desired Blueprint. The result: your cluster is launched and ready
to go!
Cloudbreak
Cloudbreak
Single View · Enterprise Ready · Elastic · Flexible
⬢ Enables provisioning an arbitrary node cluster
⬢ Enables (de)commissioning nodes from the cluster
⬢ Policy- and time-based scaling of the cluster
⬢ Declarative and flexible Hadoop cluster creation using blueprints
⬢ Provision to multiple public cloud providers or an OpenStack-based private cloud using the same common API
⬢ Access all of this functionality through a rich UI, a secured REST API or an automatable shell
⬢ Supports basic, token-based and OAuth2 authentication models
⬢ The cluster is provisioned in a logically isolated network
⬢ Tracking usage and cluster metrics
How would we launch an Ephemeral unit-of-work?
1. Specify Cluster Type
2. Provision & Launch Cluster
3. Load my Data
4. Run Compute unit-of-work
5. Retrieve Results
6. Clean-up Environment
[Diagram: the six steps as a flow (Pick a Blueprint, Launch Cluster, Load my Data, Run unit-of-work, Retrieve Results, Clean-up Environment), with steps attributed to either the User or Cloudbreak]
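The launch-to-clean-up cycle can be sketched as a context manager that guarantees the clean-up step runs even when the unit-of-work fails. `FakeProvisioner` is a hypothetical in-memory stand-in used only for illustration; it is not a Cloudbreak client, and a real implementation would call the provisioning service's API or CLI instead.

```python
from contextlib import contextmanager


class FakeProvisioner:
    """Hypothetical stand-in for a provisioning service; not a real Cloudbreak client."""

    def __init__(self):
        self.active = set()
        self.log = []

    def launch(self, blueprint_name: str) -> str:
        cluster_id = f"{blueprint_name}-{len(self.log)}"
        self.active.add(cluster_id)
        self.log.append(("launch", cluster_id))
        return cluster_id

    def terminate(self, cluster_id: str) -> None:
        self.active.discard(cluster_id)
        self.log.append(("terminate", cluster_id))


@contextmanager
def ephemeral_cluster(provisioner, blueprint_name):
    """Launch a cluster for one unit of work and always clean it up afterwards."""
    cluster_id = provisioner.launch(blueprint_name)
    try:
        yield cluster_id
    finally:
        provisioner.terminate(cluster_id)


def run_unit_of_work(provisioner, blueprint_name, data, compute):
    """Steps 1-6: pick a blueprint, launch, load, run, retrieve, clean up."""
    with ephemeral_cluster(provisioner, blueprint_name) as cluster_id:
        # A real job would be submitted to cluster_id; here we compute locally.
        loaded = list(data)
        result = compute(loaded)
    return result


provisioner = FakeProvisioner()
result = run_unit_of_work(provisioner, "data-science", [1, 2, 3], sum)
print(result, provisioner.active)
```

Putting termination in a `finally` block reflects the "zero end-state" goal: no cluster should outlive its unit-of-work, whatever the outcome.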
EASE OF USE: Manage all of your ephemeral
workloads from a convenient and easy to use
dashboard.
EASE OF USE: Choose from a set of pre-tuned
and pre-configured cluster types.
EASE OF USE: Prescriptive customization points
enable the operator to further tune the
infrastructure and cluster as required.
REDUCED OPERATIONAL EFFORT: Simplified
choice of cluster topologies enables automatic
cluster repair, reducing the burden on the
operator.
CONTROL COSTS: Opportunistically leverage
Spot and Reserved Instances to control costs.
INTEGRATED NETWORK SECURITY: A built-in
Protected Gateway, along with advanced network
options, minimizes the network access points.
REDUCED OPERATIONAL EFFORT: Auto-scaling
enables the cluster to dynamically adjust
to the workload without operator input.
REDUCED OPERATIONAL EFFORT: An
integrated and powerful Command Line Interface
(CLI) enables automating cluster creation and
management.
REDUCED OPERATIONAL EFFORT: Simplified
cluster controls and easy access to cluster
resources.
SHARED METASTORE SERVICE: Reusable
shared metadata services provide consistent
schema across and in-between ephemeral
workloads.
Practical Recommendations
Startup Time
⬢ Cluster startup with Cloudbreak takes about 8 minutes:
– Connect to cloud provider
– Set up VPC
– Provision server instances
– Install OS
– Install cluster
– Configure Blueprint
– Start all services
– READY TO PROCESS
⬢ So ephemeral clusters are not suitable for units-of-work that need an immediate response at invocation, but they
can work when fast responses are needed only after the cluster is up.
8-minute prelude
Elastic Provisioning
⬢ All ephemeral clusters should be configured for auto-scaling, unless the scope of
execution is extremely well known.
⬢ Have multiple Cloud Providers
–Optimal Provider for my task
–Fail Over
–Peak Demand
–Location Suitability (which region can best serve my client base)
Elastic Provisioning
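One way to read the multi-provider recommendation is as a simple selection rule: prefer the cheapest provider that is currently available and fall over to the next one otherwise. The sketch below is an assumption about how such a rule could look; the `Provider` fields, names and prices are hypothetical, and a real decision would also weigh region, data locality and capacity.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    price_per_node_hour: float
    available: bool


def choose_provider(providers):
    """Return the cheapest available provider, falling over to the next cheapest."""
    for provider in sorted(providers, key=lambda p: p.price_per_node_hour):
        if provider.available:
            return provider
    raise RuntimeError("no cloud provider is currently available")


providers = [
    Provider("cloud-a", 0.40, available=False),
    Provider("cloud-b", 0.55, available=True),
    Provider("cloud-c", 0.70, available=True),
]
print(choose_provider(providers).name)  # prints "cloud-b"
```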
Cluster Overhead
⬢ Clusters in general have overhead inherent in coordinating multiple nodes,
preparing an optimal execution path, and managing resources.
⬢ They can be slower to start processing, but usually more than make up for it in total
speed through extensive use of parallelism and linear scaling
⬢ The more compute-intensive or Data-intensive the unit-of-work, the more benefit
we get from the cluster
Minimum Unit
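A rough way to estimate the minimum unit-of-work that benefits from a cluster is to compare the job's single-node runtime with the cluster's start-up time plus its parallel runtime. The sketch below assumes ideal linear scaling and compares against a single node that is already running; the 8-minute default is the Cloudbreak start-up figure quoted in this deck, and the function names are illustrative.

```python
def cluster_wall_time(single_node_minutes: float, nodes: int, startup_minutes: float = 8.0) -> float:
    """Estimated wall time on an ephemeral cluster, assuming ideal linear scaling."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return startup_minutes + single_node_minutes / nodes


def break_even_minutes(nodes: int, startup_minutes: float = 8.0) -> float:
    """Single-node runtime above which the cluster finishes sooner."""
    if nodes < 2:
        raise ValueError("need at least 2 nodes to gain from parallelism")
    # Solve startup + T / nodes = T for T.
    return startup_minutes * nodes / (nodes - 1)


for job in (5, 60, 600):
    print(job, cluster_wall_time(job, nodes=10))
print(round(break_even_minutes(10), 2))
```

Under these assumptions a 5-minute job is slower on a 10-node ephemeral cluster (8.5 minutes), while a 600-minute job finishes in 68 minutes; real workloads rarely scale perfectly, so the actual break-even point will be higher.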
Storage
⬢ Even the fastest cluster will run very slowly if storage access is inefficient, if there are I/O
bottlenecks, or if storage access has high latency
⬢ Some strategies:
– Fetch-while-u-wait: Fetch data in parallel while the cluster is instantiating, so that all data is available when the cluster is ready to begin processing
– Storage-Warming: One common strategy is to have multiple types of storage to balance speed of access against storage cost on the cloud: hot, warm and cold storage, such as attached storage vs. S3 vs. Glacier in AWS. As you instantiate the cluster, move the data that needs to be accessed to hot storage for cluster execution.
– Cache-Loading: Load data into cache when the cluster is instantiated to maximize speed of execution. Particularly useful for analytics running on Spark.
– Extreme-Parallelism: Match the cluster layout so that concurrent processing is paired with concurrent I/O access. This usually means a ratio of one CPU per physical storage device, so that concurrent processing can fully use concurrent disk I/O.
I/O Latency Awareness
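The fetch-while-u-wait strategy amounts to running cluster start-up and data staging concurrently instead of one after the other. A minimal sketch using Python's standard thread pool follows; `provision_cluster` and `fetch_data` are placeholders that only sleep, standing in for the real provisioning and data-transfer calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision_cluster(seconds: float) -> str:
    """Placeholder for cluster start-up; sleeps to simulate the wait."""
    time.sleep(seconds)
    return "cluster-ready"


def fetch_data(seconds: float) -> list:
    """Placeholder for staging input data into fast storage."""
    time.sleep(seconds)
    return ["part-0", "part-1"]


def start_with_prefetch(provision_seconds: float, fetch_seconds: float):
    """Run provisioning and data staging concurrently and wait for both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        cluster_future = pool.submit(provision_cluster, provision_seconds)
        data_future = pool.submit(fetch_data, fetch_seconds)
        return cluster_future.result(), data_future.result()


started = time.perf_counter()
cluster, data = start_with_prefetch(0.2, 0.15)
elapsed = time.perf_counter() - started
print(cluster, data, f"{elapsed:.2f}s")
```

The total wait approaches the longer of the two steps rather than their sum, so data staging that fits inside the start-up window adds little or no delay.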
Cluster Instantiation
⬢ What triggers the cluster launch?
–Manual
–Event Driven
–Time Schedule
–Capacity Triggers
–Special Purpose
–Isolation
Cluster Start-up
Advanced Topics
1. Spot-Pricing
2. Auto-scaling
3. Obfuscation
1. Spot-Pricing
⬢ Bid in real-time for available computing power
⬢ No guarantees the supply we want will be available
⬢ Can be outbid
⬢ Over 70% cheaper!
Recommendations:
⬢ Over-provision to make sure you get the capacity you require
⬢ But keep the over-provisioning below the price differential, or the saving disappears
⬢ So if spot pricing is currently 70% cheaper and we need one hour of compute power, over-provision by 25%:
x = regular compute price per minute
Regular price = 60*x
Spot price with over-provisioning = 60*1.25*(x*(1-0.7)) = 22.5*x, which is 62.5% cheaper!
Real-time pricing of cloud computing
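The spot-pricing arithmetic can be generalized in a few lines. The sketch below reproduces the 70% discount and 25% over-provisioning example and also computes the over-provisioning level at which spot capacity stops being cheaper; the function names are illustrative, and the model ignores price fluctuation and interruptions.

```python
def spot_cost(minutes: float, price_per_minute: float, discount: float, over_provision: float) -> float:
    """Cost of a spot run with extra capacity requested as a safety margin."""
    return minutes * (1 + over_provision) * price_per_minute * (1 - discount)


def savings_vs_regular(discount: float, over_provision: float) -> float:
    """Fraction saved versus the regular price; independent of duration and unit price."""
    return 1 - (1 + over_provision) * (1 - discount)


def max_over_provision(discount: float) -> float:
    """Over-provisioning level at which spot stops being cheaper."""
    return 1 / (1 - discount) - 1


x = 1.0  # regular price per minute, in arbitrary units
print(round(spot_cost(60, x, discount=0.70, over_provision=0.25), 4))  # 22.5
print(round(savings_vs_regular(0.70, 0.25), 4))  # 0.625
print(round(max_over_provision(0.70), 4))  # 2.3333
```

With a 70% discount, over-provisioning only cancels the saving at roughly 233%, so a 25% safety margin still leaves most of the price advantage intact.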
2. Auto-scaling
⬢ Even though a cluster will be dedicated to one unit-of-work during its lifetime, we could still run out of resources.
Recommendations:
⬢ The best way to solve this is to let the cluster grow based on need
⬢ In Cloudbreak, this is achieved via Auto-Scaling:
– Alerts: Create metric or time-based alerts for cluster scaling
– Policies: Scaling policies adjust cluster size based on activity and workload alerts
– General Configurations: Boundaries and cooldown period
Cluster Elasticity
Auto-Scaling Time-Based Alert
Auto-Scaling Metric-Based Alert
Auto-Scaling Policies
⬢ Define the Scale Adjustment (Node Count, Percentage, Exact)
⬢ Select the Host Group (to Scale)
⬢ Select Alert (which when fired, executes the Policy)
Auto-Scaling General Configurations
⬢ Cooldown Period (between scaling actions)
⬢ Minimum and Maximum Cluster size (boundaries)
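How the three adjustment types, the size boundaries and the cooldown period interact can be sketched as follows. This is an illustration of the concepts only, not Cloudbreak's implementation: the function names, the string labels for the adjustment types and the interpretation of "percentage" as a share of the current size are assumptions.

```python
def target_size(current: int, adjustment_type: str, value: float, minimum: int, maximum: int) -> int:
    """Apply a scale adjustment and clamp the result to the cluster boundaries."""
    if adjustment_type == "node_count":
        desired = current + int(value)
    elif adjustment_type == "percentage":
        desired = current + round(current * value / 100)
    elif adjustment_type == "exact":
        desired = int(value)
    else:
        raise ValueError(f"unknown adjustment type: {adjustment_type}")
    return max(minimum, min(maximum, desired))


def should_scale(now: float, last_scaled_at: float, cooldown_seconds: float) -> bool:
    """Allow a scaling action only after the cooldown period has elapsed."""
    return now - last_scaled_at >= cooldown_seconds


print(target_size(10, "percentage", 50, minimum=3, maximum=12))  # 12 (15 clamped to the maximum)
print(target_size(4, "node_count", -3, minimum=3, maximum=20))   # 3 (1 clamped to the minimum)
print(target_size(5, "exact", 8, minimum=3, maximum=20))         # 8
print(should_scale(now=900.0, last_scaled_at=700.0, cooldown_seconds=300.0))  # False
```

The boundaries keep a misfiring alert from growing or shrinking the cluster without limit, and the cooldown prevents back-to-back scaling actions before the previous one has taken effect.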
3. Obfuscation
⬢ Many clients want to leverage the power of elastic computing in the cloud, but are concerned about possible security breaches
⬢ Permanent solutions exist, such as private, secure, permanent connections to our own secure cloud environment
⬢ Another, more generic and portable solution is to scramble only the sensitive pieces of data sent for processing, keep the key securely on-premise, and unscramble the results when they return => Obfuscate the Data.
⬢ Example: “John Doe, 1/24/84, 319-392-3429, 12, blue, …” becomes: J@*@ (#(*@), xxxxxxx, xxx-xx-3429,blue,..
Recommendations:
⬢ Use Apache Ranger
Protecting Sensitive Data
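The scramble-before-sending idea can be illustrated with simple tokenization: sensitive fields are swapped for opaque tokens, the token-to-value mapping stays on-premise, and results are restored when they come back. This is a sketch of one possible technique under those assumptions, not Apache Ranger's API and not the exact masking pattern shown in the example; `OnPremVault` and the field names are hypothetical.

```python
import secrets


class OnPremVault:
    """Keeps the token-to-value mapping; in this sketch it never leaves the premises."""

    def __init__(self):
        self._by_token = {}

    def tokenize(self, value: str) -> str:
        token = "tok_" + secrets.token_hex(8)
        self._by_token[token] = value
        return token

    def restore(self, token: str) -> str:
        return self._by_token[token]


def obfuscate(record: dict, sensitive_fields, vault: OnPremVault) -> dict:
    """Replace only the sensitive fields with opaque tokens before sending to the cloud."""
    return {
        key: vault.tokenize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


def deobfuscate(record: dict, sensitive_fields, vault: OnPremVault) -> dict:
    """Swap tokens back for the original values once results return."""
    return {
        key: vault.restore(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


record = {"name": "John Doe", "dob": "1/24/84", "phone": "319-392-3429", "color": "blue"}
sensitive = {"name", "dob", "phone"}
vault = OnPremVault()
outbound = obfuscate(record, sensitive, vault)
print(outbound["color"], outbound["name"] != record["name"])
print(deobfuscate(outbound, sensitive, vault) == record)
```

Only the non-sensitive fields and the tokens travel to the ephemeral cluster, so a breach in the cloud environment exposes nothing that can be reversed without the on-premise mapping.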
Dynamic Masking and Row Level Filtering (Roadmap)
Source data:
Dept  SSN        CC No             Name      DOB        MRN         Policy ID
01    232323233  4539067047629850  John Doe  9/12/1969  8233054331  nj23j424
02    333287465  5391304868205600  Jane Doe  9/13/1969  3736885376  cadsd984

Ranger Policy Enforcement

Marketing group sees CC and SSN as masked values, and MRN is nullified:
Dept  SSN        CC No                MRN   Name
01    xxxxx3233  4539 xxxx xxxx xxxx  null  John Doe
02    xxxxx7465  5391 xxxx xxxx xxxx  null  Jane Doe

Dept employee only sees data specific to that department:
Dept  SSN        Name      Data1
01    232323233  John Doe  sdsd
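The two policy effects in this slide, column masking and row-level filtering, can be mimicked in plain Python to show what each view returns. This only illustrates the outcome; it is not how Apache Ranger is configured or invoked, and the function names are hypothetical. The sample rows are the ones from the slide.

```python
def mask_ssn(ssn: str) -> str:
    """Show only the last four digits."""
    return "x" * (len(ssn) - 4) + ssn[-4:]


def mask_card(number: str) -> str:
    """Show only the first four digits, grouped in blocks of four."""
    return number[:4] + " xxxx" * ((len(number) - 4) // 4)


rows = [
    {"dept": "01", "ssn": "232323233", "cc_no": "4539067047629850", "name": "John Doe", "mrn": "8233054331"},
    {"dept": "02", "ssn": "333287465", "cc_no": "5391304868205600", "name": "Jane Doe", "mrn": "3736885376"},
]


def marketing_view(rows):
    """Column masking: SSN and card number masked, MRN nullified."""
    return [
        {
            "dept": r["dept"],
            "ssn": mask_ssn(r["ssn"]),
            "cc_no": mask_card(r["cc_no"]),
            "mrn": None,
            "name": r["name"],
        }
        for r in rows
    ]


def department_view(rows, dept):
    """Row-level filtering: keep only rows belonging to the caller's department."""
    return [r for r in rows if r["dept"] == dept]


for row in marketing_view(rows):
    print(row)
print(department_view(rows, "01"))
```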
Tag-based Access Policy Requirements
• Basic tag policy – PII example. Access and entitlements
must be tag-based (ABAC) and scalable in implementation.
• Geo-based policy – Policy based on IP address; proxy IP
substitution may be required. Rule enforcement must be
geo-aware.
• Time-based policy – Timer for data access, decoupled
from deletion of data.
• Prohibitions – Prevention of combinations of Hive
tables/columns that may pose a risk together.
Recap
Robust Ephemeral Clusters Are Possible today!
⬢ Ephemeral clusters can be launched quickly (minutes), are pre-configured for a specific
processing purpose, and can be brought down quickly as soon as their usefulness has expired
⬢ Organizations can leverage Ephemeral Clusters for parallel compute intensive applications
which require bursts of power
⬢ Being able to launch bespoke clusters for specific compute needs in a repeatable fashion and
within a shared infrastructure provides flexibility for special-purpose processing needs
⬢ The velocity and elasticity of fast cluster deployment enables seamless peak-demand
provisioning, enables cost optimization by leveraging significantly lower cloud spot pricing, and
maximizes utilization of existing compute capacity
Review
Thank You
Diego Baez
dbaez@hortonworks.com

Contenu connexe

Tendances

A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesDataWorks Summit
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionDataWorks Summit/Hadoop Summit
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3Hortonworks
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeDataWorks Summit
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesDataWorks Summit
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...DataWorks Summit/Hadoop Summit
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsDataWorks Summit/Hadoop Summit
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...DataWorks Summit
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDataWorks Summit
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...DataWorks Summit
 

Tendances (20)

Deep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profitDeep Learning using Spark and DL4J for fun and profit
Deep Learning using Spark and DL4J for fun and profit
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
 
Hadoop crashcourse v3
Hadoop crashcourse v3Hadoop crashcourse v3
Hadoop crashcourse v3
 
Schema Registry - Set Your Data Free
Schema Registry - Set Your Data FreeSchema Registry - Set Your Data Free
Schema Registry - Set Your Data Free
 
Insights into Real-world Data Management Challenges
Insights into Real-world Data Management ChallengesInsights into Real-world Data Management Challenges
Insights into Real-world Data Management Challenges
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
Modernizing Business Processes with Big Data: Real-World Use Cases for Produc...
 
Scheduling Policies in YARN
Scheduling Policies in YARNScheduling Policies in YARN
Scheduling Policies in YARN
 
Cloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World ConsiderationsCloudy with a Chance of Hadoop - Real World Considerations
Cloudy with a Chance of Hadoop - Real World Considerations
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...Treat your enterprise data lake indigestion: Enterprise ready security and go...
Treat your enterprise data lake indigestion: Enterprise ready security and go...
 
Dynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the flyDynamic DDL: Adding structure to streaming IoT data on the fly
Dynamic DDL: Adding structure to streaming IoT data on the fly
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Apache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduceApache Hadoop 3.0 What's new in YARN and MapReduce
Apache Hadoop 3.0 What's new in YARN and MapReduce
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
 

Similaire à The Unbearable Lightness of Ephemeral Processing

Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
What's new in informix v11.70
What's new in informix v11.70What's new in informix v11.70
What's new in informix v11.70am_prasanna
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebula Project
 
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...
2689 - Exploring IBM PureApplication System and IBM Workload Deployer Best Pr...Hendrik van Run
 
Hybrid Cloud Tutorial Linkedin 2
Hybrid Cloud Tutorial Linkedin 2Hybrid Cloud Tutorial Linkedin 2
Hybrid Cloud Tutorial Linkedin 2David Rilett
 
Hadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and FutureDataWorks Summit
 
Hadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureHadoop Operations – Past, Present, and Future
Hadoop Operations – Past, Present, and FutureDataWorks Summit
 
Introduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesIntroduction to Cloud Data Center and Network Issues
Introduction to Cloud Data Center and Network IssuesJason TC HOU (侯宗成)
 
Micro services vs hadoop
Micro services vs hadoopMicro services vs hadoop
Micro services vs hadoopGergely Devenyi
 
CSD-2881 - Achieving System Production Readiness for IBM PureApplication System
CSD-2881 - Achieving System Production Readiness for IBM PureApplication SystemCSD-2881 - Achieving System Production Readiness for IBM PureApplication System
CSD-2881 - Achieving System Production Readiness for IBM PureApplication SystemHendrik van Run
 
OpenStack Enabling DevOps
OpenStack Enabling DevOpsOpenStack Enabling DevOps
OpenStack Enabling DevOpsCisco DevNet
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash CourseDataWorks Summit
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 
Run Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateRun Book Automation with PlateSpin Orchestrate
Run Book Automation with PlateSpin OrchestrateNovell
 

The Unbearable Lightness of Ephemeral Processing

  • 1. The Unbearable Lightness Of Ephemeral Processing Diego Baez
  • 2. Agenda: Framework, Computing Profiles, Ephemeral Clusters, Practical Recommendations, Advanced Topics, Recap. © Hortonworks Inc. 2011 – 2016. All Rights Reserved.
  • 3. What is Ephemeral? “lasting a very short time; short-lived; transitory. To be discarded once they served their intended purpose”
  • 4. This is NOT a talk about Snapchat!
  • 5. It takes an average of 6 months to get a new server ready for application deployment. The larger the organization, the longer it takes, in a process usually spanning multiple departments, approval processes and implementation teams:
      ⬢ Business Requirement
      ⬢ IT Project Management
      ⬢ Infrastructure
      ⬢ Datacenter Operations
      ⬢ Purchasing
      ⬢ …
  • 6. What if we approach Computing Power as a Utility? Think water or electricity.
      1. On-demand computing power
      2. Pay only for the resources you use
      3. Short need-to-processing cycle
      4. Always available
      5. Suitable for my needs
  • 7. Benefits: we could…
      ⬢ Respond fast to business needs
      ⬢ Be cost-effective
      ⬢ Scale easily
      ⬢ Maximize utilization of the available infrastructure
  • 8. What would we need? Three basic building blocks:
      1. Endless supply of computing power:
         – “Inexhaustible” supply of hardware available to us on demand
         – Always-on operational environment to run the hardware
         – Pay only for the consumed resources
         – Additional computing power on demand
      2. Tailor-made environment for each unit-of-work we want to execute:
         – Should be able to provision anything from simple to very complex compute power for compute- or data-intensive units of work
         – Customized environments for specific needs
         – Deploy my own environment “recipes”
      3. Operational infrastructure to “personalize” the environment, retrieve results, and clean up the environment:
         – Easy deployment, elastic scaling, and destruction after the unit-of-work completes
  • 9. Computing Profiles
  • 10. Three Broad Categories (diagram axis: Scale)
      1. “Firefly” routines
         – Live for a short time
         – Stateless
         – Contain all the information needed to complete the unit-of-work
         – The initial use case was web page requests; then came microservices, then IoT, …
      2. Data-intensive “Thunder”
         – Very large quantity of data
         – Complexity of data
         – Weather analysis
      3. Compute-intensive “Lightning”
         – CPU cycles are the bottleneck
         – Risk calculations
         – Analytics unit-of-work
  • 11. 1. Firefly Routines
      ⬢ Stateless, light, short unit-of-work, initially focused on e-commerce
      ⬢ Lifetime: short-lived
      ⬢ Often idempotent: making multiple identical requests has the same effect as making a single request
      ⬢ FaaS (Function as a Service):
         – AWS Lambda, Google Cloud Functions, Microsoft Azure Functions, IBM OpenWhisk/Watson
         – But they have limits on size, memory, disk, concurrency and running time
         – Each is implementation-specific and not easily portable
         – Unclear operational model
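The idempotence property mentioned on this slide can be illustrated with a minimal Python sketch. Everything here is hypothetical: `store` stands in for some external state outside the function, and `handle_request` is an illustrative name, not part of any FaaS provider's API.

```python
# Hypothetical in-memory store standing in for external state
# (for example a database) that lives outside the short-lived function.
store = {}


def handle_request(request_id, payload):
    """Idempotent handler: repeating the same request leaves the store
    in the same state as handling it once."""
    if request_id not in store:
        # The actual unit-of-work; uppercasing is only a placeholder.
        store[request_id] = payload.upper()
    return store[request_id]


first = handle_request("req-1", "hello")
second = handle_request("req-1", "hello")  # a retry of the same request
```

Because the retry returns the stored result instead of redoing the work, a platform that delivers a request more than once does not change the outcome.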
  • 12. 2. Data-Intensive applications: Clusters
      ⬢ For data-intensive work, clusters are the ideal solution.
         – Leverage large numbers of distributed data nodes
         – Parallel disk I/O across many CPU-IO units (nodes)
         – Storage aware
         – Redundancy and fault tolerance
         – Specific stacks for specific data-centric purposes: Hive, HBase, HDFS
         – Custom applications
      ⬢ Some applications are:
         – Machine learning
         – Weather analysis
         – Genetics
         – Clustering and classification
  • 13. 3. Compute-Intensive: Clusters
      ⬢ For heavy computational units of work, clusters are the ideal solution.
         – Parallel processing
         – Parallel disk I/O across many CPU-IO units (nodes)
         – Storage-aware computing
         – Complementary technologies together in one cluster: HDFS, Hive, Spark, HBase
         – Higher degree of control
         – Custom applications
      ⬢ Some applications are:
         – Risk calculations
         – Analytics
         – Machine learning
  • 14. Compute-Intensive & Data-Intensive applications: The General Purpose Cluster
      ⬢ Dedicated multi-tenancy clusters
         – Primarily on-premise
         – Cloud dedicated infrastructure
         – General purpose
         – Simpler once the cluster is set up
         – But not optimized for any specific unit-of-work
      ⬢ But multi-tenancy is a double-edged sword
         – Leverages multi-use of the cluster
         – Lowers cost
         – But…
         – High overhead
         – Job isolation is not complete
         – Capacity needs to be pre-provisioned
         – Issues, reconfiguration and maintenance affect everyone
  • 15. Multi-tenancy Musings: Additional Design Considerations
      ⬢ Affinity: how the requests of different users of a tenant are bound to processing nodes. The location-awareness optimization of each application can be different.
      ⬢ Performance isolation: tenants working within their quota should have their SLAs fulfilled, even if some other tenants have a high workload. One solution is resource isolation (CPU, RAM, time).
      ⬢ QoS differentiation: differences in service quality and SLA.
      ⬢ Customization: ability to handle different configurations, requirements and SLAs.
  • 16. Enter the Ephemeral Cluster: a cluster that launches, processes a set of data, and terminates
      ⬢ Full-power cluster
      ⬢ Processing power available on demand
      ⬢ Tailor-made “instances” for specific processing needs
      ⬢ Zero initial state
      ⬢ Process-and-forget
      ⬢ Zero end state
      ⬢ “Discarded” after my use
      ⬢ Can be long-running
      ⬢ Can be stateful during its operation
  • 17. The Single-Purpose Ephemeral Cluster
      The life of the cluster is the duration of the specific unit-of-work: each unit-of-work has its own dedicated cluster for as long as it runs. The environment is managed as a set of independent, self-contained clusters, each coming alive for a specific unit-of-work and disappearing after the results are delivered.
      ⬢ Pros:
         – Affinity: custom-built cluster for this specific unit-of-work
         – Dedicated QoS: each unit-of-work has its own dedicated cluster, with a concurrency of one
         – Performance isolation built in: extremely simple resource management, since the cluster is fully dedicated to one unit-of-work
         – No contention issues
         – Multiple clusters can be run in parallel
         – Scaling is limited virtually only by the cloud environment
         – Clear audit trail, clear per-unit-of-work resource allocation, transparent per-unit-of-work accounting and contention-free unit-of-work execution
         – Customization: easy to experiment with different unit-of-work and component configurations
      ⬢ Cons:
         – Pay the overhead of preparing the environment on every launch
         – Harder to monitor many concurrent clusters
         – No simple “environment-wide” administration
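The first con above, paying the preparation overhead on every launch, matters more the shorter the unit-of-work is. A minimal Python sketch of that arithmetic follows; the minute values are made-up illustrations, not measurements from the talk, and `overhead_fraction` is a hypothetical helper.

```python
def overhead_fraction(setup_minutes, work_minutes):
    """Share of the cluster's total lifetime spent preparing the
    environment rather than running the unit-of-work."""
    return setup_minutes / (setup_minutes + work_minutes)


# Hypothetical numbers: the same 15-minute setup before a short
# and before a long unit-of-work.
short_job = overhead_fraction(15, 15)
long_job = overhead_fraction(15, 285)
```

With these assumed numbers, half of the short job's cluster lifetime is setup, while for the long job it is one twentieth, which is why the trade-off favors ephemeral clusters for longer-running units of work.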
  • 18. Ephemeral clusters
  • 19. What infrastructure do I need for Ephemeral Clusters? Three building blocks:
      1. On-demand elastic computing environment
      2. Customized cluster recipes for specific needs
      3. Operational infrastructure to launch, adjust, scale and clean up
      The operational platform should be independent of any particular cloud provider:
      1. Single interface for many cloud providers
      2. Ability to optimize for computing-price sensitivity
      3. Pick the best of breed
      4. Fail-over across cloud providers
  • 20. 1. On-Demand elastic Computing Environment: Cloud
      ⬢ The cloud is the computing utility!
      ⬢ On-demand computing power
      ⬢ Pay only for the resources you use
      ⬢ Short need-to-processing cycle
      ⬢ Always available
      ⬢ Scalable
      ⬢ But each provider has its own technology
  • 21. 2. Tailor-made Recipes for specific needs: Ambari Blueprints
      ⬢ Blueprints define a unique recipe for a cluster instantiation
      ⬢ Blueprints can be generated from a running cluster with the desired configuration, or written manually as a JSON file
      ⬢ Ability to provision an Apache Hadoop cluster without requiring user interaction
      ⬢ Blueprints contain knowledge about the service component layout for a particular Stack definition
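To make the idea of a blueprint as a JSON recipe concrete, here is a minimal Python sketch that assembles a blueprint-shaped document and serializes it. The top-level keys (`Blueprints`, `host_groups`, `components`, `cardinality`) follow the Ambari Blueprint format as I understand it; the blueprint name, the stack version and the component layout are illustrative assumptions and are not a complete or validated cluster definition. `make_blueprint` is a hypothetical helper.

```python
import json


def make_blueprint(name, stack_name, stack_version, host_groups):
    """Build a blueprint-shaped dict from a mapping of
    host group name -> (node count, list of component names)."""
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": stack_name,
            "stack_version": stack_version,
        },
        "host_groups": [
            {
                "name": group,
                "cardinality": str(count),
                "components": [{"name": c} for c in components],
            }
            for group, (count, components) in host_groups.items()
        ],
    }


# Illustrative layout only: one master node and three workers.
blueprint = make_blueprint(
    "ephemeral-example",  # hypothetical blueprint name
    "HDP",
    "2.5",                # assumed stack version
    {
        "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
        "worker": (3, ["DATANODE", "NODEMANAGER"]),
    },
)
blueprint_json = json.dumps(blueprint, indent=2)
```

Keeping the recipe as data means one unit-of-work can use a different layout from the next simply by passing a different `host_groups` mapping.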
  • 22. 3. Launching/Adjusting/Operations Infrastructure: Cloudbreak
      1. Pick a Blueprint: Cloudbreak uses Ambari Blueprints to provide a declarative Hadoop cluster definition. Blueprints can be designed for specialized applications and workloads (such as Data Science or IoT apps). Cloudbreak includes a few default Blueprints for common cluster configurations, but you can always upload your own Blueprint to build the cluster just the way you like it.
      2. Choose a Cloud: Cloudbreak is configured to work with cloud infrastructure resources (such as servers, network setup and security options). Choose the cloud infrastructure you want to use for the cluster.
      3. Launch HDP: In this step, Cloudbreak obtains the chosen cloud infrastructure platform, installs Apache Ambari and applies the desired Blueprint. The result: your cluster is launched and ready to go!
  • 23. Cloudbreak: SINGLE VIEW, ENTERPRISE READY, ELASTIC, FLEXIBLE
      ⬢ Enables provisioning a cluster with an arbitrary number of nodes
      ⬢ Enables (de)commissioning nodes from the cluster
      ⬢ Policy-based and time-based scaling of the cluster
      ⬢ Declarative and flexible Hadoop cluster creation using blueprints
      ⬢ Provision to multiple public cloud providers or an OpenStack-based private cloud using the same common API
      ⬢ Access all of this functionality through a rich UI, a secured REST API, or an automatable Shell
      ⬢ Supports basic, token-based and OAuth2 authentication models
      ⬢ The cluster is provisioned in a logically isolated network
      ⬢ Tracking of usage and cluster metrics
  • 24. How would we launch an Ephemeral unit-of-work?
      1. Specify Cluster Type
      2. Provision & Launch Cluster
      3. Load my Data
      4. Run Compute unit-of-work
      5. Retrieve Results
      6. Clean-up Environment
    Diagram (actors: User, Cloudbreak): Pick a Blueprint → Launch Cluster → Load my Data → Run unit-of-work → Retrieve Results → Clean-up Environment
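
The six steps on this slide can be sketched as a small driver that guarantees clean-up even when the unit-of-work fails. This is only an outline under stated assumptions: `provision`, `load_data`, `compute`, `retrieve` and `teardown` are hypothetical callables standing in for whatever tooling actually performs each step; none of them is a Cloudbreak API.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_cluster(provision, teardown, blueprint):
    """Provision a cluster and always tear it down afterwards."""
    cluster = provision(blueprint)          # steps 1-2: pick blueprint, launch
    try:
        yield cluster
    finally:
        teardown(cluster)                   # step 6: clean up, even on error


def run_unit_of_work(blueprint, provision, load_data, compute, retrieve, teardown):
    with ephemeral_cluster(provision, teardown, blueprint) as cluster:
        load_data(cluster)                  # step 3
        compute(cluster)                    # step 4
        return retrieve(cluster)            # step 5


# In-memory stand-ins (hypothetical) so the flow can be exercised locally.
events = []


def fake_provision(blueprint):
    events.append("launch")
    return {"blueprint": blueprint, "data": None, "result": None}


def fake_load(cluster):
    events.append("load")
    cluster["data"] = [1, 2, 3]


def fake_compute(cluster):
    events.append("run")
    cluster["result"] = sum(cluster["data"])


def fake_retrieve(cluster):
    events.append("retrieve")
    return cluster["result"]


def fake_teardown(cluster):
    events.append("cleanup")


result = run_unit_of_work(
    "ephemeral-batch", fake_provision, fake_load, fake_compute, fake_retrieve, fake_teardown
)
print(result, events)
```

Putting the teardown in a `finally` clause reflects the pay-only-for-what-you-use goal: a failed job should not leave a billable cluster running.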
  • 25. EASE OF USE: Manage all of your ephemeral workloads from a convenient and easy-to-use dashboard.
  • 26. EASE OF USE: Choose from a set of pre-tuned and pre-configured cluster types.
  • 27. EASE OF USE: Prescriptive customization points enable the operator to further tune the infrastructure and cluster as required.
  • 28. REDUCED OPERATIONAL EFFORT: A simplified choice of cluster topologies enables automatic cluster repair, reducing the burden on the operator.
  • 29. CONTROL COSTS: Opportunistically leverage Spot and Reserved Instances to control costs.
  • 30. INTEGRATED NETWORK SECURITY: A built-in Protected Gateway, along with advanced network options, minimizes the network access points.
  • 31. REDUCED OPERATIONAL EFFORT: Auto-scaling enables the cluster to dynamically adjust to the workload without operator input.
  • 32. REDUCED OPERATIONAL EFFORT: An integrated and powerful Command Line Interface (CLI) enables automating cluster creation and management.
  • 33. REDUCED OPERATIONAL EFFORT: Simplified cluster controls and easy access to cluster resources.
  • 34. SHARED METASTORE SERVICE: Reusable shared metadata services provide a consistent schema across and between ephemeral workloads.
  • 35. Practical Recommendations
  • 36. Startup Time: 8-minute prelude
      ⬢ Cluster startup with Cloudbreak takes about 8 minutes:
          – Connect to Cloud Provider
          – Set up VPC
          – Provision Server Instances
          – Install OS
          – Install Cluster
          – Configure Blueprint
          – Start all services
          – READY TO PROCESS
      ⬢ So ephemeral clusters are not suitable for units-of-work that require an immediate response upon invocation, but they can work when fast responses are only needed once the cluster is up.
  • 37. Elastic Provisioning
      ⬢ All Ephemeral clusters should be configured for Auto-Scaling, unless the scope of execution is extremely well known.
      ⬢ Have multiple Cloud Providers:
          – Optimal Provider for my task
          – Fail-Over
          – Peak Demand
          – Location Suitability (which region can best serve my client base)
  • 38. Cluster Overhead: Minimum Unit
      ⬢ Clusters in general have overhead inherent in coordinating multiple nodes, preparing an optimal execution path, and managing resources.
      ⬢ They can be slower to start processing, but usually more than make up for it in total speed through extensive use of parallelism and linear scaling.
      ⬢ The more compute-intensive or data-intensive the unit-of-work, the more benefit we get from the cluster.
  • 39. Storage: I/O Latency Awareness
      ⬢ Even the fastest cluster will run very slowly if storage access is inefficient, if there are I/O bottlenecks, or if storage access has high latency.
      ⬢ Some strategies:
          – Fetch-while-u-wait: Fetch data in parallel while the cluster is instantiating, so that all data is available when the cluster is ready to begin processing.
          – Storage-Warming: One common strategy is to have multiple types of storage to balance speed of access against storage cost in the cloud: Hot, Warm, and Cold storage, such as Attached vs. S3 vs. Glacier in AWS. As you instantiate the cluster, move the data that needs to be accessed to HOT storage for cluster execution.
          – Cache-Loading: Load data into cache when the cluster is instantiated to maximize speed of execution. Particularly useful for analytics running on Spark.
          – Extreme-Parallelism: Make sure the cluster layout is matched to maximize concurrent processing with concurrent I/O access. This usually means a ratio of one CPU per physical storage device, so that we can fully utilize concurrent processing with concurrent disk I/O.
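
The Fetch-while-u-wait strategy is essentially overlapping two waits. A minimal sketch using Python's standard `concurrent.futures` follows; `provision_cluster` and `fetch_dataset` are hypothetical placeholders that simulate latency with `time.sleep`, not real provisioning or storage calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def provision_cluster(delay_s):
    """Placeholder for cluster start-up (about 8 minutes in the talk)."""
    time.sleep(delay_s)
    return "cluster-ready"


def fetch_dataset(name, delay_s):
    """Placeholder for staging one dataset onto fast storage."""
    time.sleep(delay_s)
    return f"{name}:staged"


def launch_with_prefetch(datasets, provision_delay_s=0.2, fetch_delay_s=0.1):
    """Start provisioning and all data fetches at once, then wait for both."""
    with ThreadPoolExecutor(max_workers=len(datasets) + 1) as pool:
        cluster_future = pool.submit(provision_cluster, provision_delay_s)
        fetch_futures = [pool.submit(fetch_dataset, d, fetch_delay_s) for d in datasets]
        staged = [f.result() for f in fetch_futures]   # results keep input order
        return cluster_future.result(), staged


cluster, staged = launch_with_prefetch(["orders", "clicks"], 0.05, 0.02)
print(cluster, staged)
```

With this shape, the total wait is roughly the longer of the two activities rather than their sum, which is the point of the strategy.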
  • 40. Cluster Instantiation: Cluster Start-up
      ⬢ What triggers the cluster launch?
          – Manual
          – Event Driven
          – Time Schedule
          – Capacity Triggers
          – Special Purpose
          – Isolation
  • 41. Advanced Topics
  • 42. 1. Spot-Pricing 2. Auto-scaling 3. Obfuscation
  • 43. 1. Spot-Pricing: Real-time pricing of cloud computing
      ⬢ Bid in real-time for available computing power
      ⬢ No guarantee that the supply we want will be available
      ⬢ Can be outbid
      ⬢ Over 70% cheaper!
    Recommendations:
      ⬢ Over-provision to make sure you have what you require
      ⬢ But by less than the price differential
      ⬢ So if spot pricing is currently 70% cheaper, and we need one hour of compute power => over-provision by 25%
          x = regular compute price per minute
          Regular price = 60 * x
          Spot price with over-provisioning = 60 * 1.25 * (x * (1 - 0.7)) = 22.5 * x
          Savings = 1 - (22.5 * x) / (60 * x) = 62.5% cheaper!
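
The arithmetic on this slide generalizes to any discount and over-provisioning factor. The helper below is a small sketch of that calculation; the function names are my own, and the break-even formula is derived from the slide's cost expression rather than stated in the talk.

```python
def spot_savings(discount, over_provision):
    """Fraction saved versus the regular price.

    discount: spot discount as a fraction (0.7 means 70% cheaper).
    over_provision: extra capacity as a fraction (0.25 means 25% more).
    """
    relative_cost = (1 + over_provision) * (1 - discount)
    return 1 - relative_cost


def break_even_over_provision(discount):
    """Over-provisioning fraction at which spot costs the same as regular."""
    return 1 / (1 - discount) - 1


print(f"{spot_savings(0.7, 0.25):.1%}")            # the slide's example
print(f"{break_even_over_provision(0.7):.0%}")     # upper bound on over-provisioning
```

Under these assumptions, a 70% discount leaves room to over-provision by up to about 233% before the spot cluster costs as much as the regular one, so the slide's 25% is comfortably inside the "less than the price differential" rule.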
  • 44. 2. Auto-scaling: Cluster Elasticity
      ⬢ Even though a cluster will be dedicated to one unit-of-work during its lifetime, we could still run out of resources.
    Recommendations:
      ⬢ The best way to solve this is to let the cluster grow based on need
      ⬢ In Cloudbreak, this is achieved via Auto-Scaling:
          – Alerts: Create metric-based or time-based alerts for cluster scaling
          – Policies: Scaling policies adjust cluster size based on activity and workload alerts
          – General Configurations: Boundaries and cooldown period
  • 45. Auto-Scaling Time-Based Alert
  • 46. Auto-Scaling Metric-Based Alert
  • 47. Auto-Scaling Policies
      ⬢ Define the Scale Adjustment (Node Count, Percentage, Exact)
      ⬢ Select the Host Group (to scale)
      ⬢ Select the Alert (which, when fired, executes the Policy)
  • 48. Auto-Scaling General Configurations
      ⬢ Cooldown Period (between scaling actions)
      ⬢ Minimum and Maximum Cluster size (boundaries)
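
How the policy settings (adjustment type, cooldown, boundaries) interact can be illustrated with a small decision function. This is a sketch of the general idea only: the `ScalingPolicy` class, its field names, and the rounding of percentage adjustments are assumptions for illustration, not Cloudbreak's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    adjustment_type: str   # "node_count", "percentage", or "exact"
    adjustment: int        # nodes to add, percent to add, or exact target
    min_nodes: int         # lower boundary
    max_nodes: int         # upper boundary
    cooldown_s: int        # minimum seconds between scaling actions


def target_size(policy, current_nodes, now_s, last_scaled_s=None):
    """Return the cluster size after an alert fires at time now_s."""
    # Inside the cooldown window the alert is ignored.
    if last_scaled_s is not None and now_s - last_scaled_s < policy.cooldown_s:
        return current_nodes
    if policy.adjustment_type == "node_count":
        desired = current_nodes + policy.adjustment
    elif policy.adjustment_type == "percentage":
        # Rounding up is an assumption made for this sketch.
        desired = current_nodes + math.ceil(current_nodes * policy.adjustment / 100)
    elif policy.adjustment_type == "exact":
        desired = policy.adjustment
    else:
        raise ValueError(f"unknown adjustment type: {policy.adjustment_type}")
    # Clamp to the configured boundaries.
    return max(policy.min_nodes, min(policy.max_nodes, desired))


grow = ScalingPolicy("percentage", 50, min_nodes=2, max_nodes=10, cooldown_s=300)
print(target_size(grow, current_nodes=4, now_s=1000))                      # grows by 50%
print(target_size(grow, current_nodes=4, now_s=1000, last_scaled_s=900))   # still cooling down
```

The cooldown check comes first so that a burst of alerts cannot trigger repeated scaling before the previous action has taken effect.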
  • 49. 3. Obfuscation: Protecting Sensitive Data
      ⬢ Many clients want to leverage the power of elastic computing in the cloud, but are concerned about possible security breaches
      ⬢ Permanent solutions exist, such as private, secure, dedicated connections to our own secure cloud environment
      ⬢ Another, more generic and portable solution is to scramble only the pieces of sensitive data sent for processing, keep a key securely on-premise, and unscramble the results when they return => Obfuscate the Data.
      ⬢ Example: "John Doe, 1/24/84, 319-392-3429, 12, blue, …" becomes: J@*@ (#(*@), xxxxxxx, xxx-xx-3429,blue,..
    Recommendations:
      ⬢ Use Apache Ranger
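
The scramble, keep-the-key-on-premise, unscramble idea can be sketched with keyed tokenization. This example swaps in a simple HMAC-based token vault purely for illustration; it is not Apache Ranger and not the speaker's mechanism, and the class name, field names, and record layout are all assumptions.

```python
import hashlib
import hmac
import secrets


class OnPremTokenizer:
    """Replaces sensitive values with keyed tokens; the key and vault stay on-premise."""

    def __init__(self, key: bytes):
        self._key = key
        self._vault = {}   # token -> original value, never sent to the cloud

    def obfuscate(self, value: str) -> str:
        digest = hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()
        token = digest[:16]
        self._vault[token] = value
        return token

    def restore(self, token: str) -> str:
        # Values that were never tokenized pass through unchanged.
        return self._vault.get(token, token)


def obfuscate_record(record, sensitive_fields, tokenizer):
    """Scramble only the sensitive fields before sending a record out."""
    return {
        field: tokenizer.obfuscate(value) if field in sensitive_fields else value
        for field, value in record.items()
    }


def restore_record(record, tokenizer):
    """Unscramble a record when results return on-premise."""
    return {field: tokenizer.restore(value) for field, value in record.items()}


tokenizer = OnPremTokenizer(secrets.token_bytes(32))
record = {"name": "John Doe", "dob": "1/24/84", "phone": "319-392-3429", "color": "blue"}
masked = obfuscate_record(record, {"name", "dob", "phone"}, tokenizer)
print(masked)
print(restore_record(masked, tokenizer))
```

Because the same value always maps to the same token under one key, joins and group-bys on a tokenized column still work in the cloud, while the original values can only be recovered where the vault lives.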
  • 50. Dynamic Masking and Row Level Filtering (Roadmap)
    Source data:
      Dept  SSN        CC No             Name      DOB        MRN         Policy ID
      01    232323233  4539067047629850  John Doe  9/12/1969  8233054331  nj23j424
      02    333287465  5391304868205600  Jane Doe  9/13/1969  3736885376  cadsd984
    After Ranger Policy Enforcement, the Marketing group sees CC and SSN as masked values and MRN is nullified:
      Dept  SSN        CC No                MRN   Name
      01    xxxxx3233  4539 xxxx xxxx xxxx  null  John Doe
      02    xxxxx7465  5391 xxxx xxxx xxxx  null  Jane Doe
    A Dept employee only sees data specific to that department:
      Dept  SSN        Name      Data 1
      01    232323233  John Doe  sdsd
  • 51. Tag-based Access Policy Requirements
      • Basic Tag policy – PII example. Access and entitlements must be tag-based (ABAC) and scalable in implementation.
      • Geo-based policy – Policy based on IP address; proxy IP substitution may be required. The rule enforcement must be geo-aware.
      • Time-based policy – Timer for data access, decoupled from deletion of data.
      • Prohibitions – Prevention of combinations of Hive tables/columns that may pose a risk together.
  • 52. Recap
  • 53. Review: Robust Ephemeral Clusters Are Possible Today!
      ⬢ Ephemeral clusters can be launched quickly (in minutes), are pre-configured for a specific processing purpose, and can be brought down as soon as their usefulness has expired
      ⬢ Organizations can leverage Ephemeral Clusters for parallel, compute-intensive applications which require bursts of power
      ⬢ Being able to launch bespoke clusters for specific compute needs, in a repeatable fashion and within a shared infrastructure, provides flexibility for special-purpose processing needs
      ⬢ The velocity and elasticity of fast cluster deployment enable seamless peak-demand provisioning, enable cost optimization by leveraging significantly lower cloud spot pricing, and maximize utilization of existing compute capacity
  • 54. Thank You. Diego Baez, dbaez@hortonworks.com