MANILA* AND SAHARA*: CROSSING THE DESERT TO THE BIG DATA OASIS
Ethan Gafford, Red Hat
Jeff Applewhite, NetApp
Malini Bhandaru, Intel (covering for Weiting Chen)
AGENDA
• Introduction
• Sahara Overview
• Manila Overview
• The goal for Sahara and Manila integration
• The approaches
  • Manila HDFS Driver
  • Manila NFS Share Mount
  • Manila + NetApp NFS Connector for Hadoop
• Conclusion
• Q&A
Sahara: The Problem
Hadoop* (and Spark*, Storm*…) clusters are difficult to configure
Commodity hardware is cheap but requires frequent (costly) maintenance
Reliable hardware is expensive, and a fixed-size cluster will cause contention
Demand for data processing varies over time within an organization
Baremetal clusters go down, and can be a single point of failure
Hadoop dev is very difficult without a real cluster
TL;DR: Data processing clusters are harder to provision and maintain than they
should be, and it hurts.
Sahara: The Solution
Put it in a cloud!
Then have easy-to-use, standardized interfaces:
● To create clusters (reliably and repeatedly)
● To scale clusters
● To run data processing jobs
● On any popular data processing framework
● With sensible defaults that just work
● And sophisticated configuration management for expert users
That's OpenStack* Sahara.
Sahara: The API
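As a rough illustration of that API, a cluster-creation request against the Kilo-era v1.1 REST endpoint might look like the following sketch (all field values are hypothetical):

POST /v1.1/{tenant_id}/clusters
{
  "name": "my-hadoop-cluster",
  "plugin_name": "vanilla",
  "hadoop_version": "2.7.1",
  "cluster_template_id": "<cluster template uuid>",
  "default_image_id": "<image uuid>",
  "user_keypair_id": "my-keypair"
}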
Sahara: Architecture
Manila Overview
Manila Share and Access APIs

Share operations:
• Create (manila create): Create a Manila share of a specified size; optionally specify a name, availability zone, share type, share network, or source snapshot
• Delete (manila delete): Delete an existing Manila share; manila force-delete may be required if the share is in an error state
• Edit (manila metadata): Set or unset metadata on a Manila share
• List (manila list): List all Manila shares
• Show (manila show): Show details about a Manila share

Access operations:
• Allow (manila access-allow): Allow access to the specified share for the specified access type and value (IP address, IP network in CIDR notation, or Windows user name)
• Deny (manila access-deny): Deny access to the specified share for the specified access type and value
• List (manila access-list): List all access rules for a Manila share
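A minimal sketch of these commands in sequence, assuming a Liberty-era manila client (the share name, size, network, and CIDR below are hypothetical):

$ manila create NFS 10 --name hadoop-share --share-network my-share-net
$ manila access-allow hadoop-share ip 10.0.0.0/24 --access-level rw
$ manila access-list hadoop-share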
Manila & Sahara
NetApp driver enabled*
The Goal for Sahara and Manila Integration
To support as many storage backends and protocols in Sahara as possible
Sahara Data Processing Model in Kilo*

PATTERN 1: Internal HDFS in the same node. Compute and data reside together in the same instance in your Hadoop cluster.

PATTERN 2: Internal HDFS in different nodes. Compute and data reside in different instances. This is an elastic way to manage Hadoop clusters.

PATTERN 3: Swift*. In order to persist data, Sahara supports Swift to stream the data directly.

(Diagrams: each pattern shows a virtual cluster of VMs on a host, with the computing task colocated with HDFS, on separate nodes from HDFS, or reading from and writing to Swift.)
Sahara Data Processing Model in Liberty* and the future

PATTERN 4: External HDFS via Manila*. Sahara can support external HDFS by using the HDFS driver in Manila.

PATTERN 5: Local Storage with Diverse Storage Backends in Manila. Use local storage in Hadoop and remote-mount any type of file storage through Manila's extensible NFS driver (GlusterFS, for example).

PATTERN 6: NFS. The NetApp* Hadoop NFS Connector can bring NFS capability into Hadoop. This feature will be implemented in Mitaka.

(Diagrams: in each pattern, a VM running the computing task connects to the Manila service's HDFS driver, to a GlusterFS local volume via the extensible NFS driver, or to NFS through the NetApp Hadoop NFS Connector.)
Manila HDFS Driver
Use Manila HDFS Driver as external storage in Sahara
Use Case: Manila HDFS Driver

Use Case
● Use external HDFS, either on the same node as the compute service or in a physical cluster

Rationales For Use
● Use the Manila HDFS driver to connect with HDFS
● Manila helps to create the HDFS share

The Advantages
● Use an existing HDFS cluster
● Centralized management of HDFS via Manila

Limitations
● Only supports non-secured HDFS, due to account management issues between OpenStack and Hadoop

Reference: https://blueprints.launchpad.net/manila/+spec/hdfs-driver

(Diagram: Tenant A and Tenant B VMs on compute hosts reach an external HDFS cluster, one name node and several data nodes, through a Manila share backed by the HDFS driver.)
Enable HDFS Driver in Manila
Step 1: Set up the Manila configuration
• /etc/manila/manila.conf
• Make sure the login username and password are correct
• The Manila service uses this user to log in to the HDFS namenode and create the share folder for each individual user
Step 2: Restart the Manila service
Reference: http://docs.openstack.org/developer/manila/devref/hdfs_native_driver.html
manila.conf example:
share_driver = manila.share.drivers.hdfs.hdfs_native.HDFSNativeShareDriver
hdfs_namenode_ip = <the IP address of the HDFS namenode; only a single namenode is supported for now>
hdfs_namenode_port = <the port of the HDFS namenode service>
hdfs_ssh_port = <HDFS namenode SSH port>
hdfs_ssh_name = <HDFS namenode SSH login name>
hdfs_ssh_pw = <HDFS namenode SSH login password; not necessary if hdfs_ssh_private_key is configured>
hdfs_ssh_private_key = <path to the HDFS namenode private key for SSH login>
…
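A filled-in sketch of the same settings (every address and credential below is hypothetical):

share_driver = manila.share.drivers.hdfs.hdfs_native.HDFSNativeShareDriver
hdfs_namenode_ip = 192.168.1.10
hdfs_namenode_port = 9000
hdfs_ssh_port = 22
hdfs_ssh_name = hdfs
hdfs_ssh_private_key = /etc/manila/ssh/hdfs_id_rsa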
Add External HDFS as a Data Source in Sahara
• Make sure the user account "hdfs" has been set up on the HDFS side
• Sahara uses the "hdfs" user to access external HDFS by default; you can still set up your own user account in Sahara as well
• Add the external HDFS location as a data source in Sahara

Limitation
No further user account setup is needed, since currently only non-secured HDFS is supported
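As a sketch, registering the external HDFS location through the data source REST API might look like this (the host, port, and path are hypothetical):

POST /v1.1/{tenant_id}/data-sources
{
  "name": "external-hdfs-input",
  "type": "hdfs",
  "url": "hdfs://namenode.example.com:8020/user/hdfs/input",
  "description": "Input data on the external HDFS cluster"
}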
NFS Share Mounting
Binary storage and input / output data from Manila-provisioned NFS shares
The Feature
• Mount Manila NFS shares to:
  • All nodes in the cluster
  • Specific node groups (the namenode, etc.)
• Currently NFS-only
• Extensible to other share types
• API (see below)
  • Path and access-level defaults shown
  • Only the id field is needed

"shares": [
  {
    "id": "<share uuid>",
    "path": "/mnt/<share uuid>",
    "access_level": "rw"
  }
]
Use Case: Binary Data Storage
• "Job binaries": *.jar, *.pig, etc.
  • Comparatively small size
  • Initial location is irrelevant to performance
• Previous storage options in Sahara
  • Swift (still available)
  • Sahara DB (as blobs in a SQL table)
• Rationales for NFS storage
  • Version control directly on the storage FS
  • Long-term storage for use by transient clusters
Use Case: Input / Output Data

Previous options in Sahara
● Cluster-internal HDFS
● External HDFS
● Swift

Rationales for use
● Standard FS access to data
● Convenient in many cases

Data copy necessary
● Similar to the built-in hadoop fs -put operation
● Irrelevant for heavily reduced output or small input cases
● For large inputs, network transfer is a consideration

Reference: https://blueprints.launchpad.net/sahara/+spec/manila-as-a-data-source

(Diagram, using GlusterFS as an example: Tenant A and Tenant B VMs on compute hosts mount a Manila share that any driver can back; here the share maps to Gluster volumes on Gluster nodes, alongside local volumes on each VM.)
Workflow: NFS Binary Storage and Input Data
1. Create a Manila NFS share
2. Place the binary file on the share at /absolute/path/to/binary.jar
3. Create a Sahara job binary object with the path reference manila://share_uuid/absolute/path/to/binary.jar
4. Utilize the job binary in a job template (per normal)
5. Create a Sahara data source with the path reference manila://share_uuid/absolute/path/to/input_dir
6. Run a job from the template using the data source
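The same workflow sketched with concrete calls; the share UUID and paths are hypothetical, and the JSON bodies and the "manila" type string follow the Liberty-era v1.1 API from memory, so verify them against your deployment:

$ manila create NFS 10 --name edp-share
(place binary.jar and input_dir on the share, e.g. from a host that has mounted it)

POST /v1.1/{tenant_id}/job-binaries
{
  "name": "my-binary",
  "url": "manila://4bf1b5b4-aaaa-bbbb-cccc-000000000000/path/to/binary.jar"
}

POST /v1.1/{tenant_id}/data-sources
{
  "name": "my-input",
  "type": "manila",
  "url": "manila://4bf1b5b4-aaaa-bbbb-cccc-000000000000/path/to/input_dir"
}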
Automatic Mounting
• An API field is only necessary to mount shares for non-EDP users
• Sahara's EDP API mounts needed shares to a long-standing cluster when a job references any data source or binary on that share
• Uses the defaults for permissions (rw) and path (/mnt/share_uuid/)
Automatic Mounting: Under the Hood

All frameworks (universal flow, per cluster node): check to ensure required shares are mounted. If not:
1) Install nfs-common (Debian*) or nfs-utils (Red Hat) if not present
2) Get the remote path for the share UUID from Manila
3) Manila: access-allow for each required IP in the cluster (if access does not already exist)
4) mount -t nfs %(access_arg)s %(remote_path)s %(local_path)s

All frameworks (universal flow): job binary paths manila://uuid/absolute/path are translated to /local_path/absolute/path; data source paths are translated to file:///local_path/absolute/path

Hadoop (w/ Oozie): job binaries are copied into the workflow directory with hadoop fs -copyFromLocal and referenced as filesystem paths in the workflow; data sources use the file URL in the Oozie workflow document (as a named job parameter or positional argument)

Spark: job binaries are referenced by local filesystem path in the spark-submit call; data sources use the file URL in the spark-submit call (as a positional argument)

Storm: job binaries are referenced as filesystem paths in the storm jar call; data sources use the file URL in the storm jar call (as a positional argument)
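Expanded with concrete (hypothetical) values, the per-node steps of the universal flow look roughly like:

$ sudo apt-get install -y nfs-common            (nfs-utils on Red Hat images)
$ manila access-allow <share-id> ip 10.0.0.12   (repeated for each cluster node IP)
$ sudo mkdir -p /mnt/<share_uuid>
$ sudo mount -t nfs 192.168.1.20:/shares/share-<share_uuid> /mnt/<share_uuid>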
Screenshots
(Slides 26-29: UI screenshots, not reproduced in this text.)
NetApp Hadoop NFS Connector
Future Proposal: Use the NetApp Hadoop NFS Connector in Sahara
NetApp NFS Connector - Architecture Overview
● NFS client written in Java
● Implements the Hadoop filesystem API
● No changes to the Hadoop framework
● No changes to user programs
● Eliminates copying data into HDFS
● Optimized performance for NFS access
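Because the connector is an ordinary Hadoop filesystem implementation, enabling it is purely client-side configuration. A sketch of the relevant core-site.xml entries, from memory of the connector's GitHub README (verify the exact class and property names against the repository):

<property>
  <name>fs.nfs.impl</name>
  <value>org.apache.hadoop.fs.nfs.NFSv3FileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.nfs.impl</name>
  <value>org.apache.hadoop.fs.nfs.NFSv3AbstractFilesystem</value>
</property>
<property>
  <name>fs.nfs.configuration</name>
  <value>/etc/hadoop/conf/nfs-mapping.json</value>
</property>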
Sahara + Manila + NetApp NFS Connector

Use Case
● Use the NFS protocol to access data for Hadoop

How to use
1. Use Manila to expose the NFS share
2. Use the NetApp Hadoop NFS Connector as the "interface" to the shared data

The Advantages
● NFS is one of the most common storage protocols used in IT
● A direct way to communicate with and process data instead of using HDFS

Reference: https://blueprints.launchpad.net/sahara/+spec/nfs-as-a-data-source

(Diagram: Tenant A and Tenant B VMs on compute hosts, each running the NetApp NFS driver, reach NFS folders on NFS nodes through a Manila share backed by the NFS driver.)
NetApp NFS Connector
● Deployment Choices
  ○ NFS (v3)
  ○ HDFS + NFS
● Open Source
● Snapshot, FlexClone, SnapMirror, and Manila Disaster Recovery (Mitaka)
NetApp Hadoop NFS Plugin
Use the NetApp NFS Connector to run Hadoop on your existing data:
• $ hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in /tera/out
• $ hadoop jar <path-to-examples-jar> terasort nfs://<nfs-server-hostname>:2049/tera/in nfs://<nfs-server-hostname>:2049/tera/out
(In the first form only the input comes from NFS and the output goes to the default filesystem; in the second, both input and output live on NFS.)
References:
1. http://www.netapp.com/us/solutions/big-data/nfs-connector-hadoop.aspx
2. https://github.com/NetApp/NetApp-Hadoop-NFS-Connector
Summary
● The choices:
a) Manila HDFS Driver
b) Manila NFS Share Mount (https://www.netapp.com/us/media/tr-4464.pdf)
c) NetApp NFS Connector for Hadoop (https://github.com/NetApp/NetApp-Hadoop-NFS-Connector)
Sahara and Manila: Access the Big Data Oasis
For more information: http://netapp.github.io
Participating in the Intel Passport Program?
Are you playing? Be sure to get your Passport Stamp for attending this session! See me or my helper in the back at the end!
Not playing yet? What are you waiting for? See me or my helper in the back at the end and we can get you started!
Don't forget to return your stamped passport to the Intel Booth #H3 to enter our raffle drawing! 3 Stamps = 1 Raffle Ticket
THANK YOU!
