SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
Big Data Computations
Using Elastic Data
Processing in
OpenStack Cloud
Sergey Lukjanov (Mirantis)
Alexander Ignatov (Mirantis)
Trevor McKay (Red Hat)
Agenda
• OpenStack Data Processing Overview
• EDP Architecture & Technical Concepts
• Live Demo
Agenda
• OpenStack Data Processing Overview
• EDP Architecture & Technical Concepts
• Live Demo
OpenStack Data Processing: Sahara
Mission: To provide a scalable data processing
stack and associated management interfaces.
• provision and operate Hadoop clusters
• schedule and operate Hadoop jobs
Hadoop - Big Data Platform
© http://hortonworks.com/hadoop/yarn/
Trends
http://www.google.com/trends/
Architecture overview
Data
Sources
Savanna
Python
Client
RESTAPI
Cluster
Configuration
Manager
Horizon
Keystone
Auth
Data
Access
Layer
Swift
Savanna
Pages
Hadoop
VM
Vendors
Plugins
Hadoop
VM
Hadoop
VM
Hadoop
VM
Resources
Orchestration
Manager
Job
Sources Job
Manager
Heat
Nova
Glance
Cinder
Neutron
Trove DB
Sahara status
• Official integrated OpenStack project
• Supported Hadoop distros:
• Vanilla Apache Hadoop
• Hortonworks Data Platform
• Intel Distribution
• Cloudera Distribution in blueprint
• Included into OpenStack distros:
• RDO - openstack.redhat.com
• Mirantis OpenStack - software.mirantis.com
Contributors
Agenda
• OpenStack Data Processing Overview
• EDP Architecture & Technical Concepts
• Live Demo
Elastic Data Processing
• EDP - API for executing MapReduce jobs on
Hadoop clusters (similar to AWS EMR)
• Supported data sources: Swift, HDFS, Ceph
• Supported job types: Java actions,
MapReduce, MapReduce.Streaming, Pig, Hive
• Oozie for Hadoop jobs workflow management
• Supports both Hadoop 1 & 2
• Job executions on transient clusters
EDP Use Cases
• Simplified task executions. You don’t need to
know Hadoop!
• Bursty workload: ad-hoc queries requiring a
significant resource only for short time period
• Utilization of free IaaS capacity for Hadoop tasks
EDP - Data Sources
Swift Sahara EDP
INPUT
OUTPUT
Hadoop
VM
Hadoop
VM
Hadoop
VM
Hadoop
VM
swift://some_container/INPUT
swift://some_container/OUTPUT
EDP - Job Binaries
Swift
Sahara DB
Sahara EDP
internal-db://script.pig
swift://some_container/mapreduce.jar
1. Pig, Hive scripts
2. Executable Jar files
3. Pluggable binaries and
libraries
EDP - Job Execution. Step 1
Sahara
Swift
INPUT
DB: Jar, Pig
EDP
Jar, Pig
EDP - Job Execution. Step 2
Sahara
Swift
INPUT
DB: Jar, Pig
EDP
Jar, Pig
JobTracker
Oozie
Hadoop
VM
Hadoop
VM
Hadoop
VM
EDP - Job Execution. Step 3
Sahara
Swift
INPUT
DB: Jar, Pig
EDP
Jar, Pig
Hadoop
VM
Hadoop
VM
Hadoop
VM
JobTracker
Oozie
Execute
a
job
EDP - Job Execution. Step 4
Sahara
Swift
INPUT
DB: Jar, Pig
EDP
Jar, Pig
Hadoop
VM
Hadoop
VM
Hadoop
VM
JobTracker
Oozie
EDP - Job Execution. Step 5
Sahara
Swift
INPUT
DB: Jar, Pig
EDP
Jar, Pig
Hadoop
VM
Hadoop
VM
Hadoop
VM
workflow.xm
l
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
JobTracker
Oozie
EDP - Job Execution. Step 6
Sahara
Swift
INPUT
DB: Jar, Pig
EDP
Jar, Pig
Hadoop
VM
Hadoop
VM
Hadoop
VM
workflow.xm
l
Data Processing
OUTPUT
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
JobTracker
Oozie
EDP - Job Execution. Step 7
Sahara
Swift
INPUT
DB: Jar, Pig
EDP
Jar, Pig
Hadoop
VM
Hadoop
VM
Hadoop
VM
workflow.xm
l
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
Data Processing
OUTPUT
JobTracker
Oozie
Agenda
• OpenStack Data Processing Overview
• EDP Architecture & Technical Concepts
• Live Demo
EDP BigPetStore Demo
BigPetStore is now part of Apache BigTop
• Test/demo laboratory for all things Hadoop
• Actively developed with integration testing
• Generates and processes data of arbitrary size
• git clone git://git.apache.org/bigtop.git
• Filed under bigtop/bigtop-bigpetstore
EDP BigPetStore Demo
What are we going to do?
• Generate 1M records of pet supply purchases
• Clean the data (“dirty CSV”)
• Extract cumulative counts by state
• Demonstrates Sahara EDP objects
• Job Binaries
• Jobs (Java and Pig)
• Data Sources
EDP BigPetStore Sample Data
Generated Data (first job)
$ hadoop fs -cat bigpetstore/gen/part-r-00000 | more
BigPetStore,storeCode_AK,1 deanna,booker,Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food
BigPetStore,storeCode_AK,10 erica,buck,Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food
Cleaned Data (second job)
$ hadoop fs -cat bigpetstore/clean/part-m-00000 | more
BigPetStore storeCode_AK 1 deanna booker Sun Jan 18 20:50:06 GMT+00:00 1970 7.5 cat-food
BigPetStore storeCode_AK 10 erica buck Thu Dec 25 16:29:28 GMT+00:00 1969 10.5 dog-food
EDP BigPetStore Sample Data
Summed Data For Products by State (3rd job)
$ hadoop fs -cat bigpetstore/analyze_rel/part-r-00000 | more
US-AK cat-food 24837
US-AK dog-food 24994
US-AK fuzzy-collar 25145
US-AK antelope-caller 25024
US-AZ cat-food 25106
US-AZ dog-food 25064
US-AZ leather-collar 24870
US-AZ snake-bite ointment 24960
What Next for EDP
Potential Areas for Development within EDP
• Pluggable Job Execution Model
• Allows Sahara to run jobs with additional execution engines
• Current Oozie offerings become one of multiple options
• Expand Capabilities via Oozie
• Support upload of user-written Oozie workflows
• Support for coordinated jobs
• Enhanced Usability
• Better Error Reporting
• User Experience (UI, CLI, API)
Please, send us your feedback! Ideas are always welcome
• #openstack-sahara on freenode
• openstack-dev@lists.openstack.org with [openstack-dev][sahara] subject
Design Summit Sessions
7 Sessions: Thursday 1:30 - Friday 10:30
http://goo.gl/lQXtUS
Q&A
Thank you!

Contenu connexe

Tendances

20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on dockerWei Ting Chen
 
Data Processing Updates - Juno Edition
Data Processing Updates - Juno EditionData Processing Updates - Juno Edition
Data Processing Updates - Juno EditionOpenStack Foundation
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...DataWorks Summit
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Saharaspinningmatt
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Sparkrhatr
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetupWei Ting Chen
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015Yousun Jeong
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)Nicolas Poggi
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Summit
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for HadoopHBaseCon
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Databricks
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopDataWorks Summit
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...DataStax
 
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...Spark Summit
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezDataWorks Summit
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparktrihug
 

Tendances (20)

20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker20150425 experimenting with openstack sahara on docker
20150425 experimenting with openstack sahara on docker
 
Data Processing Updates - Juno Edition
Data Processing Updates - Juno EditionData Processing Updates - Juno Edition
Data Processing Updates - Juno Edition
 
Hadoop and OpenStack
Hadoop and OpenStackHadoop and OpenStack
Hadoop and OpenStack
 
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
Lessons Learned from Building an Enterprise Big Data Platform from the Ground...
 
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA)  - SaharaOpenStack Trove Day (19 Aug 2014, Cambridge MA)  - Sahara
OpenStack Trove Day (19 Aug 2014, Cambridge MA) - Sahara
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup20150314 sahara intro and the future plan for open stack meetup
20150314 sahara intro and the future plan for open stack meetup
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015IEEE International Conference on Data Engineering 2015
IEEE International Conference on Data Engineering 2015
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)State of Spark in the cloud (Spark Summit EU 2017)
State of Spark in the cloud (Spark Summit EU 2017)
 
Spark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene PangSpark Pipelines in the Cloud with Alluxio with Gene Pang
Spark Pipelines in the Cloud with Alluxio with Gene Pang
 
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for HadoopHBaseCon 2015: Apache Kylin - Extreme OLAP  Engine for Hadoop
HBaseCon 2015: Apache Kylin - Extreme OLAP Engine for Hadoop
 
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
Analyzing IOT Data in Apache Spark Across Data Centers and Cloud with NetApp ...
 
Hello OpenStack, Meet Hadoop
Hello OpenStack, Meet HadoopHello OpenStack, Meet Hadoop
Hello OpenStack, Meet Hadoop
 
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
Fast, In-Memory SQL on Apache Cassandra with Apache Ignite (Rachel Pedreschi,...
 
Hive Now Sparks
Hive Now SparksHive Now Sparks
Hive Now Sparks
 
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
How to Share State Across Multiple Apache Spark Jobs using Apache Ignite with...
 
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache TezYahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
Yahoo - Moving beyond running 100% of Apache Pig jobs on Apache Tez
 
TriHUG Feb: Hive on spark
TriHUG Feb: Hive on sparkTriHUG Feb: Hive on spark
TriHUG Feb: Hive on spark
 

Similaire à Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud

Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideDanairat Thanabodithammachari
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideDanairat Thanabodithammachari
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Sumeet Singh
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Marcel Krcah
 
Debugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-CloudDebugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-CloudSoam Acharya
 
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDe-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDataWorks Summit
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...Chester Chen
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoSadayuki Furuhashi
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureRyan Hennig
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleDebugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleData Con LA
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Data Con LA
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Ferran Galí Reniu
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Vinoth Chandar
 
Hadoop at ayasdi
Hadoop at ayasdiHadoop at ayasdi
Hadoop at ayasdiMohit Jaggi
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku
 

Similaire à Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud (20)

Big data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guideBig data hadooop analytic and data warehouse comparison guide
Big data hadooop analytic and data warehouse comparison guide
 
Big data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guideBig data Hadoop Analytic and Data warehouse comparison guide
Big data Hadoop Analytic and Data warehouse comparison guide
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
 
Debugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-CloudDebugging Hive with Hadoop-in-the-Cloud
Debugging Hive with Hadoop-in-the-Cloud
 
De-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-CloudDe-Bugging Hive with Hadoop-in-the-Cloud
De-Bugging Hive with Hadoop-in-the-Cloud
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
 
Hadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and FutureHadoop @ eBay: Past, Present, and Future
Hadoop @ eBay: Past, Present, and Future
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Hackathon bonn
Hackathon bonnHackathon bonn
Hackathon bonn
 
Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of AltiscaleDebugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
Debugging Hive with Hadoop-in-the-Cloud by David Chaiken of Altiscale
 
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
Big Data Day LA 2015 - What's new and next in Apache Tez by Bikas Saha of Hor...
 
Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)Yarn by default (Spark on YARN)
Yarn by default (Spark on YARN)
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Hadoop at ayasdi
Hadoop at ayasdiHadoop at ayasdi
Hadoop at ayasdi
 
Big Data Journey
Big Data JourneyBig Data Journey
Big Data Journey
 
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014Dataiku  - hadoop ecosystem - @Epitech Paris - janvier 2014
Dataiku - hadoop ecosystem - @Epitech Paris - janvier 2014
 

Plus de Sergey Lukjanov

[Mirantis Day 2015] Проект Sahara - BigData на OpenStack
[Mirantis Day 2015] Проект Sahara - BigData на OpenStack[Mirantis Day 2015] Проект Sahara - BigData на OpenStack
[Mirantis Day 2015] Проект Sahara - BigData на OpenStackSergey Lukjanov
 
Java Agents and Instrumentation techtalk
Java Agents and Instrumentation techtalkJava Agents and Instrumentation techtalk
Java Agents and Instrumentation techtalkSergey Lukjanov
 

Plus de Sergey Lukjanov (6)

[Mirantis Day 2015] Проект Sahara - BigData на OpenStack
[Mirantis Day 2015] Проект Sahara - BigData на OpenStack[Mirantis Day 2015] Проект Sahara - BigData на OpenStack
[Mirantis Day 2015] Проект Sahara - BigData на OpenStack
 
Courses: concurrency #2
Courses: concurrency #2Courses: concurrency #2
Courses: concurrency #2
 
Twitter Storm
Twitter StormTwitter Storm
Twitter Storm
 
Java Agents and Instrumentation techtalk
Java Agents and Instrumentation techtalkJava Agents and Instrumentation techtalk
Java Agents and Instrumentation techtalk
 
Java Bytecode techtalk
Java Bytecode techtalkJava Bytecode techtalk
Java Bytecode techtalk
 
Kotlin techtalk
Kotlin techtalkKotlin techtalk
Kotlin techtalk
 

Dernier

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 

Dernier (20)

Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 

Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud

  • 1. Big Data Computations Using Elastic Data Processing in OpenStack Cloud Sergey Lukjanov (Mirantis) Alexander Ignatov (Mirantis) Trevor McKay (Red Hat)
  • 2. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 3. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 4. OpenStack Data Processing: Sahara Mission: To provide a scalable data processing stack and associated management interfaces. • provision and operate Hadoop clusters • schedule and operate Hadoop jobs
  • 5. Hadoop - Big Data Platform © http://hortonworks.com/hadoop/yarn/
  • 8. Sahara status • Official integrated OpenStack project • Supported Hadoop distros: • Vanilla Apache Hadoop • Hortonworks Data Platform • Intel Distribution • Cloudera Distribution in blueprint • Included into OpenStack distros: • RDO - openstack.redhat.com • Mirantis OpenStack - software.mirantis.com
  • 10. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 11. Elastic Data Processing • EDP - API for executing MapReduce jobs on Hadoop clusters (similar to AWS EMR) • Supported data sources: Swift, HDFS, Ceph • Supported job types: Java actions, MapReduce, MapReduce.Streaming, Pig, Hive • Oozie for Hadoop jobs workflow management • Supports both Hadoop 1 & 2 • Job executions on transient clusters
  • 12. EDP Use Cases • Simplified task executions. You don’t need to know Hadoop! • Bursty workload: ad-hoc queries requiring a significant resource only for short time period • Utilization of free IaaS capacity for Hadoop tasks
  • 13. EDP - Data Sources Swift Sahara EDP INPUT OUTPUT Hadoop VM Hadoop VM Hadoop VM Hadoop VM swift://some_container/INPUT swift://some_container/OUTPUT
  • 14. EDP - Job Binaries Swift Sahara DB Sahara EDP internal-db://script.pig swift://some_container/mapreduce.jar 1. Pig, Hive scripts 2. Executable Jar files 3. Pluggable binaries and libraries
  • 15. EDP - Job Execution. Step 1 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig
  • 16. EDP - Job Execution. Step 2 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig JobTracker Oozie Hadoop VM Hadoop VM Hadoop VM
  • 17. EDP - Job Execution. Step 3 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM JobTracker Oozie Execute a job
  • 18. EDP - Job Execution. Step 4 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM JobTracker Oozie
  • 19. EDP - Job Execution. Step 5 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM workflow.xm l 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials JobTracker Oozie
  • 20. EDP - Job Execution. Step 6 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM workflow.xm l Data Processing OUTPUT 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials JobTracker Oozie
  • 21. EDP - Job Execution. Step 7 Sahara Swift INPUT DB: Jar, Pig EDP Jar, Pig Hadoop VM Hadoop VM Hadoop VM workflow.xm l 1. Job-specific configurations 2. URLs to binaries 3. URLs for data sources 4. Credentials Data Processing OUTPUT JobTracker Oozie
  • 22. Agenda • OpenStack Data Processing Overview • EDP Architecture & Technical Concepts • Live Demo
  • 23. EDP BigPetStore Demo BigPetStore is now part of Apache BigTop • Test/demo laboratory for all things Hadoop • Actively developed with integration testing • Generates and processes data of arbitrary size • git clone git://git.apache.org/bigtop.git • Filed under bigtop/bigtop-bigpetstore
  • 24. EDP BigPetStore Demo What are we going to do? • Generate 1M records of pet supply purchases • Clean the data (“dirty CSV”) • Extract cumulative counts by state • Demonstrates Sahara EDP objects • Job Binaries • Jobs (Java and Pig) • Data Sources
  • 25. EDP BigPetStore Sample Data Generated Data (first job) $ hadoop fs -cat bigpetstore/gen/part-r-00000 | more BigPetStore,storeCode_AK,1 deanna,booker,Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food BigPetStore,storeCode_AK,10 erica,buck,Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food Cleaned Data (second job) $ hadoop fs -cat bigpetstore/clean/part-m-00000 | more BigPetStore storeCode_AK 1 deanna booker Sun Jan 18 20:50:06 GMT+00:00 1970 7.5 cat-food BigPetStore storeCode_AK 10 erica buck Thu Dec 25 16:29:28 GMT+00:00 1969 10.5 dog-food
  • 26. EDP BigPetStore Sample Data Summed Data For Products by State (3rd job) $ hadoop fs -cat bigpetstore/analyze_rel/part-r-00000 | more US-AK cat-food 24837 US-AK dog-food 24994 US-AK fuzzy-collar 25145 US-AK antelope-caller 25024 US-AZ cat-food 25106 US-AZ dog-food 25064 US-AZ leather-collar 24870 US-AZ snake-bite ointment 24960
  • 27. What Next for EDP Potential Areas for Development within EDP • Pluggable Job Execution Model • Allows Sahara to run jobs with additional execution engines • Current Oozie offerings become one of multiple options • Expand Capabilities via Oozie • Support upload of user-written Oozie workflows • Support for coordinated jobs • Enhanced Usability • Better Error Reporting • User Experience (UI, CLI, API) Please, send us your feedback! Ideas are always welcome • #openstack-sahara on freenode • openstack-dev@lists.openstack.org with [openstack-dev][sahara] subject
  • 28. Design Summit Sessions 7 Sessions: Thursday 1:30 - Friday 10:30 http://goo.gl/lQXtUS
  • 29. Q&A