SlideShare une entreprise Scribd logo
1  sur  33
Making sense of Apache Bigtop, ODPi and
why it all matters to Apache Apex
Roman Shaposhnik, rvs@apache.org,
@rhatr
Director of Open Source Strategy,
Pivotal Inc.
A slide deck build via “Apache Way”
• Bigtop community contributors
• Roman Shaposhnik
• Konstantin Boudnik
• Nate D'Amico
• Evans Ye & Darren Chen (Trend Micro)
What is Apache Bigtop?
• Apache Bigtop is to Hadoop what Debian is to Linux
• A 100% open, community driven distribution of bigdata
management platform based on Apache Hadoop
• A place where all communities around big data come
together
• The thing everybody (Pivotal, Cloudera, Hortonworks,
WANDisco, IBM, Amazon, TrendMicro) is building off of
• A cutting edge, quickly evolving distribution and a set
of tools
GNU Software Linux kernel
Hadoop Ecosystem
(Pig, Hive, Spark) Linux kernel
Hadoop
(HDFS + YARN + MR)
ODPi is a nonprofit organization committed to simplification &
standardization of the big data ecosystem with a common reference
specification called ODPi Core.
As a shared industry effort , ODPi is focused on promoting and advancing the state of Apache Hadoop®
and Big Data Technologies for the Enterprise.
February 2015 December 2015September 2015
What has ODPi done so far (1.0.1)?
• Runtime specification
• https://github.com/odpi/specs/blob/master/ODPi-Runtime.md
• Validation testsuite
• http://repo.odpi.org/ODPi/1.0/acceptance-tests/
• Reference implementation binaries
• http://repo.odpi.org/ODPi/1.0/{centos6, ubuntu-14.04}
What are we working on?
• Operations specification
• https://github.com/odpi/specs/blob/master/ODPi-Operations.md
• ISV “ODPi compatible” policy
• Expanding ODPi core beyond Apache Hadoop & Ambari
• Hive
• ????
• How can you help?
• Share usecases
• Test against reference implementation
• Contribute to upstream ASF projects
What’s in is Bigtop?
• A set of binary packages
• just like CDH/PHD/HDP/ODPi/etc.
• Integration code
• Packaging code
• Deployment code
• Orchestration code
• Validation code
• Continuous Integration infrastructure
Integration/packaging
• Linux packages
• RPM, DEB
• RHEL/CentOS(Fedora), SLES(OpenSUSE), Debian, Ubuntu
• VirtualBox, VMWare, etc. VM images
• Challenge: Linux packaging is node-centric
• “smart” tarballs
• Docker or BOSH images
Integration testing based on iTest
• Clean-room provisioning
• these ain’t your gramp’s unit tests
• Versioned test artifacts
• JVM-base test artifacts
• Matching stacks of components and integration tests
• Plug’n’play architecture: Gradle/Groovy, JARs/artifacts
Puppet 3.x deployment
• Master-less puppet
• $ puppet apply bigtop-deploy/puppet/manifests/site.pp # on each node
• Cluster topology is kept in Hiera
bigtop::hadoop_head_node: "hadoopmaster.example.com"
hadoop::hadoop_storage_dirs:
- ”/mnt”
hadoop_cluster_node::cluster_components:
- yarn
- zookeeper
bigtop::bigtop_repo_uri:
"http://bigtop-
One click Bigtop provisioning
Who is this for?
• For Hadoop app developers, cluster admins, users
• Run a Hadoop cluster to test your code on
• Try & test configurations before applying to Production
• Play around with Bigtop Big Data Stack
• For contributors
• Easy to test your packaging, deployment, testing code
• For vendors
• CI out of the box —> patch upstream code made easier
Works great, but…
• Need to add vagrant public key into docker images
• Too many issues with auto-created boot2docker
hosting VM
• A bug for docker provider keep opening for almost
2y
• Waiting for machine to boot' hangs infinitely
• Can not share same code for different providers
anyway
• Not all the docker options supported in Vagrantfile
• Does not support Docker Swarm
Docker Compose
Implementation
• Create docker containers:
• docker-compose scale bigtop=3
• Volumes:
• Bigtop Puppet configurations
• Bigtop Puppet code
• /etc/hosts
•Compatible with Docker Machine and Swarm
Docker Machine and Swarm
Juju orchestration
$ juju boostrap
$ juju deploy hadoop-processing
https://jujucharms.com/hadoop-
processing/
Juju orchestration
$ juju add-unit slave -n 2
Juju orchestration
$ juju action do namenode/0 smoke-test
$ juju action do resourcemanager/0
smoke-test
$ watch -n 0.5 juju action status
Early Mission Accomplished
Foundation for commercial Hadoop distros/services
Leveraged by app providers…
Blue prints for data engineering
• BigPetStore
• Data Generator
• Examples using tools in Hadoop ecosystem to process
data
• Build system and tests for integrating tools and multiple
JVM languages
• Started by Dr. Jay Vyas, prinicipal software engineer at
Red Hat, Inc.
Datamodel
Transaction Purchase Model
Lambda/Stream Architectures
HDFS + Zookeeper +
New focus and target end users
Data engineers vs distro
builders
Enhance
Operations/Deployment
Reference implementations
& tutorials
Data data data…
Smarter/Realistic test data
-bigpetstore
-bigtop-bazaar
-weather data gen
Tutorial/Learning Data sets
-githubarchive.org
-more tbd…
Thank You, Q&A

Contenu connexe

Tendances

Open Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of HadoopOpen Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of HadoopDataWorks Summit
 
OpenStack in Action 4! Thierry Carrez - From Havana to Icehouse
OpenStack in Action 4! Thierry Carrez - From Havana to IcehouseOpenStack in Action 4! Thierry Carrez - From Havana to Icehouse
OpenStack in Action 4! Thierry Carrez - From Havana to IcehouseeNovance
 
High Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit PortlandHigh Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit PortlandeNovance
 
Cloud Foundry Deployment Tools: BOSH vs Juju Charms
Cloud Foundry Deployment Tools:  BOSH vs Juju CharmsCloud Foundry Deployment Tools:  BOSH vs Juju Charms
Cloud Foundry Deployment Tools: BOSH vs Juju CharmsAltoros
 
Puppet at Spotify
Puppet at SpotifyPuppet at Spotify
Puppet at SpotifyPuppet
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...Daniel Krook
 
OpenShift Overview
OpenShift OverviewOpenShift Overview
OpenShift Overviewroundman
 
From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...
From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...
From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...OpenShift Origin
 
Build a Basic Cloud Using RDO-manager
Build a Basic Cloud Using RDO-managerBuild a Basic Cloud Using RDO-manager
Build a Basic Cloud Using RDO-managerK Rain Leander
 
Delve into Helm - Advanced DevOps
Delve into Helm - Advanced DevOpsDelve into Helm - Advanced DevOps
Delve into Helm - Advanced DevOpsLachlan Evenson
 
Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013Matt Ray
 
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick HamonOpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick HamoneNovance
 
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web ScaleSaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web ScaleSaltStack
 
Chef for OpenStack December 2012
Chef for OpenStack December 2012Chef for OpenStack December 2012
Chef for OpenStack December 2012Matt Ray
 
OSDC 2018 | Three years running containers with Kubernetes in Production by T...
OSDC 2018 | Three years running containers with Kubernetes in Production by T...OSDC 2018 | Three years running containers with Kubernetes in Production by T...
OSDC 2018 | Three years running containers with Kubernetes in Production by T...NETWAYS
 
Containers and CloudStack
Containers and CloudStackContainers and CloudStack
Containers and CloudStackShapeBlue
 
Api world apache nifi 101
Api world   apache nifi 101Api world   apache nifi 101
Api world apache nifi 101Timothy Spann
 
5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...
5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...
5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...OpenShift Origin
 
OpenShift Anywhere given at Infrastructure.Next Talk at #Scale12X
OpenShift Anywhere given at Infrastructure.Next Talk at #Scale12XOpenShift Anywhere given at Infrastructure.Next Talk at #Scale12X
OpenShift Anywhere given at Infrastructure.Next Talk at #Scale12XOpenShift Origin
 

Tendances (20)

Open Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of HadoopOpen Source Recipes for Chef Deployments of Hadoop
Open Source Recipes for Chef Deployments of Hadoop
 
OpenStack in Action 4! Thierry Carrez - From Havana to Icehouse
OpenStack in Action 4! Thierry Carrez - From Havana to IcehouseOpenStack in Action 4! Thierry Carrez - From Havana to Icehouse
OpenStack in Action 4! Thierry Carrez - From Havana to Icehouse
 
High Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit PortlandHigh Availability from the DevOps side - OpenStack Summit Portland
High Availability from the DevOps side - OpenStack Summit Portland
 
Cloud Foundry Deployment Tools: BOSH vs Juju Charms
Cloud Foundry Deployment Tools:  BOSH vs Juju CharmsCloud Foundry Deployment Tools:  BOSH vs Juju Charms
Cloud Foundry Deployment Tools: BOSH vs Juju Charms
 
Puppet at Spotify
Puppet at SpotifyPuppet at Spotify
Puppet at Spotify
 
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
CAPS: What's best for deploying and managing OpenStack? Chef vs. Ansible vs. ...
 
OpenShift Overview
OpenShift OverviewOpenShift Overview
OpenShift Overview
 
From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...
From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...
From Zero to Cloud: Revolutionize your Application Life Cycle with OpenShift ...
 
Build a Basic Cloud Using RDO-manager
Build a Basic Cloud Using RDO-managerBuild a Basic Cloud Using RDO-manager
Build a Basic Cloud Using RDO-manager
 
Delve into Helm - Advanced DevOps
Delve into Helm - Advanced DevOpsDelve into Helm - Advanced DevOps
Delve into Helm - Advanced DevOps
 
Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013Chef for OpenStack: OpenStack Spring Summit 2013
Chef for OpenStack: OpenStack Spring Summit 2013
 
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick HamonOpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
OpenStack in Action! 5 - Dell - OpenStack powered solutions - Patrick Hamon
 
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web ScaleSaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
SaltConf14 - Craig Sebenik, LinkedIn - SaltStack at Web Scale
 
Chef for OpenStack December 2012
Chef for OpenStack December 2012Chef for OpenStack December 2012
Chef for OpenStack December 2012
 
OSDC 2018 | Three years running containers with Kubernetes in Production by T...
OSDC 2018 | Three years running containers with Kubernetes in Production by T...OSDC 2018 | Three years running containers with Kubernetes in Production by T...
OSDC 2018 | Three years running containers with Kubernetes in Production by T...
 
Containers and CloudStack
Containers and CloudStackContainers and CloudStack
Containers and CloudStack
 
kolla
kollakolla
kolla
 
Api world apache nifi 101
Api world   apache nifi 101Api world   apache nifi 101
Api world apache nifi 101
 
5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...
5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...
5 ways to install @OpenShift in 5 minutes (Lightening Talk given at #DevConfC...
 
OpenShift Anywhere given at Infrastructure.Next Talk at #Scale12X
OpenShift Anywhere given at Infrastructure.Next Talk at #Scale12XOpenShift Anywhere given at Infrastructure.Next Talk at #Scale12X
OpenShift Anywhere given at Infrastructure.Next Talk at #Scale12X
 

Similaire à Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopEvans Ye
 
Habitat Overview
Habitat OverviewHabitat Overview
Habitat OverviewMandi Walls
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Mandi Walls
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningDataWorks Summit
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakSean Roberts
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...Hortonworks
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopGanesh Raju
 
Neev Open Source Contributions
Neev Open Source ContributionsNeev Open Source Contributions
Neev Open Source ContributionsNeev Technologies
 
Intro to Docker October 2013
Intro to Docker October 2013Intro to Docker October 2013
Intro to Docker October 2013Docker, Inc.
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 
Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments buildacloud
 
Deploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata EnvironmentsDeploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata EnvironmentsPuppet
 
Transforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux ContainersTransforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux ContainersGiovanni Galloro
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for WindowsTerry Padgett
 
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...Ganesh Raju
 
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data EverywhereApache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data EverywhereGanesh Raju
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013dotCloud
 
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...OpenShift Origin
 
Containerdays Intro to Habitat
Containerdays Intro to HabitatContainerdays Intro to Habitat
Containerdays Intro to HabitatMandi Walls
 

Similaire à Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex (20)

Trend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
 
Habitat Overview
Habitat OverviewHabitat Overview
Habitat Overview
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017
 
Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
 
Hadoop Everywhere & Cloudbreak
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
State of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache BigtopState of Big Data on ARM64 / AArch64 - Apache Bigtop
State of Big Data on ARM64 / AArch64 - Apache Bigtop
 
Neev Open Source Contributions
Neev Open Source ContributionsNeev Open Source Contributions
Neev Open Source Contributions
 
Intro to Docker October 2013
Intro to Docker October 2013Intro to Docker October 2013
Intro to Docker October 2013
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments Deploying Hadoop-based Bigdata Environments
Deploying Hadoop-based Bigdata Environments
 
Deploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata EnvironmentsDeploying Hadoop-Based Bigdata Environments
Deploying Hadoop-Based Bigdata Environments
 
Transforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux ContainersTransforming Application Delivery with PaaS and Linux Containers
Transforming Application Delivery with PaaS and Linux Containers
 
DC HUG Hadoop for Windows
DC HUG Hadoop for WindowsDC HUG Hadoop for Windows
DC HUG Hadoop for Windows
 
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
 
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data EverywhereApache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013
 
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
 
Containerdays Intro to Habitat
Containerdays Intro to HabitatContainerdays Intro to Habitat
Containerdays Intro to Habitat
 

Plus de Apache Apex

Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexApache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareApache Apex
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Apache Apex
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataApache Apex
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentApache Apex
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFSApache Apex
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingApache Apex
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache ApexApache Apex
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to YarnApache Apex
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data HadoopApache Apex
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsApache Apex
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationApache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Apache Apex
 
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentIngesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentApache Apex
 

Plus de Apache Apex (20)

Low Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache ApexLow Latency Polyglot Model Scoring using Apache Apex
Low Latency Polyglot Model Scoring using Apache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra TagareActionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
Actionable Insights with Apache Apex at Apache Big Data 2017 by Devendra Tagare
 
Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)Developing streaming applications with apache apex (strata + hadoop world)
Developing streaming applications with apache apex (strata + hadoop world)
 
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data EU 2016: Next Gen Big Data Analytics with Apache Apex
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
Deep Dive into Apache Apex App Development
Deep Dive into Apache Apex App DevelopmentDeep Dive into Apache Apex App Development
Deep Dive into Apache Apex App Development
 
Hadoop Interacting with HDFS
Hadoop Interacting with HDFSHadoop Interacting with HDFS
Hadoop Interacting with HDFS
 
Introduction to Real-Time Data Processing
Introduction to Real-Time Data ProcessingIntroduction to Real-Time Data Processing
Introduction to Real-Time Data Processing
 
Introduction to Apache Apex
Introduction to Apache ApexIntroduction to Apache Apex
Introduction to Apache Apex
 
Introduction to Yarn
Introduction to YarnIntroduction to Yarn
Introduction to Yarn
 
Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
HDFS Internals
HDFS InternalsHDFS Internals
HDFS Internals
 
Intro to Big Data Hadoop
Intro to Big Data HadoopIntro to Big Data Hadoop
Intro to Big Data Hadoop
 
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data TransformationsKafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
Kafka to Hadoop Ingest with Parsing, Dedup and other Big Data Transformations
 
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) ApplicationBuilding Your First Apache Apex (Next Gen Big Data/Hadoop) Application
Building Your First Apache Apex (Next Gen Big Data/Hadoop) Application
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
Intro to YARN (Hadoop 2.0) & Apex as YARN App (Next Gen Big Data)
 
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and EnrichmentIngesting Data from Kafka to JDBC with Transformation and Enrichment
Ingesting Data from Kafka to JDBC with Transformation and Enrichment
 

Dernier

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 

Dernier (20)

(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 

Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

  • 1. Making sense of Apache Bigtop, ODPi and why it all matters to Apache Apex Roman Shaposhnik, rvs@apache.org, @rhatr Director of Open Source Strategy, Pivotal Inc.
  • 2. A slide deck build via “Apache Way” • Bigtop community contributors • Roman Shaposhnik • Konstantin Boudnik • Nate D'Amico • Evans Ye & Darren Chen (Trend Micro)
  • 3. What is Apache Bigtop? • Apache Bigtop is to Hadoop what Debian is to Linux • A 100% open, community driven distribution of bigdata management platform based on Apache Hadoop • A place where all communities around big data come together • The thing everybody (Pivotal, Cloudera, Hortonworks, WANDisco, IBM, Amazon, TrendMicro) is building off of • A cutting edge, quickly evolving distribution and a set of tools
  • 5. Hadoop Ecosystem (Pig, Hive, Spark) Linux kernel Hadoop (HDFS + YARN + MR)
  • 6. ODPi is a nonprofit organization committed to simplification & standardization of the big data ecosystem with a common reference specification called ODPi Core. As a shared industry effort , ODPi is focused on promoting and advancing the state of Apache Hadoop® and Big Data Technologies for the Enterprise.
  • 7. February 2015 December 2015September 2015
  • 8.
  • 9. What has ODPi done so far (1.0.1)? • Runtime specification • https://github.com/odpi/specs/blob/master/ODPi-Runtime.md • Validation testsuite • http://repo.odpi.org/ODPi/1.0/acceptance-tests/ • Reference implementation binaries • http://repo.odpi.org/ODPi/1.0/{centos6, ubuntu-14.04}
  • 10. What are we working on? • Operations specification • https://github.com/odpi/specs/blob/master/ODPi-Operations.md • ISV “ODPi compatible” policy • Expanding ODPi core beyond Apache Hadoop & Ambari • Hive • ???? • How can you help? • Share usecases • Test against reference implementation • Contribute to upstream ASF projects
  • 11. What’s in is Bigtop? • A set of binary packages • just like CDH/PHD/HDP/ODPi/etc. • Integration code • Packaging code • Deployment code • Orchestration code • Validation code • Continuous Integration infrastructure
  • 12. Integration/packaging • Linux packages • RPM, DEB • RHEL/CentOS(Fedora), SLES(OpenSUSE), Debian, Ubuntu • VirtualBox, VMWare, etc. VM images • Challenge: Linux packaging is node-centric • “smart” tarballs • Docker or BOSH images
  • 13. Integration testing based on iTest • Clean-room provisioning • these ain’t your gramp’s unit tests • Versioned test artifacts • JVM-base test artifacts • Matching stacks of components and integration tests • Plug’n’play architecture: Gradle/Groovy, JARs/artifacts
  • 14. Puppet 3.x deployment • Master-less puppet • $ puppet apply bigtop-deploy/puppet/manifests/site.pp # on each node • Cluster topology is kept in Hiera bigtop::hadoop_head_node: "hadoopmaster.example.com" hadoop::hadoop_storage_dirs: - ”/mnt” hadoop_cluster_node::cluster_components: - yarn - zookeeper bigtop::bigtop_repo_uri: "http://bigtop-
  • 15. One click Bigtop provisioning
  • 16. Who is this for? • For Hadoop app developers, cluster admins, users • Run a Hadoop cluster to test your code on • Try & test configurations before applying to Production • Play around with Bigtop Big Data Stack • For contributors • Easy to test your packaging, deployment, testing code • For vendors • CI out of the box —> patch upstream code made easier
  • 17. Works great, but… • Need to add vagrant public key into docker images • Too many issues with auto-created boot2docker hosting VM • A bug for docker provider keep opening for almost 2y • Waiting for machine to boot' hangs infinitely • Can not share same code for different providers anyway • Not all the docker options supported in Vagrantfile • Does not support Docker Swarm
  • 19. Implementation • Create docker containers: • docker-compose scale bigtop=3 • Volumes: • Bigtop Puppet configurations • Bigtop Puppet code • /etc/hosts •Compatible with Docker Machine and Swarm
  • 21. Juju orchestration $ juju boostrap $ juju deploy hadoop-processing
  • 23. Juju orchestration $ juju add-unit slave -n 2
  • 24. Juju orchestration $ juju action do namenode/0 smoke-test $ juju action do resourcemanager/0 smoke-test $ watch -n 0.5 juju action status
  • 25. Early Mission Accomplished Foundation for commercial Hadoop distros/services Leveraged by app providers…
  • 26.
  • 27. Blue prints for data engineering • BigPetStore • Data Generator • Examples using tools in Hadoop ecosystem to process data • Build system and tests for integrating tools and multiple JVM languages • Started by Dr. Jay Vyas, prinicipal software engineer at Red Hat, Inc.
  • 31. New focus and target end users Data engineers vs distro builders Enhance Operations/Deployment Reference implementations & tutorials
  • 32. Data data data… Smarter/Realistic test data -bigpetstore -bigtop-bazaar -weather data gen Tutorial/Learning Data sets -githubarchive.org -more tbd…