Roman Shaposhnik: Director of Open Source, Pivotal; Committer, Apache Hadoop; Founder, Apache Bigtop
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex.
1. Making sense of Apache Bigtop, ODPi and
why it all matters to Apache Apex
Roman Shaposhnik, rvs@apache.org,
@rhatr
Director of Open Source Strategy,
Pivotal Inc.
2. A slide deck built via the “Apache Way”
• Bigtop community contributors
• Roman Shaposhnik
• Konstantin Boudnik
• Nate D'Amico
• Evans Ye & Darren Chen (Trend Micro)
3. What is Apache Bigtop?
• Apache Bigtop is to Hadoop what Debian is to Linux
• A 100% open, community-driven distribution of a big data
management platform based on Apache Hadoop
• A place where all communities around big data come
together
• The thing everybody (Pivotal, Cloudera, Hortonworks,
WANDisco, IBM, Amazon, TrendMicro) is building off of
• A cutting edge, quickly evolving distribution and a set
of tools
6. ODPi is a nonprofit organization committed to simplification &
standardization of the big data ecosystem with a common reference
specification called ODPi Core.
As a shared industry effort, ODPi is focused on promoting and advancing the state of Apache Hadoop®
and Big Data Technologies for the Enterprise.
9. What has ODPi done so far (1.0.1)?
• Runtime specification
• https://github.com/odpi/specs/blob/master/ODPi-Runtime.md
• Validation testsuite
• http://repo.odpi.org/ODPi/1.0/acceptance-tests/
• Reference implementation binaries
• http://repo.odpi.org/ODPi/1.0/{centos6, ubuntu-14.04}
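The brace notation in the last URL is shorthand for one repository per platform. A minimal sketch of how it expands, assuming the two platform names map directly onto path components (the function name `odpi_repo_urls` is made up for illustration, and the repositories may no longer be served at these addresses):

```shell
#!/bin/sh
# Print one reference-implementation repo URL per platform named on the slide.
# Assumption: each platform name is appended to the base URL as a path segment.
odpi_repo_urls() {
    base="http://repo.odpi.org/ODPi/1.0"
    for platform in centos6 ubuntu-14.04; do
        printf '%s/%s\n' "$base" "$platform"
    done
}

odpi_repo_urls
```

A plain loop is used instead of shell brace expansion because the slide writes the list with a space after the comma, which brace expansion would not accept as written.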
10. What are we working on?
• Operations specification
• https://github.com/odpi/specs/blob/master/ODPi-Operations.md
• ISV “ODPi compatible” policy
• Expanding ODPi core beyond Apache Hadoop & Ambari
• Hive
• ????
• How can you help?
• Share use cases
• Test against reference implementation
• Contribute to upstream ASF projects
11. What’s in Bigtop?
• A set of binary packages
• just like CDH/PHD/HDP/ODPi/etc.
• Integration code
• Packaging code
• Deployment code
• Orchestration code
• Validation code
• Continuous Integration infrastructure
12. Integration/packaging
• Linux packages
• RPM, DEB
• RHEL/CentOS(Fedora), SLES(OpenSUSE), Debian, Ubuntu
• VirtualBox, VMware, etc. VM images
• Challenge: Linux packaging is node-centric
• “smart” tarballs
• Docker or BOSH images
13. Integration testing based on iTest
• Clean-room provisioning
• these ain’t your gramp’s unit tests
• Versioned test artifacts
• JVM-based test artifacts
• Matching stacks of components and integration tests
• Plug’n’play architecture: Gradle/Groovy, JARs/artifacts
14. Puppet 3.x deployment
• Master-less puppet
• $ puppet apply bigtop-deploy/puppet/manifests/site.pp # on each node
• Cluster topology is kept in Hiera
bigtop::hadoop_head_node: "hadoopmaster.example.com"
hadoop::hadoop_storage_dirs:
- "/mnt"
hadoop_cluster_node::cluster_components:
- yarn
- zookeeper
bigtop::bigtop_repo_uri:
"http://bigtop-
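Because the deployment is master-less, the same `puppet apply` command has to be run on every node. A rough dry-run sketch of that loop, assuming SSH access to each node and a Bigtop checkout in the login directory; the worker hostnames and the function name `plan_puppet_apply` are hypothetical, and only `hadoopmaster.example.com` and the `puppet apply` command come from the slide:

```shell
#!/bin/sh
# Hypothetical node list: only hadoopmaster.example.com appears on the slide,
# the worker names are placeholders.
NODES="hadoopmaster.example.com worker1.example.com worker2.example.com"

# Dry run: print the command that would be executed on each node.
# Nothing is run remotely; pipe the output to `sh` to actually execute it.
plan_puppet_apply() {
    for node in $NODES; do
        printf 'ssh %s puppet apply bigtop-deploy/puppet/manifests/site.pp\n' "$node"
    done
}

plan_puppet_apply
```

Printing the plan first keeps the sketch safe to try; a real run would likely also need root privileges and whatever module path the local Puppet setup requires.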
16. Who is this for?
• For Hadoop app developers, cluster admins, users
• Run a Hadoop cluster to test your code on
• Try & test configurations before applying to Production
• Play around with Bigtop Big Data Stack
• For contributors
• Easy to test your packaging, deployment, testing code
• For vendors
• CI out of the box -> patching upstream code made easier
17. Works great, but…
• Need to add vagrant public key into docker images
• Too many issues with auto-created boot2docker
hosting VM
• A bug in the Docker provider has stayed open for almost
2 years
• 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers
anyway
• Not all the docker options supported in Vagrantfile
• Does not support Docker Swarm
27. Blueprints for data engineering
• BigPetStore
• Data Generator
• Examples using tools in Hadoop ecosystem to process
data
• Build system and tests for integrating tools and multiple
JVM languages
• Started by Dr. Jay Vyas, principal software engineer at
Red Hat, Inc.
31. New focus and target end users
Data engineers vs distro builders
Enhance Operations/Deployment
Reference implementations & tutorials
32. Data data data…
Smarter/Realistic test data
-bigpetstore
-bigtop-bazaar
-weather data gen
Tutorial/Learning Data sets
-githubarchive.org
-more tbd…