Leveraging docker for hadoop build automation and big data stack provisioning

  1. LEVERAGING DOCKER FOR HADOOP BUILD AUTOMATION AND BIG DATA STACK PROVISIONING
     Evans Ye, Sr. Software Engineer
     DataWorks Summit San Jose 2017
  2. Who am I
     • Tech Lead @ APAC Data Team, Y! Taiwan
     • Building data products for E-Commerce business
     • PMC chair of Apache Bigtop, ASF member
  3. Outline
     • Quick Intro to Apache Bigtop
     • Docker for Bigtop Packaging
     • Docker for Bigtop Provisioner
     • Docker for Bigtop Sandbox
     • Releases
  4. QUICK INTRO TO APACHE BIGTOP
  5. Linux Distributions
  6. Hadoop Distributions
  7. But there are some other great Hadoop ecosystem components...
  8. How do I add patches?
  9. (no text on this slide)
  10. From source code to packages
      Bigtop Packaging
  11. Supported components
  12. Bigtop feature set
      Packaging, Testing, Deployment, Virtualization
      for you to easily build your own Big Data Stack
  13. Community stats
      • 94 total contributors
      • Spark: 1093, Hadoop: 99, HBase: 126, Hive: 115
      • 5 years since 2012
      • 30 Hadoop ecosystem components packaged
      • 5 Linux Distro., 2 archs supported
  14. DOCKER FOR BIGTOP PACKAGING
  15. Preparing build environment
  16. Preparing build environment... Seriously?
  17. Bigtop Toolchain
      • Puppet recipes to install required libraries and build tools
      • Prerequisite: Java
      • To prepare a build environment:
          git clone https://github.com/apache/bigtop.git
          cd bigtop
          ./bigtop_toolchain/bin/puppetize.sh
          ./gradlew toolchain
  18. CI infrastructure
      CentOS slave, Fedora slave, Ubuntu slave, Debian slave, OpenSuSE slave
  19. CI infrastructure
      CentOS slave, Fedora slave, Ubuntu slave, Debian slave, OpenSuSE slave
      Bigtop Toolchain applied on each slave
  20. CI infrastructure (continued)
  21. Dockerized CI infrastructure
      CentOS slave, Fedora slave, Ubuntu slave, Debian slave, OpenSuSE slave
      • Immutable env
      • Fault tolerance
  22. Dockerized CI infrastructure (continued)
  23. How to build packages
      • Execute shell
      • Bigtop CI Setup Guide
          # OS=debian-8
          # COMPONENT=hadoop
          docker run -u jenkins --rm -v `pwd`:/bigtop --workdir /bigtop bigtop/slaves:trunk-$OS bash -l -c "./gradlew allclean $COMPONENT-pkg"
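The build command on slide 23 takes one OS and one component at a time, so a CI matrix amounts to looping over both. The following is a minimal dry-run sketch of that idea, not Bigtop's actual CI job: `build_cmd` is a hypothetical helper, and every value in `OS_LIST` and `COMPONENTS` other than `debian-8` and `hadoop` is an assumption chosen for illustration. It only prints the commands, so it can be inspected without Docker installed.

```shell
#!/bin/sh
# Dry run: print one build command per OS/component pair instead of running it.
# OS_LIST and COMPONENTS are illustrative assumptions; the slide itself only
# shows OS=debian-8 and COMPONENT=hadoop.
OS_LIST="debian-8 centos-7"
COMPONENTS="hadoop spark"

# build_cmd is a hypothetical helper, not part of Bigtop.
# Usage: build_cmd <os> <component>
build_cmd() {
    # \$(pwd) stays unexpanded here so it resolves when the printed command runs.
    printf '%s\n' "docker run -u jenkins --rm -v \"\$(pwd)\":/bigtop --workdir /bigtop bigtop/slaves:trunk-$1 bash -l -c \"./gradlew allclean $2-pkg\""
}

for os in $OS_LIST; do
    for component in $COMPONENTS; do
        build_cmd "$os" "$component"
    done
done
```

Printing rather than executing keeps the sketch safe to try; once the output looks right, the printed lines could be run from a Bigtop checkout.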
  24. Bigtop packages on master
      https://ci.bigtop.apache.org/view/Packages/job/Bigtop-trunk-packages/
  25. Extremely friendly for porting
      • Example: How to port Bigtop Distribution to PPC64LE?
      • Prepare PPC64LE docker base image
      • Apply Bigtop Toolchain on PPC64LE docker image
      • Build Bigtop packages on PPC64LE slaves image
      • 2016: Ported 22 out of 24 Bigtop components in 2 weeks, with only 5 patches
      • Credit: Amir Sanjar, IBM
  26. Bigtop early mission accomplished
      Leveraged by app providers...
  27. Get out from the Apache dome
  28. New focus and target user
      • Data engineers vs Distro. builders
      • Solution diversity:
        ▪ Streaming: Flink, Apex
        ▪ In-memory cache: Alluxio, Ignite
        ▪ User/developer tools:
        ▪ Bigtop Provisioner
        ▪ Bigtop Sandbox
      • Big data stack references
      • Machine learning, deep learning components
  29. DOCKER FOR BIGTOP PROVISIONER
  30. Bigtop Provisioner
      • A tool to demonstrate the full life cycle of Bigtop
      Diagram labels: Packaging, Testing, Deployment, Virtualization; Bigtop Provisioner: Create resources, Run Bigtop Puppet, Run Bigtop Tests
  31. Vagrant Providers
      • We use Vagrant as an abstraction layer to support different kinds of resource providers
  32. One click Hadoop provisioning (Bigtop 1.0.0)
      bigtop/deploy image on Docker Hub
          ./docker-hadoop.sh -c 3
      Diagram labels: puppet apply (x3)
      https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+Provisioner+User+Guide
  33. Problems with Vagrant's Docker Provider
      • Need to add the Vagrant public key to Docker images
      • Too many issues with the auto-created boot2docker VM
      • A provisioning bug in the Docker provider has stayed open for 2 years
        ▪ 'Waiting for machine to boot' hangs indefinitely
      • Cannot share the same code across different providers anyway
      • Not all Docker options are supported in a Vagrantfile
      • ^#?& slow
  34. Replaced by docker-compose (Bigtop 1.2.0)
      bigtop/deploy image on Docker Hub
          ./docker-hadoop.sh -c 3
      Diagram labels: puppet apply (x3)
  35. Advantages
      • No need to create customized image beforehand
      • Better compatibility with Docker's native solutions
      • Clear, simple yaml file for orchestration settings
      • Supports new features such as overlay network
      • Leverage Swarm for multi-node cluster deployment
      • Fast -> better user experience
  36. How to run Docker Provisioner
      • Execute shell
      • Bigtop CI Setup Guide
          # See bigtop/provisioner/docker/*.yaml
          CONFIG=YOUR_CUSTOM_CONF.yaml
          # provision
          ./gradlew -Pconfig=${CONFIG} -Pnum_instances=1 docker-provisioner
          # destroy provisioned cluster
          ./gradlew docker-provisioner-destroy
  37. YOUR_CUSTOM_CONF.yaml example
          docker:
            memory_limit: "4g"
            image: "bigtop/puppet:centos-7"
          repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
          distro: centos
          components: [hdfs, yarn, mapreduce]
          enable_local_repo: false
          smoke_test_components: [hdfs, yarn, mapreduce]
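Slides 36 and 37 fit together as "write a config, then hand it to Gradle". The sketch below shows one way to wire that up, under stated assumptions: `write_conf` and the file name `my_cluster.yaml` are hypothetical, the nesting of `memory_limit` and `image` under `docker:` is inferred from the flattened slide text, and where the config file has to live relative to the Bigtop checkout is not stated on the slides. The Gradle commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: write the example config from slide 37 to a file, then print
# (not run) the provision and destroy commands from slide 36.
# write_conf and the default file name are hypothetical.
write_conf() {
    cat > "$1" <<'EOF'
docker:
  memory_limit: "4g"
  image: "bigtop/puppet:centos-7"

repo: "http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/centos/7/x86_64"
distro: centos
components: [hdfs, yarn, mapreduce]
enable_local_repo: false
smoke_test_components: [hdfs, yarn, mapreduce]
EOF
}

CONFIG=${CONFIG:-my_cluster.yaml}
write_conf "$CONFIG"

# Dry run: these are the commands one would execute from a Bigtop checkout.
echo "./gradlew -Pconfig=$CONFIG -Pnum_instances=1 docker-provisioner"
echo "./gradlew docker-provisioner-destroy"
```

Keeping the config in a generated file makes it easy to vary `image`, `repo`, and `distro` together when testing another distribution.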
  38. Visibility for deployments
  39. Use cases
      • For application developers, cluster admins, users
        ▪ Run a Hadoop cluster to test your code on
        ▪ Try & test configurations before applying to Production
        ▪ Play around with Bigtop Big Data Stacks
      • For contributors
        ▪ Easy to test your packaging, deployment, testing code
      • For Distro. builders
        ▪ CI matrix -> patching upstream code made easier
  40. DOCKER FOR BIGTOP SANDBOX
  41. Introducing Bigtop Sandbox
      • Easy way to get started
      • Docker images that have Bigtop stacks installed and configured
      • Pseudo cluster up & running w/o installation
      • Command-line tool for you to build your own stack
  42. Docker image layer: Interface
      • Customized big data stack
      • Deployment & management tool
      • Base image (OS)
  43. Docker image layer: Concrete implementation
      • HDFS + YARN + Spark
      • Bigtop Puppet
      • bigtop/puppet:ubuntu-16.04
  44. Building images
      Ubuntu 16.04 + Bigtop Puppet + site.yaml -> HDFS + YARN + Spark
          $ puppet apply
  45. site.yaml example
          bigtop::hadoop_head_node: bigtop.example.com
          bigtop::bigtop_repo_uri: http://bigtop-repos.s3.amazonaws.com/releases/1.2.0/debian/8/x86_64
          hadoop::hadoop_storage_dirs: [/data/1, /data/2]
          hadoop_cluster_node::cluster_components: [hdfs, yarn, spark]
  46. How to build
          git clone https://github.com/apache/bigtop.git
          cd bigtop/docker/sandbox
          ./build.sh -a bigtop -o ubuntu-16.04 -c "hdfs, yarn, spark"
      • Or specify your custom conf:
          ./build.sh -a bigtop -o ubuntu-16.04 -f custom_site.yaml -t dws2017
  47. Running images
      HDFS + YARN + Spark
          $ puppet apply
  48. How to run
          docker run --name sandbox -d -p 50070:50070 -p 8088:8088 evansye/sandbox:dws2017
          docker logs -f sandbox
          docker exec sandbox spark-example SparkPi
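On slide 48 the container is started detached, so the services inside may still be coming up when the next command runs. A minimal sketch of a readiness wait follows; `wait_for` is a hypothetical helper (not part of Bigtop or Docker), and probing the published port 50070 with `curl` is an assumption about how one might detect that HDFS is up. The Docker lines are left commented out so the block runs without Docker.

```shell
#!/bin/sh
# wait_for is a hypothetical helper: retry a command until it succeeds
# or the attempts run out.
# Usage: wait_for <attempts> <delay_seconds> <command> [args...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage with the sandbox from slide 48 (requires Docker and curl):
# docker run --name sandbox -d -p 50070:50070 -p 8088:8088 evansye/sandbox:dws2017
# wait_for 30 2 curl -sf -o /dev/null http://localhost:50070/ &&
#     docker exec sandbox spark-example SparkPi
```

In a script this replaces watching `docker logs -f sandbox` by hand: the Spark example only starts once the probe succeeds, and the script fails if the sandbox never becomes reachable.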
  49. (no text on this slide)
  50. Bigtop Provisioner vs Bigtop Sandbox
                            Bigtop Provisioner    Bigtop Sandbox
      Scalable              Yes                   No
      Portable              No                    Yes
      Flexibility           High                  Medium
      Speed                 > 2 mins              > 15 secs
      Requires network      Yes                   No
      Port forwarding       No                    Yes
  51. Bigtop Provisioner vs Bigtop Sandbox, by user
      • Data engineers
        ▪ Bigtop Provisioner: Multi-node cluster testing
        ▪ Bigtop Sandbox: Build/use sandboxes for dev & test
      • Ops
        ▪ Bigtop Provisioner: Multi-node cluster testing
        ▪ Bigtop Sandbox: Single node testing
      • Contributors
        ▪ Bigtop Provisioner: Test packages, puppet recipes, test cases
        ▪ Bigtop Sandbox: Test packages, puppet recipes, test cases
      • Distro. builders
        ▪ Bigtop Provisioner: Test packages, puppet recipes, test cases
        ▪ Bigtop Sandbox: Provide Sandboxes
  52. Integration test in CI/CD pipeline
      CD pipeline with Bigtop Sandbox
      Diagram labels: Unit Test, Source code, Compile, Build Image, Integration test with Sandbox, Sandbox, Service, Data, Docker Registry, Push Image, Deploy, FINISHED
  53. Future
      • Production deployment using Sandbox images
        ▪ --net host or overlay network (SDN)?
        ▪ External volumes for edit logs, fsimages, etc.
        ▪ Cluster orchestration
        ▪ Swarm, Kubernetes?
  54. RELEASES
  55. Bigtop 1.2.0, released April 2017
      ▪ New components:
        ▪ Ambari 2.5.0
        ▪ GPDB 5.0.0-alpha.0 (Greenplum)
      ▪ Featured upgrade:
        ▪ Hadoop 2.7.3
        ▪ Spark 2.1.0
        ▪ Kafka 0.10.1.1
        ▪ HBase 1.1.3
        ▪ and more
  56. New features in Bigtop 1.2.0
      • New features:
        ▪ Juju bigtop charms
        ▪ Bigtop Sandbox (alpha, recommended to try master)
      • Improvement:
        ▪ Bigtop Docker Provisioner made faster
  57. Juju Cloud Weather Report
      http://bigtop.charm.qa/
  58. Bigtop 1.2.1 upcoming
      • Expected to be out late June
      • Hadoop 2.7.4 (interested in backporting Docker container support, but not sure yet)
      • Mainly bug fixes:
      • Packages
      • Deployments
      • Sandbox
  59. Road ahead towards 1.3.0
      • Machine Learning and Deep Learning integration
      • Support aarch64
      • Enhance support set in Bigtop Puppet (not all components covered)
      • Extend the CI matrix coverage to Bigtop Tests
      • Ambari Bigtop stack integration
      • Provide Big data stack references
  60. (no text on this slide)
  61. ODPi Apache Bigtop Test Drive Program
      • Submit your proposal, contribute to Bigtop with funding!
      • Improvements, new features, build, test, CI, etc.
      • CFP opened June 13, 2017; CFP closed July 14, 2017
      • https://www.odpi.org/community/bigtopgrantfund
  62. Reference
      • Join mailing list, ask questions, suggest features, etc.
      • Contribute (components, tutorials, docs)
      • Report bugs
        ▪ Home page: http://bigtop.apache.org/
        ▪ Mailing list: http://bigtop.apache.org/mail-lists.html
        ▪ Document: https://cwiki.apache.org/confluence/display/BIGTOP/Index
        ▪ Source code: https://github.com/apache/bigtop
        ▪ Packages: https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/
        ▪ JIRA: https://issues.apache.org/jira/browse/BIGTOP
  63. Questions?
