The Practice of Alluxio in JD.com


Beijing Meetup
06/22/2019
Baolong Mao, JD.com

Published in: Technology

The Practice of Alluxio in JD.com

  1. www.jd.com
  2. About me
     • JDHDFS, JDAlluxio, JDCeph, JDJDK, JDKernel
     • Alluxio PMC
     • Hadoop contributor
  3. Contents
     • A short introduction
     • How to build when you modify your Alluxio or Hadoop
     • Using Alluxio to accelerate JobHistory: cache the job container log
     • 10x performance improvement
     • JD Contribution: some of the features contributed by JD
     • Alluxio Future: expectations of Alluxio and future plan
  4. What is Alluxio
     • It is the world’s first virtual distributed storage system.
     • Alluxio unifies data at memory speed.
     • Virtual Data Lake
  5. Alluxio is a bridge
     • Application interface
       • Apache Spark, Presto, TensorFlow
       • Apache HBase
       • Apache Hive or Apache Flink
     • Storage interface
       • Amazon S3, Google Cloud Storage, OpenStack Swift
       • GlusterFS, HDFS (various versions)
       • IBM Cleversafe, EMC ECS
       • Ceph, NFS, Alibaba OSS
  6. Today, Alluxio is deployed in production by hundreds of organizations, with the largest deployment exceeding 1,500 nodes.
     • Powered by Alluxio: https://www.alluxio.io/powered-by-alluxio/
  7. Active Open Source Community
     • Alluxio is one of the fastest-growing open source projects and has attracted more than 1000 contributors from over 300 institutions, including Alibaba, Alluxio, Baidu, JD.COM, CMU, Google, IBM, Intel, NJU, Red Hat, Tencent, UC Berkeley, and Yahoo.
  8. Why build? How to build? (XXAlluxio or XXHadoop)
     • Hadoop: mvn install -Pdist,native -DskipTests=true -Dmaven.javadoc.skip=true -Drequire.snappy -Dsnappy.prefix=/data0/snappy/ -Dcontainer-executor.conf.dir=/etc/yarn-executor/ -Dtar
     • Alluxio: mvn -T 4C clean install -Phadoop-2 -Dhadoop.version=2.7.1 -DskipTests -Dlicense.skip=true -Dfindbugs.skip -Dmaven.javadoc.skip -Dcheckstyle.skip ; dev/scripts/generate-tarballs -ufs-modules=all release
  9. How to let JobHistory use Alluxio
     • Put the Alluxio client package on the JobHistory classpath:
       cp alluxio-core-client-hdfs-2.0.0-SNAPSHOT.jar hadoop-2.7.1/share/hadoop/hdfs/
  10. How to let JobHistory use Alluxio
      • Configure JobHistory
      hdfs-site.xml:
        <property>
          <name>fs.alluxio.impl</name>
          <value>alluxio.hadoop.FileSystem</value>
        </property>
        <property>
          <name>fs.alluxio-ft.impl</name>
          <value>alluxio.hadoop.FaultTolerantFileSystem</value>
        </property>
        <property>
          <name>fs.AbstractFileSystem.alluxio.impl</name>
          <value>alluxio.hadoop.AlluxioFileSystem</value>
        </property>
      yarn-site.xml:
        <property>
          <name>yarn.nodemanager.remote-app-log-dir</name>
          <value>alluxio://hostname:19998/tmp/app-logs</value>
        </property>
  11. JobHistory using Alluxio (demo)
  12. JobHistory using Alluxio (demo)
  13. Presto
  14. Presto + Alluxio = better together
      • Higher query throughput
      • Consistent low query latency
      • Eliminates network traffic
  15. Presto on Alluxio
      • Alluxio led to a 10x performance improvement
      • 100+ nodes
      • More than 2.5 years
      • When we adopted Alluxio for JDPresto, we made some changes that brought some good features:
        • Pluggable: Alluxio can be brought online or updated at any time.
        • Fault-tolerant: when Alluxio cannot be accessed, JDPresto can access HDFS directly.
        • Locality: reduces remote reads.
  16. Presto on Alluxio
      • Load once, use every time
      • Before / After
  17. Presto on Alluxio (diagram: Presto, HDFS, Alluxio)
  18. Presto on Alluxio
  19. Presto on Alluxio
  20. Presto on Alluxio: speed comparison
  21. Review Alluxio Architecture
  22. Watermark Evict Strategy
      • Sync evict strategy (flowchart): Start → apply for space → check space → space enough: load file from HDFS → End; no space: release space
      • Async evict strategy (flowchart): client: apply for space → load file from HDFS; async thread: Start → high watermark reached? → Y: release space; N: End
  23. Alluxio Cache Consistency (1)
  24. Alluxio Cache Consistency (2)
      • Keep Alluxio and HDFS consistent, to ensure that dirty data is not read.
      • Consistency check (flowchart): Start → is it a file? (if not, traverse the path) → does it exist in UFS? → is the file size the same? → is the modification time the same? → if a check fails, clean metadata → End
      • There are three ways to trigger a file consistency check:
        • RPC API: when a client requests metadata via getFileId, getFileInfo, listStatus, etc., the Alluxio master checks file cache consistency.
        • RESTful API: calling reloadMetaData triggers Alluxio to reload all metadata.
        • Alluxio master startup: file cache consistency is checked while the master starts up.
  25. Alluxio UI
  26. JD for Alluxio
  27. JD for Alluxio
      • PMC: 1
      • Contributors: 6
      • PRs: 50
      • Merged PRs: 47
      • Merged commits: 218
      • Additions/Deletions: +4150/-2251
  28. Alluxio in JD
  29. Core expectations for Alluxio
      • HA, stability, high performance, confidence
      • Global namespace
      • Server-side API translation
      • Monitorable and measurable
      • Cutability (fs meta, mountTable, distributed cache)
  30. Alluxio Exploration
      • Exploring more application scenarios: storing MapReduce/Spark shuffle data in Alluxio to reduce disk storage pressure and speed up access to shuffle data.
      • Porting HDFS authentication to Alluxio: we are going to port the custom permissions on our existing HDFS to Alluxio.
      • HDFS RBF or Alluxio: we have tried HDFS router-based federation, but its performance does not meet our online requirements. Alluxio also has forwarding capabilities, and we hope that it will perform better.
  31. https://alluxio-community.slack.com
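
Slide 15 says that when Alluxio cannot be accessed, JDPresto reads from HDFS directly. The slides do not show how JD implemented this, so the following is only a minimal Python sketch of the general fallback pattern. The function names (`read_with_fallback`, `alluxio_unreachable`, `hdfs_reader`) and the path are hypothetical stand-ins, not Presto or Alluxio APIs.

```python
from typing import Callable, Tuple


def read_with_fallback(
    path: str,
    read_from_alluxio: Callable[[str], bytes],
    read_from_hdfs: Callable[[str], bytes],
) -> Tuple[bytes, str]:
    """Read through the cache; fall back to HDFS if the cache is unreachable."""
    try:
        return read_from_alluxio(path), "alluxio"
    except OSError:
        # ConnectionError and TimeoutError are both subclasses of OSError.
        return read_from_hdfs(path), "hdfs"


# Hypothetical stand-ins for the two storage clients.
def alluxio_unreachable(path: str) -> bytes:
    raise ConnectionError("Alluxio master unreachable")


def hdfs_reader(path: str) -> bytes:
    return b"rows from " + path.encode()


data, source = read_with_fallback(
    "/warehouse/t1/part-0", alluxio_unreachable, hdfs_reader
)
print(source, data)
```

The caller never sees the cache failure; it only pays the cost of a slower read from the under store.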
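
The two eviction flows on slide 22 can be contrasted with a small in-memory model. This is a sketch under stated assumptions, not Alluxio's evictor: the class `WatermarkCache`, the oldest-first eviction order, and the low watermark that the background pass frees down to are all illustrative choices (the slide only names a high watermark).

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy cache that tracks block sizes only; oldest block is evicted first."""

    def __init__(self, capacity: int, high_bytes: int, low_bytes: int):
        self.capacity = capacity
        self.high_bytes = high_bytes  # async eviction starts at this usage
        self.low_bytes = low_bytes    # assumption: async eviction stops here
        self.blocks = OrderedDict()   # block name -> size, oldest first
        self.used = 0

    def _evict_until(self, target: int) -> None:
        while self.used > target and self.blocks:
            _, size = self.blocks.popitem(last=False)
            self.used -= size

    def load_sync(self, name: str, size: int) -> None:
        """Sync strategy: the loading client releases space itself if needed."""
        if size > self.capacity:
            raise ValueError("block larger than cache capacity")
        if self.used + size > self.capacity:  # "no space" branch
            self._evict_until(self.capacity - size)
        self.blocks[name] = size              # "load file from HDFS"
        self.used += size

    def async_evict_once(self) -> bool:
        """Async strategy: one pass of the background thread."""
        if self.used >= self.high_bytes:      # high watermark reached (Y)
            self._evict_until(self.low_bytes)
            return True
        return False                          # below the watermark (N)


cache = WatermarkCache(capacity=100, high_bytes=80, low_bytes=50)
cache.load_sync("a", 40)
cache.load_sync("b", 40)
evicted = cache.async_evict_once()  # usage 80 hits the high watermark
remaining = list(cache.blocks)
```

With the async pass running ahead of demand, `load_sync` rarely has to evict on the client's critical path, which is the motivation for the second flow.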
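
The consistency check on slide 24 compares cached metadata with the under file system (UFS) and cleans metadata that no longer matches. The sketch below models both sides as plain dictionaries. `FileMeta`, `is_consistent`, and `check_path` are hypothetical names for illustration, and a directory is handled by checking every cached file under it, which is one reading of the slide's "traverse the path" step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileMeta:
    size: int
    mtime: int  # modification time, e.g. epoch milliseconds


def is_consistent(cached: FileMeta, ufs: Optional[FileMeta]) -> bool:
    """A cached file is clean if it exists in UFS with the same size and mtime."""
    return ufs is not None and cached.size == ufs.size and cached.mtime == ufs.mtime


def check_path(
    cache: Dict[str, FileMeta], ufs: Dict[str, FileMeta], path: str
) -> List[str]:
    """Check every cached file at or under `path`; drop stale metadata.

    Returns the paths whose metadata was cleaned.
    """
    prefix = path.rstrip("/") + "/"
    targets = [p for p in cache if p == path or p.startswith(prefix)]
    cleaned = []
    for p in sorted(targets):
        if not is_consistent(cache[p], ufs.get(p)):
            del cache[p]  # "clean metadata"
            cleaned.append(p)
    return cleaned


cache = {
    "/d/a": FileMeta(10, 1),
    "/d/b": FileMeta(20, 2),   # modified in UFS after caching
    "/d/c": FileMeta(30, 3),   # deleted from UFS
    "/e/x": FileMeta(1, 1),    # outside the checked path
}
ufs = {
    "/d/a": FileMeta(10, 1),
    "/d/b": FileMeta(20, 9),
    "/e/x": FileMeta(2, 1),
}
cleaned = check_path(cache, ufs, "/d")
```

After the call, only `/d/a` and the unchecked `/e/x` remain cached; the next read of `/d/b` or `/d/c` would have to go back to the UFS.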
