www.jd.com
JDHDFS, JDAlluxio, JDCeph, JDJDK, JDKernel
• Alluxio PMC
• Hadoop contributor
About me
Contents
A short introduction
How to build after modifying Alluxio or Hadoop
Cache the job container log
Using Alluxio to accelerate JobHistory
10x performance improvement
Some of the features contributed by JD
JD Contribution
Expectations for Alluxio & future plan
Alluxio Future
• It is the world’s first virtual distributed storage system.
• Alluxio unifies data at memory-speed.
• Virtual Data Lake
What is Alluxio
• Application interface
• Apache Spark, Presto, TensorFlow
• Apache HBase
• Apache Hive or Apache Flink
• Storage interface
• Amazon S3, Google Cloud Storage, OpenStack Swift
• GlusterFS, HDFS (various versions)
• IBM Cleversafe, EMC ECS
• Ceph, NFS, Alibaba OSS
Alluxio is a bridge
• Powered by Alluxio
https://www.alluxio.io/powered-by-alluxio/
Today, Alluxio is deployed in production by hundreds of
organizations with the largest deployment exceeding 1,500 nodes.
Alluxio is one of the fastest-growing open source projects and has
attracted more than 1000 contributors from over 300 institutions,
including Alibaba, Alluxio, Baidu, JD.COM, CMU, Google, IBM, Intel,
NJU, Red Hat, Tencent, UC Berkeley, and Yahoo.
• Active Open Source Community
Why build? How to build?
XXAlluxio or XXHadoop
• Hadoop: mvn install -Pdist,native -DskipTests=true -Dmaven.javadoc.skip=true -Drequire.snappy -Dsnappy.prefix=/data0/snappy/ -Dcontainer-executor.conf.dir=/etc/yarn-executor/ -Dtar
• Alluxio: mvn -T 4C clean install -Phadoop-2 -Dhadoop.version=2.7.1 -DskipTests -Dlicense.skip=true -Dfindbugs.skip -Dmaven.javadoc.skip -Dcheckstyle.skip ; dev/scripts/generate-tarballs -ufs-modules=all release
• Put the Alluxio client JAR on the JobHistory classpath.
cp alluxio-core-client-hdfs-2.0.0-SNAPSHOT.jar hadoop-2.7.1/share/hadoop/hdfs/
How to let JobHistory use Alluxio
• Configure JobHistory
hdfs-site.xml
<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>
<property>
  <name>fs.alluxio-ft.impl</name>
  <value>alluxio.hadoop.FaultTolerantFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>
How to let JobHistory use Alluxio
yarn-site.xml
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>alluxio://hostname:19998/tmp/app-logs</value>
</property>
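Hand-written property blocks are easy to get wrong, so one option is to generate them. The following is a minimal sketch, not part of the original slides: the helper name render_properties is hypothetical, and the only setting shown is the yarn-site.xml property quoted above.

```python
import xml.etree.ElementTree as ET


def render_properties(props):
    """Render a dict of Hadoop settings as <property> elements
    wrapped in a <configuration> root (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # ET.indent only exists on Python 3.9+; skip pretty-printing otherwise.
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# The yarn-site.xml setting from the slide; "hostname" is the slide's placeholder.
YARN_SITE = {
    "yarn.nodemanager.remote-app-log-dir": "alluxio://hostname:19998/tmp/app-logs",
}

if __name__ == "__main__":
    print(render_properties(YARN_SITE))
```

The output is a complete configuration document, so it can be parsed back to check that every expected property name is present before it is deployed.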
JobHistory using Alluxio show
Presto
Higher query throughput
Consistent low query latency
Eliminates network traffic
Presto + Alluxio = better together
• Alluxio led to a 10x performance improvement
• 100+ nodes
• More than 2.5 years
• When we adopted Alluxio for JDPresto, we made some changes and added some useful features
• Pluggable: Alluxio can be brought online or updated at any time
• Fault-tolerant: when Alluxio is unavailable, JDPresto can access HDFS directly
• Locality: reduces remote reads
Presto on Alluxio
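The fault-tolerant behaviour (read through Alluxio, fall back to HDFS when Alluxio cannot be reached) can be sketched roughly as follows. This is an illustration only, not JDPresto's code: StorageUnavailable, read_with_fallback and the two demo readers are hypothetical names, and the readers are stand-ins for real Alluxio and HDFS clients.

```python
class StorageUnavailable(Exception):
    """Raised by a reader when its backing store cannot be reached."""


def read_with_fallback(path, read_alluxio, read_hdfs):
    """Try the Alluxio reader first; if Alluxio is unavailable,
    read the same path directly from HDFS.

    Returns (data, source) so callers can see which store served the read.
    """
    try:
        return read_alluxio(path), "alluxio"
    except StorageUnavailable:
        return read_hdfs(path), "hdfs"


# Stand-in readers for demonstration; a real deployment would call
# the Alluxio and HDFS clients here.
def demo_alluxio_down(path):
    raise StorageUnavailable("Alluxio cannot be reached")


def demo_alluxio_up(path):
    return b"cached:" + path.encode()


def demo_hdfs_read(path):
    return b"hdfs:" + path.encode()


if __name__ == "__main__":
    print(read_with_fallback("/warehouse/t1", demo_alluxio_up, demo_hdfs_read))
    print(read_with_fallback("/warehouse/t1", demo_alluxio_down, demo_hdfs_read))
```

Only the "store unavailable" error triggers the fallback in this sketch; other errors propagate, so a genuine data problem is not silently masked by a second read.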
Load once, use every time
Before / After
Presto on Alluxio
[Diagram: Presto, HDFS, Alluxio]
Presto on Alluxio
Speed comparison
Presto on Alluxio
Review Alluxio Architecture
Watermark Evict Strategy
• Sync Evict Strategy: Start → apply for space → check space. If space is enough, load file from HDFS → End. If there is no space, release space first.
• Async Evict Strategy: the client applies for space and loads the file from HDFS. Separately, an async thread starts and checks the high watermark: if it has been reached (Y), release space; otherwise (N), End.
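The difference between the two strategies can be sketched with a toy cache. This is a minimal illustration under stated assumptions, not Alluxio's evictor: the class WatermarkCache and its methods are hypothetical, eviction order is simply oldest-first, and the low watermark (how far the async pass frees space) is an assumption, since the slide only names the high watermark.

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy cache that tracks file sizes only (hypothetical sketch)."""

    def __init__(self, capacity, high_mark, low_mark):
        self.capacity = capacity
        self.high_mark = high_mark  # async eviction starts at this usage
        self.low_mark = low_mark    # assumed target usage after eviction
        self.files = OrderedDict()  # name -> size, oldest first
        self.used = 0

    def _evict_oldest(self):
        name, size = self.files.popitem(last=False)
        self.used -= size
        return name

    def _store(self, name, size):
        if name in self.files:  # reloading a file replaces the old entry
            self.used -= self.files.pop(name)
        self.files[name] = size
        self.used += size

    def load_sync(self, name, size):
        """Sync strategy: the caller releases space itself before loading."""
        if size > self.capacity:
            raise ValueError("file is larger than the cache")
        while self.capacity - self.used < size:
            self._evict_oldest()
        self._store(name, size)

    def load_async(self, name, size):
        """Async strategy: the caller never evicts; it loads only if space is free."""
        if self.capacity - self.used < size:
            return False
        self._store(name, size)
        return True

    def evict_pass(self):
        """One pass of the background evictor: once usage reaches the high
        watermark, release space until usage is at or below the low watermark."""
        evicted = []
        if self.used >= self.high_mark:
            while self.used > self.low_mark:
                evicted.append(self._evict_oldest())
        return evicted
```

In a real system evict_pass would run on its own thread; calling it directly keeps the sketch deterministic. The point of the async variant is that the client's load path no longer waits for eviction.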
Alluxio Cache Consistency (1)
Alluxio Cache Consistency (2)
Flowchart: Start → traverse the path → is it a file? → does it exist in UFS? → is the file size the same? → is the modify time the same? → End. A failed check (N) leads to clean metadata.
Keep Alluxio & HDFS Consistency
To ensure that dirty data is not read, there are three ways to trigger a file consistency check:
• RPC API: when a client requests metadata via getFileId, getFileInfo, listStatus, etc., the Alluxio master checks file cache consistency.
• RESTful API: calling reloadMetaData triggers Alluxio to reload all metadata.
• Alluxio Master startup: file cache consistency is checked while the master starts up.
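The per-file check in the flowchart can be sketched as a comparison of cached metadata against the under file system (UFS). This is an illustrative sketch, not the Alluxio master's implementation: FileMeta, find_stale_paths and clean_metadata are hypothetical names, directories are modelled as None entries, and a file missing from the UFS is assumed to count as stale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    size: int   # file length in bytes
    mtime: int  # last modification time, e.g. epoch milliseconds


def find_stale_paths(cached, ufs):
    """Return cached file paths whose size or modify time no longer
    matches the UFS, or which no longer exist there (assumed stale)."""
    stale = []
    for path in sorted(cached):
        meta = cached[path]
        if meta is None:  # not a file: nothing to compare
            continue
        ufs_meta = ufs.get(path)
        if (ufs_meta is None
                or ufs_meta.size != meta.size
                or ufs_meta.mtime != meta.mtime):
            stale.append(path)
    return stale


def clean_metadata(cached, ufs):
    """Drop stale entries so the next access reloads them from the UFS."""
    removed = find_stale_paths(cached, ufs)
    for path in removed:
        del cached[path]
    return removed
```

Comparing only size and modify time keeps the check cheap, at the cost of missing a rewrite that preserves both values.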
Alluxio UI
JD for Alluxio
PMC: 1
Contributors: 6
PRs: 50
Merged PRs: 47
Merged commits: 218
Additions/Deletions: +4150/-2251
Alluxio in JD
- HA, stability, High Performance, Confidence
- Global Namespace
- Server-Side API Translation
- Monitorable & Measurable
- Cutability (fs meta, mountTable, distributed cache)
Core expectations for Alluxio
Alluxio Exploration
• Exploring more application scenarios: store MapReduce/Spark shuffle data in Alluxio to reduce disk storage pressure and speed up access to shuffle data.
• Porting HDFS authentication to Alluxio: we are going to port the custom permissions on our existing HDFS to Alluxio.
• HDFS RBF or Alluxio: we have tried HDFS router-based federation, but its performance does not meet our online requirements. We find that Alluxio also has forwarding capabilities, and we hope that Alluxio will perform better.
https://alluxio-community.slack.com

The Practice of Alluxio in JD.com
