HopsFS: 10X your HDFS with NDB
Jim Dowling
Associate Prof @ KTH
Senior Researcher @ SICS
CEO @ Logical Clocks AB
Oracle, Stockholm, 6th September 2016
www.hops.io
@hopshadoop
Hops Team
Active: Jim Dowling, Seif Haridi, Tor Björn Minde,
Gautier Berthou, Salman Niazi, Mahmoud Ismail,
Theofilos Kakantousis, Johan Svedlund Nordström,
Ermias Gebremeskel, Antonios Kouzoupis.
Alumni: Vasileios Giannokostas, Misganu Dessalegn,
Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca,
K “Sri” Srijeyanthan, Steffen Grohsschmiedt,
Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems,
Stig Viaene, Hooman Peiro, Evangelos Savvidis,
Jude D’Souza, Qi Qi, Gayana Chandrasekara,
Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos,
Peter Buechler, Pushparaj Motamari, Hamid Afzali,
Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
Marketing 101: Celebrity Endorsements
Hi! I’m Leslie Lamport* and even though you’re not using Paxos, I approve this product.
*Turing Award Winner 2013, Father of Distributed Systems
Bill Gates’ biggest product regret?*
Windows Future Storage (WinFS*)
*http://www.zdnet.com/article/bill-gates-biggest-microsoft-product-regret-winfs/
Hadoop in Context
Data Processing: Spark, MapReduce, Flink, Presto, TensorFlow
Storage: HDFS, MapR, S3, Colossus, WAS
Resource Management: YARN, Mesos, Borg
Metadata: Hive, Parquet, Authorization, Search
HDFS v2
[Diagram: HDFS Client, Active NameNode, Standby NameNode, Journal Nodes, Zookeeper, DataNodes (up to ~5K)]
Asynchronous Replication of EditLog
Agreement on the Active NameNode
Snapshots (fsimage) - cut the EditLog
Client operations: ls, rm, mv, cp, stat, chown, copyFromLocal, copyFromRemote, chmod, etc.
The NameNode is the Bottleneck for Hadoop
Max Pause Times for NameNode Heap Sizes*
[Figure: max pause times (ms, log scale from 10 to 10000) against JVM heap size (50, 75, 100, 150 GB)]
*OpenJDK or Oracle JVM
NameNode and Decreasing Memory Costs
[Figure: size (GB, 0 to 1000) by year, 2016 to 2020]
Externalizing the NameNode State
•Problem:
NameNode not scaling up with lower RAM prices
•Solution:
Move the metadata off the JVM Heap
•Move it where?
An in-memory storage system that can be efficiently
queried and managed. Preferably Open-Source.
•MySQL Cluster (NDB)
HopsFS Architecture
[Diagram: HDFS Client, NameNodes (with a Leader), NDB, DataNodes]
Pluggable DBs: Data Abstraction Layer (DAL)
NameNode (Apache v2)
DAL API (Apache v2)
NDB-DAL-Impl (GPL v2)
Other DB (Other License)
hops-2.5.0.jar dal-ndb-2.5.0-7.5.3.jar
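The layering on this slide can be sketched as an interface plus a swappable implementation. This is a minimal illustration in Python rather than the project's Java; `InodeDataAccess`, `InMemoryInodeDataAccess` and `mkdir` are hypothetical names, not the actual DAL API.

```python
from abc import ABC, abstractmethod


class InodeDataAccess(ABC):
    """Hypothetical DAL interface: the only thing NameNode code depends on."""

    @abstractmethod
    def find(self, parent_id, name):
        """Return the inode id stored under (parent_id, name), or None."""

    @abstractmethod
    def add(self, parent_id, name, inode_id):
        """Store a new inode row."""


class InMemoryInodeDataAccess(InodeDataAccess):
    """Stand-in for a database-specific implementation (e.g. an NDB one)."""

    def __init__(self):
        self._rows = {}

    def find(self, parent_id, name):
        return self._rows.get((parent_id, name))

    def add(self, parent_id, name, inode_id):
        self._rows[(parent_id, name)] = inode_id


def mkdir(dal, parent_id, name, new_id):
    # NameNode-side logic sees only the interface, never the implementation.
    if dal.find(parent_id, name) is not None:
        raise FileExistsError(name)
    dal.add(parent_id, name, new_id)
    return new_id
```

Because `mkdir` only touches the interface, a differently licensed implementation can be shipped as a separate artifact and selected at deployment time.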
The Global Lock in the NameNode
HDFS NameNode Internals
Client: mkdir, getblocklocations, createFile, …
[Diagram: Client, Listener (NIO thread), ConnectionList, Reader1 … ReaderN, Call Queue, Handler1 … HandlerM, Namespace & In-Memory EditLog under the FSNameSystem Lock, EditLog Buffer, Journal Nodes (EditLog1, EditLog2, EditLog3), Done RPCs (flush, ackIds), Responder (NIO thread)]
Reader threads: ipc.server.read.threadpool.size (default 1)
Handler threads: dfs.namenode.service.handler.count (default 10)
HopsFS NameNode Internals
Client: mkdir, getblocklocations, createFile, …
[Diagram: Client, Listener (NIO thread), ConnectionList, Reader1 … ReaderN, Call Queue, Handler1 … HandlerM, DAL API, DAL-Impl, NDB (inodes, block_infos, replicas, leases, …), Done RPCs (ackIds), Responder (NIO thread); label: "HARD PART"]
Reader threads: ipc.server.read.threadpool.size (default 1)
Handler threads: dfs.namenode.service.handler.count (default 10)
Concurrency Model: Implicit Locking
• Serializable FS ops using implicit locking of subtrees.
[Hakimzadeh, Peiro, Dowling, ”Scaling HDFS with a Strongly Consistent Relational Model for Metadata”, DAIS 2014]
Preventing Deadlock and Starvation
•Acquire FS locks in agreed order using FS Hierarchy.
•Block-level operations follow the same agreed order.
•No cycles => Freedom from deadlock
•Pessimistic Concurrency Control ensures progress
[Diagram: Client (mv, read) and DataNode (block_report) operations on /user/jim/myFile at the NameNode]
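The agreed-order rule can be illustrated with a toy lock table. This is a sketch only: it assumes in-process locks keyed by path instead of row locks in the database, and `LockManager`, `lock_order` and `lock_paths` are hypothetical names.

```python
import threading
from contextlib import contextmanager


class LockManager:
    """Toy lock table: one lock per path (stand-in for database row locks)."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, path):
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())

    @staticmethod
    def lock_order(paths):
        # Agreed total order: shallower paths first, then lexicographic.
        # Every operation sorts the same way, so no wait-for cycle can form.
        return sorted(set(paths), key=lambda p: (p.rstrip("/").count("/"), p))

    @contextmanager
    def lock_paths(self, paths):
        ordered = self.lock_order(paths)
        acquired = []
        try:
            for path in ordered:
                lock = self._lock_for(path)
                lock.acquire()
                acquired.append(lock)
            yield ordered
        finally:
            # Release in reverse acquisition order.
            for lock in reversed(acquired):
                lock.release()
```

Two concurrent renames that name the same pair of paths in opposite order both acquire the locks in the same sequence, so neither can hold one lock while waiting for the other's.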
Per Transaction Cache
•Reusing the HDFS codebase resulted in too many
roundtrips to the database per transaction.
•We cache intermediate transaction results at
NameNodes (i.e., snapshot).
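A minimal sketch of the idea, assuming a key-value view of the database: each transaction keeps its own read cache, so repeated reads of the same row cost one round trip. `CountingStore` and `Transaction` are hypothetical names used only for illustration.

```python
class CountingStore:
    """Hypothetical stand-in for the database; counts round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def read(self, key):
        self.round_trips += 1
        return self.rows.get(key)


class Transaction:
    """Caches rows read during this transaction only."""

    def __init__(self, store):
        self.store = store
        self._cache = {}

    def read(self, key):
        # First read goes to the store; later reads are served locally.
        if key not in self._cache:
            self._cache[key] = self.store.read(key)
        return self._cache[key]
```

The cache lives and dies with the transaction, so a new transaction always starts from an empty snapshot and rereads from the database.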
Sometimes, Transactions Just ain’t Enough
•Large Subtree Operations (delete, mv, set-quota)
can’t always be executed in a single Transaction.
•4-phase Protocol
• Isolation and Consistency
• Aggressive batching
• Transparent failure handling
• Failed ops retried on new NN.
• Lease timeout for failed clients.
Leader Election using NDB
•Leader to coordinate replication/lease management
•NDB as shared memory for Leader Election of NN.
[Niazi, Berthou, Ismail, Dowling, ”Leader Election in a NewSQL Database”, DAIS 2015]
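A toy illustration of using a shared table for leader election. It is not the protocol from the cited paper: it assumes all nodes run their rounds in lockstep, treats a node as alive if its counter advanced since the last round, and picks the smallest live id as leader. `SharedTable` and `Elector` are hypothetical names.

```python
class SharedTable:
    """Hypothetical stand-in for a database table visible to all NameNodes."""

    def __init__(self):
        self.counters = {}

    def heartbeat(self, node_id):
        self.counters[node_id] = self.counters.get(node_id, 0) + 1

    def snapshot(self):
        return dict(self.counters)


class Elector:
    def __init__(self, node_id, table):
        self.node_id = node_id
        self.table = table
        self._last_seen = {}

    def round(self):
        """One election round: heartbeat, then pick the smallest live id."""
        self.table.heartbeat(self.node_id)
        current = self.table.snapshot()
        # A node is alive if its counter moved since our previous round.
        alive = [n for n, c in current.items() if c > self._last_seen.get(n, 0)]
        self._last_seen = current
        return min(alive)
```

When the current leader stops writing heartbeats, the remaining nodes see its counter stall in the next round and converge on the next smallest id.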
Path Component Caching
•The most common operation in HDFS is resolving
pathnames to inodes
- 67% of operations in Spotify’s Hadoop workload
•We cache recently resolved inodes at NameNodes so
that we can resolve them using a single batch
primary key lookup.
- We validate cache entries as part of transactions
- The cache converts O(N) round trips to the database to
O(1) for a hit for all inodes in a path.
Path Component Caching
•Resolving a path of length N gives O(N) round-trips
•With our cache, O(1) round-trip for a cache hit
Without the cache, the NameNode resolves /user/jim/myFile with three round trips to NDB: getInode(0, "user"), getInode(1, "jim"), getInode(2, "myFile").
With the cache, getInodes("/user/jim/myFile") hits the NameNode cache and needs one round trip to NDB: validateInodes([(0, "user"), (1, "jim"), (2, "myFile")]).
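The difference in round trips can be sketched as follows. This is an illustration under assumptions: inodes are keyed by (parent id, name) with the root at id 0, and `InodeStore` and `PathResolver` are hypothetical names, not HopsFS classes.

```python
ROOT_ID = 0


class InodeStore:
    """Hypothetical database stand-in; counts round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)  # (parent_id, name) -> inode_id
        self.round_trips = 0

    def get_inode(self, parent_id, name):
        self.round_trips += 1
        return self.rows.get((parent_id, name))

    def get_inodes(self, keys):
        # One batched primary-key lookup for all path components.
        self.round_trips += 1
        return [self.rows.get(k) for k in keys]


class PathResolver:
    def __init__(self, store):
        self.store = store
        self.cache = {}  # (parent_id, name) -> inode_id

    def resolve(self, path):
        names = [n for n in path.split("/") if n]
        if not names:
            return ROOT_ID
        keys = self._keys_from_cache(names)
        if keys is not None:
            ids = self.store.get_inodes(keys)  # validate in one round trip
            if ids == [self.cache[k] for k in keys]:
                return ids[-1]
            for k in keys:  # stale entries: drop them and resolve slowly
                self.cache.pop(k, None)
        return self._resolve_slow(names)

    def _keys_from_cache(self, names):
        keys, parent = [], ROOT_ID
        for name in names:
            key = (parent, name)
            if key not in self.cache:
                return None
            keys.append(key)
            parent = self.cache[key]
        return keys

    def _resolve_slow(self, names):
        parent = ROOT_ID
        for name in names:  # one round trip per path component
            inode_id = self.store.get_inode(parent, name)
            if inode_id is None:
                raise FileNotFoundError("/" + "/".join(names))
            self.cache[(parent, name)] = inode_id
            parent = inode_id
        return parent
```

A cache hit still goes to the database once, because the cached ids are only hints and must be validated inside the transaction.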
Hotspots
•Mikael saw 1-2 maxed out LDM threads
•Partitioning by parent inodeId meant
fantastic performance for ‘ls’
- Partition-pruned index scans
- At high load hotspots appeared at the
top of the directory hierarchy
•Current Solution:
- Cache the root inode at NameNodes
- Pseudo-random partition key for top-level
directories, but keep partition by parent
inodeId at lower levels
- At least 4x throughput increase!
/
/Users /Projects
/NSA /MyProj
/Dataset1 /Dataset2
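The mixed partitioning scheme can be sketched as a single function. The details are assumptions for illustration: the partition count, the depth threshold, and hashing the directory name with CRC32 are all hypothetical choices, as is the name `partition_for`.

```python
import zlib

NUM_PARTITIONS = 8          # assumed number of database partitions
RANDOM_PARTITION_DEPTH = 1  # assumed: depth 1 = directories directly under /


def partition_for(parent_id, name, depth):
    """Pick the partition for an inode at the given depth in the tree."""
    if depth <= RANDOM_PARTITION_DEPTH:
        # Top of the hierarchy: spread inodes pseudo-randomly by name so
        # that hot top-level directories do not share one partition.
        key = zlib.crc32(name.encode("utf-8"))
    else:
        # Lower levels: siblings share a partition, so listing a directory
        # is a partition-pruned index scan.
        key = parent_id
    return key % NUM_PARTITIONS
```

Below the threshold, all children of one directory land on the same partition, which preserves the fast `ls` path described above.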
Scalable Block Reporting
•On 100PB+ clusters, internal maintenance protocol
traffic makes up much of the network traffic
•Block Reporting
- Leader Load Balances
- Work-steal when exiting
safe-mode
[Diagram: DataNodes, NameNodes, Leader, NDB (Blocks, SafeBlocks), work steal]
HopsFS Performance
HopsFS Metadata Scaleout
Assuming 256 MB block size, 100 GB JVM heap for Apache Hadoop
Spotify Workload
HopsFS Throughput (Spotify Workload - PM)
Experiments performed on AWS EC2 with enhanced networking and c3.8xlarge instances
HopsFS Throughput (Spotify Workload - AM)
NDB Setup: 8 nodes with Xeon E5-2620 2.40GHz processors and 10GbE.
NameNodes: machines with Xeon E5-2620 2.40GHz processors and 10GbE.
Per Operation HopsFS Throughput
NDB Performance Lessons
•NDB is quite stable!
•ClusterJ is (nearly) good enough
- sun.misc.Cleaner has trouble keeping up at high
throughput – OOM for ByteBuffers
- Transaction hint behavior not respected
- DTO creation time affected by Java Reflection
- Nice features would be:
• Projections
• Batched scan operations support
• Event API
•Event API and Asynchronous API needed for
performance in Hops-YARN
Heterogeneous Storage in HopsFS
•Storage Types in HopsFS: Default, EC-RAID5, SSD
- Default: 3X overhead - triple replication on spinning disks
- SSD: 3X overhead - triple replication on SSDs
- EC-RAID5: 1.4X overhead with low reconstruction overhead!
Erasure Coding
HDFS File (Sealed)
RS(6,3): d0 d1 d2 d3 d4 d5 p0 p1 p2, overhead (6+3)/6 = 1.5X
RS(12,4): d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 p0 p1 p2 p3, overhead (12+4)/12 = 1.33X
Global/Local Reconstruction with EC-RAID5
[Diagram: Block0 … Block9 on host0 … host9 and Block10 … Block13 on host10; each host's disks hold d0 d1 d2 d3 d4 p0 under ZFS RAID-Z]
RS(10,2) + LR(5,1): (10+2+2)/10 = 1.4X
RS(10,4) + LR(5,1): (10+2+4)/10 = 1.6X
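The overhead figures on the erasure coding slides all come from the same ratio, total blocks stored over data blocks. A small helper makes the arithmetic explicit; `storage_overhead` is a hypothetical name, and triple replication is modelled here as one data block plus two extra copies.

```python
def storage_overhead(data_blocks, *parity_blocks):
    """Raw blocks stored per block of user data."""
    if data_blocks <= 0:
        raise ValueError("data_blocks must be positive")
    return (data_blocks + sum(parity_blocks)) / data_blocks


schemes = {
    "Triple replication": storage_overhead(1, 2),
    "RS(6,3)": storage_overhead(6, 3),
    "RS(12,4)": storage_overhead(12, 4),
    # Global parities plus one local parity per group of five data blocks.
    "RS(10,2) + LR(5,1)": storage_overhead(10, 2, 2),
    "RS(10,4) + LR(5,1)": storage_overhead(10, 4, 2),
}

for name, overhead in schemes.items():
    print(f"{name}: {overhead:.2f}X")
```

The local parities add a fixed two blocks per stripe of ten, which is the price paid for reconstructing a single lost block from five neighbours instead of ten.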
ePipe: Indexing HopsFS’ Namespace
[Diagram: NDB, NDB Event API, ElasticSearch (Free-Text Search), MetaData Designer, MetaData Entry]
Polyglot Persistence
The Distributed Database is the Single Source of Truth.
Foreign keys ensure the integrity of Extended Metadata.
Hops-YARN
YARN Architecture
[Diagram: YARN Client, ResourceMgr, Standby ResourceMgr, Zookeeper Nodes, NodeManagers]
1. Master-Slave Replication of RM State
2. Agreement on the Active ResourceMgr
NDB
ResourceManager – Monolithic but Modular
[Diagram: ResourceManager services: ApplicationMaster Service, ResourceTracker Service, Scheduler, Client Service, Admin Service, Security. Hops components: HopsResourceTracker and HopsScheduler, each with Cluster State, and the ClusterJ Event API. External actors: YARN Clients, NodeManagers, App Masters. Load labels: ~2k ops/s, ~10k ops/s]
Hops-YARN Architecture
[Diagram: YARN Client, ResourceMgrs (Scheduler, Resource Trackers), NDB, NodeManagers; Leader Election for Failed Scheduler]
Hopsworks
Hopsworks – Project-Based Multi-Tenancy
•A project is a collection of
- Users with Roles
- HDFS DataSets
- Kafka Topics
- Notebooks, Jobs
•Per-Project quotas
- Storage in HDFS
- CPU in YARN
• Uber-style Pricing
•Sharing across Projects
- Datasets/Topics
[Diagram: a project with dataset 1 … dataset N in HDFS and Topic 1 … Topic N in Kafka]
Hopsworks – Dynamic Roles
[Diagram: Alice@gmail.com authenticates with Glassfish and appears as NSA__Alice or Users__Alice; Secure Impersonation and X.509 Certificates; HopsFS, HopsYARN, Projects, Kafka]
SICS ICE - www.hops.site
A 2 MW datacenter research and test environment
Purpose: Increase knowledge, strengthen universities, companies and researchers
R&D institute, 5 lab modules, 3-4000 servers, 2-3000 square meters
Karamel/Chef for Automated Installation
Google Compute Engine, BareMetal
Summary
•HopsFS is the world’s fastest, most scalable HDFS
implementation
•Powered by NDB, the world’s fastest database 
•Thanks to Mikael, Craig, Frazer, Bernt and others
•Still room for improvement….
www.hops.io
Hops
[Hadoop For Humans]
Join us!
http://github.com/hopshadoop

Berlin buzzwords 2020-feature-store-dowling
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
 
Hopsworks data engineering melbourne april 2020
Hopsworks   data engineering melbourne april 2020Hopsworks   data engineering melbourne april 2020
Hopsworks data engineering melbourne april 2020
 
The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines The Bitter Lesson of ML Pipelines
The Bitter Lesson of ML Pipelines
 
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and MaggyAsynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
Asynchronous Hyperparameter Search with Spark on Hopsworks and Maggy
 
Hopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, SunnyvaleHopsworks at Google AI Huddle, Sunnyvale
Hopsworks at Google AI Huddle, Sunnyvale
 
Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019 Hopsworks in the cloud Berlin Buzzwords 2019
Hopsworks in the cloud Berlin Buzzwords 2019
 

10X your HDFS with NDB

  • 1. HopsFS: 10X your HDFS with NDB Jim Dowling Associate Prof @ KTH Senior Researcher @ SICS CEO @ Logical Clocks AB Oracle, Stockholm, 6th September 2016 www.hops.io @hopshadoop
  • 2. Hops Team Active: Jim Dowling, Seif Haridi, Tor Björn Minde, Gautier Berthou, Salman Niazi, Mahmoud Ismail, Theofilos Kakantousis, Johan Svedlund Nordström, Ermias Gebremeskel, Antonios Kouzoupis. Alumni: Vasileios Giannokostas, Misganu Dessalegn, Rizvi Hasan, Paul Mälzer, Bram Leenders, Juan Roca, K “Sri” Srijeyanthan, Steffen Grohsschmiedt, Alberto Lorente, Andre Moré, Ali Gholami, Davis Jaunzems, Stig Viaene, Hooman Peiro, Evangelos Savvidis, Jude D’Souza, Qi Qi, Gayana Chandrasekara, Nikolaos Stanogias, Daniel Bali, Ioannis Kerkinos, Peter Buechler, Pushparaj Motamari, Hamid Afzali, Wasif Malik, Lalith Suresh, Mariano Valles, Ying Lieu.
  • 3. Marketing 101: Celebrity Endorsements. “Hi! I’m Leslie Lamport* and even though you’re not using Paxos, I approve this product.” *Turing Award Winner 2014, Father of Distributed Systems
  • 4. Bill Gates’ biggest product regret?*
  • 5. Windows Future Storage (WinFS*) *http://www.zdnet.com/article/bill-gates-biggest-microsoft-product-regret-winfs/
  • 6. Hadoop in Context. Data Processing: Spark, MapReduce, Flink, Presto, TensorFlow. Storage: HDFS, MapR, S3, Colossus, WAS. Resource Management: YARN, Mesos, Borg. Metadata: Hive, Parquet, Authorization, Search.
  • 7. HDFS v2. Components: HDFS Client (ls, rm, mv, cp, stat, chown, copyFromLocal, copyFromRemote, chmod, etc.), DataNodes (up to ~5K), Active NameNode, Standby NameNode, Journal Nodes, ZooKeeper. Asynchronous Replication of EditLog. Agreement on the Active NameNode. Snapshots (fsimage) cut the EditLog.
  • 8. The NameNode is the Bottleneck for Hadoop
  • 9. Max Pause Times for NameNode Heap Sizes* [Figure: max pause time (ms; axis ticks 10, 100, 1000, 10000) against JVM heap size (GB; axis ticks 50, 75, 100, 150).] *OpenJDK or Oracle JVM
  • 10. NameNode and Decreasing Memory Costs [Figure: size (GB; axis ticks 0, 250, 500, 750, 1000) by year (2016 to 2020).]
  • 11. Externalizing the NameNode State • Problem: NameNode not scaling up with lower RAM prices • Solution: Move the metadata off the JVM Heap • Move it where? An in-memory storage system that can be efficiently queried and managed. Preferably Open-Source. • MySQL Cluster (NDB)
  • 13. Pluggable DBs: Data Abstraction Layer (DAL). Layers: NameNode (Apache v2), DAL API (Apache v2), NDB-DAL-Impl (GPL v2), Other DB (Other License). Jars: hops-2.5.0.jar, dal-ndb-2.5.0-7.5.3.jar
  • 14. The Global Lock in the NameNode
  • 15. HDFS NameNode Internals. Client: mkdir, getblocklocations, createFile, ... NameNode: Listener (NIO thread), ConnectionList, Reader1 … ReaderN, Call Queue, Handler1 … HandlerM, FSNameSystem Lock, Namespace & In-Memory EditLog, EditLog Buffer, flush, ackIds, Done RPCs, Responder (NIO thread). Journal Nodes: EditLog1, EditLog2, EditLog3. Settings: dfs.namenode.service.handler.count (default 10), ipc.server.read.threadpool.size (default 1).
  • 16. HopsFS NameNode Internals. Client: mkdir, getblocklocations, createFile, ... NameNode: Listener (NIO thread), ConnectionList, Reader1 … ReaderN, Call Queue, Handler1 … HandlerM, DAL API, DAL-Impl, ackIds, Done RPCs, Responder (NIO thread). NDB: inodes, block_infos, replicas, leases, ... Settings: dfs.namenode.service.handler.count (default 10), ipc.server.read.threadpool.size (default 1). HARD PART
  • 17. Concurrency Model: Implicit Locking • Serializable FS ops using implicit locking of subtrees. [Hakimzadeh, Peiro, Dowling, “Scaling HDFS with a Strongly Consistent Relational Model for Metadata”, DAIS 2014]
  • 18. Preventing Deadlock and Starvation • Acquire FS locks in agreed order using FS Hierarchy. • Block-level operations follow the same agreed order. • No cycles => Freedom from deadlock • Pessimistic Concurrency Control ensures progress. [Diagram: Client (mv, read), NameNode, DataNode (block_report), path /user/jim/myFile]
  • 19. Per Transaction Cache •Reusing the HDFS codebase resulted in too many roundtrips to the database per transaction. •We cache intermediate transaction results at NameNodes (i.e., snapshot).
  • 20. Sometimes, Transactions Just ain’t Enough • Large Subtree Operations (delete, mv, set-quota) can’t always be executed in a single Transaction. • 4-phase Protocol: Isolation and Consistency; Aggressive batching; Transparent failure handling; Failed ops retried on new NN; Lease timeout for failed clients.
  • 21. Leader Election using NDB • Leader to coordinate replication/lease management • NDB as shared memory for Leader Election of NN. [Niazi, Berthou, Ismail, Dowling, “Leader Election in a NewSQL Database”, DAIS 2015]
  • 22. Path Component Caching • The most common operation in HDFS is resolving pathnames to inodes: 67% of operations in Spotify’s Hadoop workload • We cache recently resolved inodes at NameNodes so that we can resolve them using a single batch primary key lookup. - We validate cache entries as part of transactions - On a cache hit, resolving all inodes in a path takes O(1) round trips to the database instead of O(N).
  • 23. Path Component Caching • Resolving a path of length N gives O(N) round-trips • With our cache, O(1) round-trip for a cache hit. Without cache, for /user/jim/myFile: NameNode to NDB: getInode(0, “user”), getInode(1, “jim”), getInode(2, “myFile”). With cache, for /user/jim/myFile: NameNode to Cache: getInodes(“/user/jim/myFile”); NameNode to NDB: validateInodes([(0, “user”), (1, “jim”), (2, “myFile”)])
  • 24. Hotspots • Mikael saw 1-2 maxed out LDM threads • Partitioning by parent inodeId meant fantastic performance for ‘ls’ - Partition-pruned index scans - At high load hotspots appeared at the top of the directory hierarchy • Current Solution: - Cache the root inode at NameNodes - Pseudo-random partition key for top-level directories, but keep partition by parent inodeId at lower levels - At least 4x throughput increase! [Diagram: example hierarchy /, /Users, /Projects, /NSA, /MyProj, /Dataset1, /Dataset2]
  • 25. Scalable Block Reporting • On 100PB+ clusters, internal maintenance protocol traffic makes up much of the network traffic • Block Reporting - Leader Load Balances - Work-steal when exiting safe-mode. [Diagram: DataNodes, NameNodes, Leader, NDB, Blocks, SafeBlocks, work steal]
  • 27. HopsFS Metadata Scaleout. Assuming 256 MB Block Size, 100 GB JVM Heap for Apache Hadoop
  • 29. HopsFS Throughput (Spotify Workload - PM). Experiments performed on AWS EC2 with enhanced networking and C3.8xLarge instances
  • 30. HopsFS Throughput (Spotify Workload - PM). Experiments performed on AWS EC2 with enhanced networking and C3.8xLarge instances
  • 31. HopsFS Throughput (Spotify Workload - AM). NDB Setup: 8 Nodes using Xeon E5-2620 2.40GHz Processors and 10GbE. NameNodes: machines with Xeon E5-2620 2.40GHz Processors and 10GbE.
  • 32. Per Operation HopsFS Throughput
  • 33. NDB Performance Lessons • NDB is quite stable! • ClusterJ is (nearly) good enough - sun.misc.Cleaner has trouble keeping up at high throughput (OOM for ByteBuffers) - Transaction hint behavior not respected - DTO creation time affected by Java Reflection - Nice features would be: Projections, Batched scan operations support, Event API • Event API and Asynchronous API needed for performance in Hops-YARN
  • 34. Heterogeneous Storage in HopsFS • Storage Types in HopsFS: Default, EC-RAID5, SSD - Default: 3X overhead - triple replication on spinning disks - SSD: 3X overhead - triple replication on SSDs - EC-RAID5: 1.4X overhead with low reconstruction overhead!
  • 35. Erasure Coding. HDFS File (Sealed). RS(6,3): d0 d1 d2 d3 d4 d5 p0 p1 p2, overhead (6+3)/6 = 1.5X. RS(12,4): d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 d10 d11 p0 p1 p2 p3, overhead (12+4)/12 = 1.33X.
  • 36. Global/Local Reconstruction with EC-RAID5. RS(10,2) with LR(5,1): (10+2+2)/10 = 1.4X. RS(10,4) with LR(5,1): (10+2+4)/10 = 1.6X. [Diagram labels: Block0 … Block9 (host0 … host9), Block10 … Block13 (host10); local stripes d0 d1 d2 d3 d4 p0 under ZFS RAID-Z]
  • 37. ePipe: Indexing HopsFS’ Namespace. Polyglot Persistence: NDB and Elasticsearch (Free-Text Search), connected through the NDB Event API. The Distributed Database is the Single Source of Truth. Foreign keys ensure the integrity of Extended Metadata. [Diagram labels: MetaData Designer, MetaData Entry]
  • 39. YARN Architecture. Components: YARN Client, NodeManagers, ResourceMgr, Standby ResourceMgr, Zookeeper Nodes. 1. Master-Slave Replication of RM State 2. Agreement on the Active ResourceMgr
  • 40. ResourceManager: Monolithic but Modular. ResourceManager services: ApplicationMaster Service, ResourceTracker Service, Scheduler, Client Service, Admin Service, Security, Cluster State. Hops components: HopsResourceTracker (Cluster State), HopsScheduler (Cluster State). [Diagram labels: NDB, ClusterJ, Event API, YARN Clients, NodeManagers, App Masters, ~2k ops/s, ~10k ops/s]
  • 43. Hopsworks: Project-Based Multi-Tenancy • A project is a collection of - Users with Roles - HDFS DataSets - Kafka Topics - Notebooks, Jobs • Per-Project quotas - Storage in HDFS - CPU in YARN (Uber-style Pricing) • Sharing across Projects - Datasets/Topics. [Diagram: a project contains dataset 1 … dataset N in HDFS and Topic 1 … Topic N in Kafka]
  • 44. Hopsworks: Dynamic Roles. Alice@gmail.com authenticates with Glassfish and acts as NSA__Alice or Users__Alice, depending on the project. Secure Impersonation towards HopsFS, HopsYARN and Kafka. X.509 Certificates.
  • 45. SICS ICE - www.hops.site. A 2 MW datacenter research and test environment. Purpose: Increase knowledge, strengthen universities, companies and researchers. R&D institute, 5 lab modules, 3000-4000 servers, 2000-3000 square meters
  • 46. Karamel/Chef for Automated Installation. Google Compute Engine, BareMetal
  • 47. Summary • HopsFS is the world’s fastest, most scalable HDFS implementation • Powered by NDB, the world’s fastest database • Thanks to Mikael, Craig, Frazer, Bernt and others • Still room for improvement… www.hops.io
  • 48. Hops [Hadoop For Humans] Join us! http://github.com/hopshadoop
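
Slides 22 and 23 describe path component caching. The following is a minimal sketch of that idea, not the HopsFS implementation: `FakeNDB`, `CachingResolver` and the round-trip counter are hypothetical names made up for illustration, and the database is a plain dictionary keyed by (parent inode id, name). It only shows how a cache hit replaces N sequential lookups with one batched validation.

```python
def split_path(path):
    """Split '/user/jim/myFile' into ['user', 'jim', 'myFile']."""
    return [name for name in path.split("/") if name]


class FakeNDB:
    """Hypothetical stand-in for the database; counts round trips."""

    def __init__(self, rows):
        self.rows = dict(rows)  # (parent_id, name) -> inode_id
        self.round_trips = 0

    def get_inode(self, parent_id, name):
        # One primary-key lookup, one round trip.
        self.round_trips += 1
        return self.rows.get((parent_id, name))

    def validate_inodes(self, chain):
        # One batched primary-key lookup for every component of the path.
        self.round_trips += 1
        return all(self.rows.get(key) == inode_id for key, inode_id in chain)


class CachingResolver:
    """Hypothetical NameNode-side cache of resolved path components."""

    def __init__(self, db, root_id=0):
        self.db = db
        self.root_id = root_id
        self.cache = {}  # path -> [((parent_id, name), inode_id), ...]

    def resolve(self, path):
        names = split_path(path)
        if not names:
            return self.root_id
        cached = self.cache.get(path)
        if cached is not None and self.db.validate_inodes(cached):
            return cached[-1][1]  # cache hit: a single round trip
        # Cache miss or stale entry: walk the path one component at a time.
        chain, parent_id = [], self.root_id
        for name in names:
            inode_id = self.db.get_inode(parent_id, name)
            if inode_id is None:
                self.cache.pop(path, None)
                raise FileNotFoundError(path)
            chain.append(((parent_id, name), inode_id))
            parent_id = inode_id
        self.cache[path] = chain
        return parent_id


db = FakeNDB({(0, "user"): 1, (1, "jim"): 2, (2, "myFile"): 3})
resolver = CachingResolver(db)
resolver.resolve("/user/jim/myFile")  # miss: one round trip per component
resolver.resolve("/user/jim/myFile")  # hit: one batched validation
print(db.round_trips)
```

In this sketch a stale entry costs one failed validation plus a full walk, so the cache only pays off when paths are resolved repeatedly and rarely change.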

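The storage overhead figures on slides 34 to 36 all follow the same arithmetic: total blocks stored divided by data blocks. The sketch below reproduces those numbers; `storage_overhead` is a hypothetical helper, and the parity counts for the EC-RAID5 cases are taken directly from the slide's own sums rather than from a description of the encoding.

```python
from fractions import Fraction


def storage_overhead(data_blocks, *parity_blocks):
    """Return (data + all parity) / data as an exact fraction."""
    total = data_blocks + sum(parity_blocks)
    return Fraction(total, data_blocks)


# Label -> (data blocks, parity terms as they appear on the slides).
schemes = {
    "RS(6,3)": (6, 3),
    "RS(12,4)": (12, 4),
    "EC-RAID5, RS(10,2) with LR(5,1)": (10, 2, 2),
    "EC-RAID5, RS(10,4) with LR(5,1)": (10, 2, 4),
}

for label, (data, *parity) in schemes.items():
    print(f"{label}: {float(storage_overhead(data, *parity)):.2f}X")
```

For comparison, triple replication stores every block three times, which is the 3X overhead quoted for the Default and SSD storage types.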
Editor's notes

  1. I am going to talk about realizing Bill Gates’s vision for a filesystem in the Hadoop Ecosystem. “WinFS was an attempt to bring the benefits of schema and relational databases to the Windows file system. …The WinFS effort was started around 1999 as the successor to the planned storage layer of Cairo and died in 2006 after consuming many thousands of hours of efforts from really smart engineers.” [Brian Welcker]** **http://blogs.msdn.com/b/bwelcker/archive/2013/02/11/the-vision-thing.aspx
  2. The kind of challenges you have with the NN are managing large clusters and configuring the NN.
  3. The slope of the bottom line is based on improvements in garbage collection technology (Azul JVM, Shenandoah, etc.). The slope of the top line is based on Moore’s Law.
  4. Apache Spark is already moving in this direction (Tachyon).
  5. The NameNode has multi-reader, single-writer concurrency semantics. Operations that would hold the write lock for too long, starving clients, are not executed atomically. For example, deleting a directory subtree with millions of files involves deleting batches of files, yielding the global lock for a period, then re-acquiring it to continue the operation.
  6. With a global lock, it’s easy.
  7. If something is not atomic, you have to handle all possible failures.
  8. This is the only new Protocol Buffer message we added to DNs.
  9. A reconstruction read is expensive.
  10. The Resource Manager (RM) is a bottleneck. Zookeeper throughput is not high enough to persist all RM state. The standby resource manager can only recover partial state. All running jobs must be restarted. RM state is not queryable. The RM is a State-Machine. Almost no session state to manage.
  11. Privileges: upload/download data, run analysis jobs. Like an RBAC solution. All access is via Hopsworks.
  12. 44
  13. I need some sound-effects to go with that.
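
Note 5 describes how a large subtree delete is split into batches so that the global write lock is released between batches. A minimal sketch of that pattern follows, assuming a hypothetical in-memory namespace (a set of path strings) and a plain `threading.Lock` standing in for the global lock; it is not the NameNode code.

```python
import threading


def delete_subtree_in_batches(namespace, lock, prefix, batch_size=1000):
    """Delete every path under `prefix`, holding the lock for one batch at a time.

    `namespace` is a set of path strings (a hypothetical stand-in for the
    in-memory namespace). Returns the number of deleted paths.
    """
    deleted = 0
    while True:
        with lock:  # acquire the global lock for this batch only
            batch = [p for p in namespace if p.startswith(prefix)][:batch_size]
            for path in batch:
                namespace.discard(path)
        # The lock is released here, so waiting clients can make progress.
        if not batch:
            return deleted
        deleted += len(batch)


namespace = {f"/tmp/job/part-{i}" for i in range(2500)} | {"/user/jim/myFile"}
global_lock = threading.Lock()
print(delete_subtree_in_batches(namespace, global_lock, "/tmp/job/"))
```

Because other operations can run between batches, the delete as a whole is not atomic, which is the trade-off notes 5 and 7 point at.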