Ceph is unstable and vSAN delivers extremely poor performance, yet data centers need a genuinely high-end distributed storage system to replace traditional disk arrays behind mission-critical applications. PhegData X rises up to answer...
2. • The most successful database machine vendor in China
• Market share ~20%, lower than Oracle Exadata, higher than Huawei FusionCube
• Focused on performance optimization for real applications
• PhDX (PhegData X) inherits the core of the database machine
• Resource pooling, strong consistency, low-latency I/O, etc.
• Highly efficient cache engine for mixed-media environments
• Adding more for virtualization and container systems
• RESTful API, supports OpenStack Cinder
• VMware VAAI/vVol, Docker graph driver ready
History & Background
3. • Replacing high-end disk arrays to support mission-critical applications
• Scale-out architecture
• Proven data center level reliability, serviceability and performance
• Traditional as well as new applications
• Oracle RAC, DB2 PureScale/DPF, Sybase, MySQL, PostgreSQL …
• Hadoop, Spark, Storm, Kafka, Druid …
• VMware, KVM, XEN, Docker, rkt ...
Targeting …
4. • PhDX = Generic x86 hardware + S2EBS (SmartScaleEBS) software
• Hardware: nothing special, just commodity metal boxes
• CPU: Intel Xeon E5/E7 series, v2/v3/v4
• Flash: SATA/NVMe/PCIe SSD, NVDIMM releasing soon
• Network: GbE/10GbE/InfiniBand, Intel Omni-Path ready
• S2EBS (SmartScaleEBS) software
• DHT-based distributed system, no central metadata node
• Block-level interfaces: iSCSI, SRP, iSER and the S2EBS native protocol
• RESTful API to support object interfaces (Cinder, S3 compatible, etc.)
Inside PhDX (PhegData X)
6. BAC maintains logical volumes
[Diagram: a Disk Pool of OSDs, each holding chunks; a Logical Volume is mapped onto chunks spread across the OSDs]
A logical volume is a set of chunks. The mappings are maintained by the BAC module.
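The mapping above can be pictured as a table from chunk index to chunk placement. Below is a minimal Python sketch of that idea; the chunk size and the shape of the placement record are assumptions for illustration, not S2EBS internals.

# Minimal sketch of a BAC-style volume-to-chunk mapping.
# CHUNK_SIZE is an assumption; the real S2EBS chunk size is not stated in this deck.
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB, hypothetical

class LogicalVolume:
    def __init__(self, name, size_bytes):
        self.name = name
        # One entry per chunk: which OSD holds it and its chunk id there,
        # e.g. ("osd-3", 17), filled in when the chunk is allocated.
        num_chunks = (size_bytes + CHUNK_SIZE - 1) // CHUNK_SIZE
        self.chunk_map = [None] * num_chunks

    def locate(self, logical_offset):
        """Translate a logical byte offset into (chunk placement, offset within chunk)."""
        index = logical_offset // CHUNK_SIZE
        return self.chunk_map[index], logical_offset % CHUNK_SIZE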
8. • Metadata Area
• Super Block: 64 KB
• Space Bitmap: 2 MB
• Key Space (Mapping B+ Tree): 512 MB
• Data Area
OSD maintains physical disks
[Diagram: on-disk layout of an OSD showing the Super Block, Space Bitmap, Key Space and Data Area]
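For reference, the sizes above can be turned into byte offsets. The sketch assumes the areas are laid out in the order the diagram shows (Super Block, then Space Bitmap, then Key Space, then the Data Area); only the sizes come from the slide.

# On-disk layout of an OSD expressed as byte offsets (ordering assumed).
KIB, MIB = 1024, 1024 * 1024

SUPER_BLOCK_OFF,  SUPER_BLOCK_SIZE  = 0, 64 * KIB
SPACE_BITMAP_OFF, SPACE_BITMAP_SIZE = SUPER_BLOCK_OFF + SUPER_BLOCK_SIZE, 2 * MIB
KEY_SPACE_OFF,    KEY_SPACE_SIZE    = SPACE_BITMAP_OFF + SPACE_BITMAP_SIZE, 512 * MIB
DATA_AREA_OFF = KEY_SPACE_OFF + KEY_SPACE_SIZE  # everything after the metadata is data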
9. Keep different disks at the same usage ratio
Each physical disk is cut into vOSDs (4 GB by default); the vOSD is the actual unit of the DHT ring, so all vOSDs in a pool are used equally.
[Diagram: 3 TB, 6 TB and 8 TB OSDs each carved into 4 GB vOSDs that join one pool]
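A quick sketch of the carving step, assuming only the 4 GB default named on the slide: each disk contributes a number of vOSDs proportional to its capacity, so spreading data uniformly over vOSDs keeps the usage ratio equal across 3 TB, 6 TB and 8 TB disks.

# Carve physical disks into 4 GiB vOSDs (the slide's default unit of the DHT ring).
VOSD_SIZE = 4 * 1024**3

def carve(disks_gb):
    """disks_gb: {osd_name: capacity in GiB} -> list of vOSD identifiers."""
    vosds = []
    for name, capacity_gb in disks_gb.items():
        count = (capacity_gb * 1024**3) // VOSD_SIZE
        vosds += [f"{name}/vosd-{i}" for i in range(count)]
    return vosds

# 3 TB, 6 TB and 8 TB disks contribute roughly 768, 1536 and 2048 vOSDs.
print(len(carve({"osd-3tb": 3072, "osd-6tb": 6144, "osd-8tb": 8192})))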
10. Router in the middle of I/O process
[Diagram: the application sees /dev/sd* through the BAC Driver; I/O travels over 10GbE/IB with the S2EBS Native Protocol to Router instances, which use the DHT to reach the vOSDs and the chunks behind them]
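The routing step can be illustrated with a generic consistent-hashing ring. The actual S2EBS DHT is not described in this deck, so the hash function and ring layout below are assumptions, shown only to make the point that placement needs no central metadata node.

# Generic consistent-hashing sketch of the router step: hash a chunk key onto a
# ring of vOSDs and pick the successor. Purely illustrative, not the S2EBS DHT.
import bisect, hashlib

def _h(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, vosds):
        self._points = sorted((_h(v), v) for v in vosds)
        self._keys = [p for p, _ in self._points]

    def route(self, volume, chunk_index):
        """Map (volume, chunk index) to the vOSD that should hold that chunk."""
        i = bisect.bisect(self._keys, _h(f"{volume}:{chunk_index}")) % len(self._points)
        return self._points[i][1]

ring = Ring([f"vosd-{n}" for n in range(8)])
print(ring.route("vol-a", 42))  # any router computes the same placement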
13. Agile redundancy control
[Diagram: a common ServerSAN fixes one level per pool (Pool-a: 2-rep, Pool-b: 3-rep); a single S2EBS pool holds 2-rep, 3-rep, 4-rep and 5-rep volumes side by side]
A common ServerSAN controls redundancy per pool; S2EBS controls redundancy per volume.
14. Benefit of volume redundancy control
With per-pool control, capacity has to be preserved for each protection level. With per-volume control, an application can ask for just 500 GB with 3-rep protection while the rest stays 2-rep, or later raise all of its 2-rep protected data to 3-rep.
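A toy calculation of the capacity point, with hypothetical numbers (10 TB raw, a 50/50 pool split): per-pool control strands raw space in whichever pool demand does not match, while per-volume control leaves the whole pool usable at any protection level.

RAW_GB = 10_000

# Per-pool control: raw space is split up front (assume 50/50 here).
raw_2rep_pool, raw_3rep_pool = RAW_GB / 2, RAW_GB / 2
# If demand turns out to be all 3-rep, only the 3-rep pool can serve it.
usable_per_pool = raw_3rep_pool / 3        # ~1667 GB

# Per-volume control: the whole pool backs 3-rep volumes when asked to.
usable_per_volume = RAW_GB / 3             # ~3333 GB

print(f"per-pool:   {usable_per_pool:.0f} GB usable, {raw_2rep_pool:.0f} GB raw stranded")
print(f"per-volume: {usable_per_volume:.0f} GB usable, nothing stranded")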
15. Concept of safe boundary
Fact A: multiple concurrent disk failures can cause data loss.
Fact B: the more disks there are, the more often multiple concurrent failures happen.
So replicas spread over too many disks hurt reliability, while data centers require 99.999% availability.
By calculation: 2-rep protection should spread replicas across fewer than 100 disks; 3-rep protection across fewer than 500 disks.
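The 100-disk and 500-disk figures come from the deck; the sketch below only shows the shape of such a calculation. The annual failure rate, the rebuild window and the simple binomial model are assumptions, not the vendor's actual math.

# Back-of-the-envelope safe-boundary model: if a volume's replicas are spread
# over n disks, any `replicas` concurrent failures inside one rebuild window
# can lose data. AFR and rebuild window below are assumed values.
from math import comb

AFR = 0.02                                # assumed 2% annual failure rate per disk
REBUILD_HOURS = 4                         # assumed rebuild window
p = AFR * REBUILD_HOURS / (365 * 24)      # chance a disk is down within one window

def p_loss(n_disks, replicas):
    """P(at least `replicas` of n_disks are failed within the same window)."""
    return sum(comb(n_disks, k) * p**k * (1 - p)**(n_disks - k)
               for k in range(replicas, n_disks + 1))

for n in (100, 500):
    print(n, f"2-rep loss/window ~ {p_loss(n, 2):.1e}",
             f"3-rep loss/window ~ {p_loss(n, 3):.1e}")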
16. Safe boundary related to performance
With per-pool redundancy control, the pool's safe boundary limits how many disks can process I/O simultaneously. With per-volume redundancy control, each volume has its own safe boundary, so the simultaneous processing range of all volumes together is bigger than any single safe boundary.
[Diagram: Vol A, Vol B and Vol C constrained by one pool safe boundary versus each volume having its own safe boundary]
17. • EMC ScaleIO
• Still needs a central metadata server; scalability is questionable.
• Ceph
• Poor performance and poor stability.
• VMware vSAN
• Extremely poor performance
• Only works with VMware vSphere
• Nutanix NDFS
• Poor performance, especially high latency
• Not block-level storage
Comparison with Equivalents
18. • Performance! Performance! Performance!
• Low latency - 2ms via 10GbE or 0.2ms via InfiniBand
• Parallel processing - Up to 128 nodes serving one volume; IOps & MBps easily hit the physical limits on the host side
• Tiny overhead - 24 bits per I/O, leaving over 99.4% of physical bandwidth for real data (see the arithmetic sketch after this slide)
• Small footprint on host side - 8MB would be enough in most cases
• Little CPU consumption – one core can stably provide 4k~5k IOps
• Agile redundancy control per volume
• Volumes requesting different redundancy levels can be created from the same pool
• No data migration or downtime when changing the protection level
• Erasure coding will be supported in the same way in the next release
PhDX key features and differences
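Rough arithmetic behind the "tiny overhead" bullet, assuming 512-byte and 4 KiB request sizes (the deck does not state the reference I/O size): 24 bits is 3 bytes, and 512 / (512 + 3) is about 99.4%.

# Per-I/O overhead of 24 bits (3 bytes) against assumed request sizes.
OVERHEAD_BYTES = 24 // 8
for io_size in (512, 4096):
    useful = io_size / (io_size + OVERHEAD_BYTES)
    print(f"{io_size}-byte I/O: {useful:.1%} of the wire carries real data")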