Ceph Deployment at Target:
Best Practices and Lessons Learned
Introduction
Will Boege
Sr. Technical Architect
RAPID Team (Private Cloud and Automation)
First Ceph Environment at Target went live in October of 2014
• “Firefly” Release
Ceph was backing Target’s first ‘official’ Openstack release
• Icehouse Based
• Ceph is used for:
• RBD for Openstack Instances and Volumes
• RADOSGW for Object (instead of Swift)
• RBD backing Ceilometer MongoDB volumes
• Currently DEV is largest environment with ~1700 instances
Replaced traditional array-based approach that was implemented in our
prototype Havana environment.
• Traditional storage model was problematic to integrate
• Maintenance/purchase costs from array vendors can get prohibitive
• Traditional SAN just doesn’t ‘feel’ right in this space.
• Ceph’s tight integration with Openstack
Initial Ceph Deployment:
• 3 x Monitor Nodes – Cisco B200
• 12 x OSD Nodes – Cisco C240 LFF
• 12 x 4TB SATA disks
• 10 OSD per server
• Journal partition co-located on each OSD disk
• 120 OSD total = ~400 TB
• 2 x 10GbE per host
• 1 public_network
• 1 cluster_network (see the ceph.conf sketch below)
• Basic LSI ‘MegaRAID’ controller – SAS 2008M-8i
• No supercap or cache capability onboard
• 10 x single-disk RAID0 volumes (one per OSD disk)
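For illustration, a minimal ceph.conf sketch of how this layout is expressed – the subnets and journal size below are placeholder values, not our production settings:
[global]
# client-facing traffic (placeholder subnet)
public_network = 192.168.10.0/24
# replication / recovery traffic (placeholder subnet)
cluster_network = 192.168.20.0/24
[osd]
# journals co-located on the data disks in this first build; size in MB (example value)
osd_journal_size = 5120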
Post rollout it became evident that there were performance issues within
the environment.
• KitchenCI users would complain of slow Chef converge times
• Yum transactions / app deployments would take abnormal amounts of time to
complete.
• Instances, especially images using cloud-init, would take excessively long to boot
• General user griping about ‘slowness’
Lesson #1 -
Instrument Your Deployment!
Track statistics/metrics that have real impact on your users
Unacceptable levels of latency even while the cluster was under relatively little load
High levels of CPU IOWait% on the OSD servers & IDLE Openstack Instances
Poor IOPS / Latency - FIO benchmarks running INSIDE Openstack Instances
$ fio --rw=write --ioengine=libaio --runtime=100 --direct=1 --bs=4k --size=10G --iodepth=32 --name=/tmp/testfile.bin
test: (groupid=0, jobs=1): err= 0: pid=1914
read : io=1542.5MB, bw=452383 B/s, iops=110 , runt=3575104msec
write: io=527036KB, bw=150956 B/s, iops=36 , runt=3575104msec
Having more effective instrumentation from the outset would have revealed obvious
problems with our architecture
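A few illustrative first checks that would have surfaced the problem early (not a substitute for a proper metrics pipeline):
$ ceph -s          # overall health, recovery and backfill activity
$ ceph osd perf    # per-OSD commit/apply latency
$ iostat -x 5      # device-level await and %util on the OSD hosts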
Compounding the performance issues, we began to see mysterious reliability issues.
• OSDs would randomly fall offline
• Cluster would enter a HEALTH_ERR state about once a week with ‘unfound objects’ and/or inconsistent placement groups (PGs) that required manual intervention to fix.
• These problems were usually coupled with a large drop in our already suspect
performance levels
Lesson #2 –
Do your research on hardware
your server vendor provides!
Don’t just blindly accept whatever they have lying around – be proactive!
• Root cause of the HEALTH_ERRs was the “unnamed vendor’s” SATA drives in our solution ‘soft-failing’ – slowly accumulating media errors without reporting themselves as failed. Don’t rely on SMART. Interrogate your disks with an array-level tool, such as MegaCli, to identify drives for proactive replacement.
$ /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep Media
• In installations with co-located journal partitions, a RAID solution with
cache+BBU for writeback operation would have been a huge performance gain.
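If your controller does have cache and a BBU, it is worth confirming what the logical drives are actually doing; an illustrative check with MegaCli (paths and adapter numbers may differ):
$ /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -aALL
$ /opt/MegaRAID/MegaCli/MegaCli64 -LDGetProp -Cache -LALL -aALL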
Paying more attention to the suitability of hardware our vendor of choice provided
would have saved a lot of headaches
Which leads us to –
Lesson #3 –
Ceph is not magic. It does the best
with the hardware you give it!
There is much ill-advised advice floating around that if you throw enough crappy disks at Ceph you will achieve enterprise-grade performance. Garbage in – garbage out. Don’t be greedy and build for capacity if your objective is to create a more performant block storage solution.
Lessons learned – we set out to rebuild
New Ceph OSD Deployment:
• 5 x OSD Nodes – Cisco C240M3 SFF
• 18 x 1.1TB 10k Seagate SAS disks
• 6 x 480GB Intel S3500 SSDs
• I like Intel SSDs for use with Ceph. Huge disparity in performance between SSD vendors.
• 18 OSD per server
• Journal partitions on SSD at a 4/5:1 OSD:journal ratio
• 90 OSD total = ~100 TB
• Improved LSI ‘MegaRAID’ controller – SAS-9271-8i
• Supercap
• Writeback capability
• 18 x single-disk RAID0 volumes
• Writethrough on journals, writeback on spinning OSDs (see the MegaCli sketch below)
• Still experimenting with this! – Writeback seems to help on systems without JBOD-mode adapters!
• UCS M4 gen finally has cards that support LSI’s ‘IT mode’ or JBOD!
• Based on “Hammer” Ceph Release
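A rough sketch of how that split can be applied per logical drive with MegaCli – the LD numbers here are placeholders, not our actual layout:
$ /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WT -L18 -a0    # example journal-SSD LD left writethrough
$ /opt/MegaRAID/MegaCli/MegaCli64 -LDSetProp WB -L0 -a0     # example spinning-OSD LD set to writeback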
Results
• Obtaining metrics from our design change was nearly immediate due to having effective monitoring in place
– Latency improvements have been extreme
– IOWait% within Openstack instances has been greatly reduced
– Raw IOPS throughput has skyrocketed
• Testing Ceilometer backed by MongoDB on kRBD, I’ve seen this 5 node / 90 OSD cluster spike to ~25k IOPS
– Throughput testing with RADOS bench and FIO shows an approx. 10-fold increase
– User feedback has been extremely positive; the general Openstack experience at Target is much improved.
– Performance within Openstack instances has increased about 10x
Before (original cluster):
test: (groupid=0, jobs=1): err= 0: pid=1914
read : io=1542.5MB, bw=452383 B/s, iops=110 , runt=3575104msec
write: io=527036KB, bw=150956 B/s, iops=36 , runt=3575104msec
After (rebuilt cluster):
test: (groupid=0, jobs=1): err= 0: pid=2131
read : io=2046.6MB, bw=11649KB/s, iops=2912 , runt=179853msec
write: io=2049.1MB, bw=11671KB/s, iops=2917 , runt=179853msec
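For reference, an illustrative RADOS bench invocation of the kind used for the throughput comparison – pool name, duration and thread count are placeholders:
$ rados bench -p testpool 60 write -t 32 --no-cleanup
$ rados bench -p testpool 60 seq -t 32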
Conclusion
• Before embarking on creating a Ceph environment, have a good idea of what your objectives are for the environment.
– Capacity?
– Performance?
• If you make the wrong decisions, it can lead to a negative user perception of Ceph, and of technologies that depend on it, like Openstack
• Once you understand your objective, understand that your hardware selection is
crucial to your success
• Unless you are architecting for raw capacity, use SSDs for your journal volumes
without exception
– If you must co-locate journals, use a RAID adapter with BBU+Writeback cache
• A hybrid approach may be feasible with SATA ‘capacity’ disks and SSD journals. I’ve yet to try this; I’d be interested in seeing some benchmark data on a setup like this
• Research, experiment, consult with Red Hat / Inktank
• Monitor, monitor, monitor and provide a very short feedback loop for your users
to engage you with their concerns
Next Steps
• Looking to test all-SSD pool performance
– All-SSD Ceph has been maturing rapidly
– We have a need for an ‘ultra’ Cinder tier for workloads that require high IOPS / low latency, for use cases such as Kafka and Cassandra
– Also considering Solidfire for this use case
– If anyone has experience with this – I’d love to hear about it!
• Repurposing legacy SATA hardware into a dedicated object pool
– High capacity, low performance drives should work well in an object use case – more
research is needed into end-user requirements
• Automate deployment with Chef to bring parity with our Openstack automation
• Broadening Ceph beyond its niche cloud use case, especially with the improved object offering.
• Repurpose ‘capacity’ frames
– Video archiving for security camera footage
– Register / POS log archiving
General Tips on Migrations/Upgrade to Hammer
• Plan time into your deployment schedule to iron out dependency hell, especially if you are moving from Inktank packages to Red Hat packages
• In Hammer, you no longer have to use Apache and the FastCGI shim for RADOSGW object service. Enable civetweb with the following entry in the [client.radosgw.gateway] section of ceph.conf and make sure you shut off Apache!
– rgw_frontends = "civetweb port=80"
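Put together, the ceph.conf stanza looks roughly like this:
[client.radosgw.gateway]
rgw_frontends = "civetweb port=80"
Then stop and disable Apache on the gateway hosts – on RHEL 7, for example:
$ systemctl stop httpd
$ systemctl disable httpd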
• Use the new and improved CRUSH tunables. This WILL trigger a lot of rebalancing activity!
– $ ceph osd crush tunables optimal
• In the [osd] section of ceph.conf set the following directive. This prevents new
OSDs from triggering rebalancing. Nope, setting NOIN won’t do the trick!
– osd_crush_update_on_start = false
• Ceph’s default recovery settings are far too aggressive. Tone it down with the
following in the [osd] section or it will impact client IO
osd_max_backfills = 1
osd_recovery_priority = 1
osd_client_op_priority = 63
osd_recovery_max_active = 1
osd_recovery_max_single_start = 1
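These can also be pushed to running OSDs without a restart; an illustrative injectargs call:
$ ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'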
• The best method to ‘drain’ hosts is to adjust the CRUSH weight of the OSDs on those hosts, NOT the OSD weight.
– CRUSH weight dictates cluster-wide data distribution. OSD weights only impact the
host the OSD is on and can cause unpredictability.
• Don’t work serially host by host – Drop the CRUSH weight of all the OSDs you
are removing across the cluster simultaneously. I used a ‘reduce by 50% and
allow recovery’ scheme. Your mileage may vary.
$ for i in {0..119}; do ceph osd crush reweight osd.$i 3.0; done
$ for i in {0..119}; do ceph osd crush reweight osd.$i 1.5; done
$ for i in {0..119}; do ceph osd crush reweight osd.$i .75; done
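Between passes, let recovery settle back to HEALTH_OK before the next reduction – watching it can be as simple as (illustrative):
$ watch -n 30 ceph -s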
• Consider numad to auto-magically set NUMA affinities.
– Still experimenting with the impact of this on cluster performance.
• Last but not least – VERY Important. You WILL run out of threads and OSDs
WILL crash if you don’t tune the kernel.pid_max value – especially in servers
with > 12 OSDs
$ echo "kernel.pid_max = 4194303" >> /etc/sysctl.conf
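Reload sysctl settings (or reboot) for the change to take effect, and verify the running value – for example:
$ sysctl -p
$ sysctl kernel.pid_max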
Thanks For Your Time!
Questions?