©2014 DataStax Confidential. Do not distribute without consent.
@AlTobey
Open Source Mechanic @ DataStax
Designing Commodity Storage
What is commodity storage?
•software-defined storage
•e.g. Cassandra, S3, GCE Persistent Disks
•Intel/AMD x86_64 architecture
Open Standards:
•PCI-Express
•Near-line SAS, Enterprise SATA, SATA SSD
•1G/10G Ethernet
Definitely NOT this
Designed to solve different problems from a different era.
Not this either
Besides SSDs, most “desktop” gear is to be avoided for production deployment.
Enterprise
Rack & Stack
•Blades & 1U for high CPU with low storage density
•2U for plenty of CPU & storage & air flow
•3U-4U for high-latency / high-density storage
•“racks” don’t have to be literal
  •blade chassis
  •separate network/power is key
Vendors
Choosing Server Components
•CPU
•Memory
•Motherboards
•Host Bus Adapters
•Hard Drives
•Network Interface Cards
CPU Pricing
[Bar chart: price in dollars (axis 0 to 2200) by processor: E5-2620, E5-2630, E5-2650, E5-2670, E5-2687W, E5-2690]
Bar labels (pairing with processors not preserved in the extracted text):
•6 cores 2.6GHz 80W
•6 cores 2.1GHz 80W
•8 cores 2.6GHz 95W
•10 cores 2.5GHz 115W (3.3GHz turbo)
•8 cores 3.4GHz 150W
•8 cores 2.9GHz 135W (3.8GHz turbo)
L3 cache labels: 15MB, 15MB, 20MB, 20MB, 25MB, 25MB
Source: http://en.wikipedia.org/wiki/Sandy_Bridge-E
Memory
•always get ECC!
  •~5 single bit errors in 8 GB RAM per hour (top-end error rate)
  •unexplainable crashes
  •data corruption
•8GB DIMMs are still the sweet spot
•Registered Memory: match to your CPU/motherboard
  •Pretty much all server memory is ECC and Registered
•Speed: match to fastest rating of CPU/motherboard
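To put the top-end error rate above in perspective, here is a rough sketch that scales it to other RAM sizes. Linear scaling with capacity and time is an assumption made for illustration, and the function name is hypothetical; real error rates vary widely between DIMMs and platforms.

```python
# Top-end rate quoted on the slide: ~5 single-bit errors per 8 GB per hour.
ERRORS_PER_GB_HOUR = 5 / 8


def expected_bit_errors(ram_gb: float, hours: float) -> float:
    """Expected single-bit errors, assuming the rate scales linearly
    with capacity and time (an illustrative assumption)."""
    return ERRORS_PER_GB_HOUR * ram_gb * hours


if __name__ == "__main__":
    for ram_gb in (8, 64, 128):
        per_day = expected_bit_errors(ram_gb, 24)
        print(f"{ram_gb:>4} GB: ~{per_day:.0f} single-bit errors/day at the top-end rate")
```

Without ECC, each of those errors is a candidate for the crashes and corruption listed above.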
Motherboards
•Largely out of your control
  •Dell / HP / etc.: you’re choosing a server model, e.g. DL380
  •Supermicro: be very careful when picking your VAR
•Features to watch for:
  •Socket count (NUMA)
  •IPMI
  •onboard SAS or SATA port speed/count
  •PCIe speed & layout
  •RAM capacity
Storage Adapters
•Serial Attached SCSI
  •Bit Error Rate: 1 in 10^16 bits or 1 bit in 1,250 TB
  •Supports SATA drives over STP
  •Near-line SAS drives are SATA chassis with SAS boards
  •Always use SAS if you need an expander
  •Check out enclosure services in Linux
•Serial ATA
  •Bit Error Rate: 1 in 10^15 bits or 1 bit in 125 TB
  •Avoid expanders
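The bit error rates above can be converted into data volumes with a little arithmetic. The sketch below does the conversion in both decimal terabytes (TB) and binary tebibytes (TiB); the function names are hypothetical and the rates are the ones quoted on the slide.

```python
TB = 10 ** 12   # decimal terabyte, in bytes
TIB = 2 ** 40   # binary tebibyte, in bytes


def bytes_per_bit_error(ber_exponent: int) -> float:
    """Average bytes transferred per bit error, for a rate of
    1 error in 10**ber_exponent bits."""
    return 10 ** ber_exponent / 8


def expected_bit_errors(bytes_transferred: int, ber_exponent: int) -> float:
    """Expected number of bit errors after moving bytes_transferred bytes."""
    return bytes_transferred * 8 / 10 ** ber_exponent


if __name__ == "__main__":
    for name, exponent in (("SAS", 16), ("SATA", 15)):
        volume = bytes_per_bit_error(exponent)
        print(f"{name}: one bit error per {volume / TB:,.0f} TB "
              f"({volume / TIB:,.1f} TiB) on average")
```

On a cluster that moves hundreds of terabytes, the tenfold difference between the two rates decides whether a bus-level bit error is an expected event or a rare one.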
Storage Adapters
•JBOD
  •cheap
  •OS manages drives
  •drivers usually shipped with OS
  •CPU overhead is negligible
•HW RAID is sometimes faster, usually comes with cache
  •writethrough vs. writeback
  •writeback + BBU provides interesting performance options
  •driver + utilities management
Parity RAID
RAID
•JBOD
  •mount every drive with individual filesystems
  •cheap
•RAID0
  •single drive failure means node rebuild
  •cheap
•RAID10
  •fast, protects against single disk failure
  •expensive
RAID
•RAID 5 / 6 (and beyond)
  •parity data protection
  •performance heavily dependent on implementation
  •cheapest option for drive failure protection
•RAID 50 / 60
  •stripe across multiple RAID 5 / 6 volumes
  •mostly useful with large number of drives
  •can provide decent performance esp. on HW RAID
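The cost side of the RAID levels above comes down to how much raw capacity is given up for protection. A minimal sketch, assuming equal-sized drives and the standard definitions of each level; the function name, the `groups` parameter, and the example sizes are hypothetical:

```python
# Minimum drives for the non-nested levels (standard definitions).
MIN_DRIVES = {"JBOD": 1, "RAID0": 2, "RAID10": 4, "RAID5": 3, "RAID6": 4}
# Parity drives per RAID 5 / RAID 6 group.
PARITY = {"RAID5": 1, "RAID6": 2, "RAID50": 1, "RAID60": 2}


def raid_summary(level: str, drives: int, drive_tb: float, groups: int = 2):
    """Return (usable_tb, drive_failures_always_survived).

    JBOD and RAID0 both return 0 failures survived, but the blast radius
    differs: JBOD loses one filesystem, RAID0 loses the whole volume.
    """
    if level in ("RAID50", "RAID60"):
        base = "RAID5" if level == "RAID50" else "RAID6"
        if drives % groups or drives // groups < MIN_DRIVES[base]:
            raise ValueError(f"{level} needs {groups} equal groups of "
                             f">= {MIN_DRIVES[base]} drives")
        # Each group gives up its own parity drives; the worst case is
        # all failures landing in the same group.
        return (drives - groups * PARITY[level]) * drive_tb, PARITY[level]
    if level not in MIN_DRIVES:
        raise ValueError(f"unknown level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"{level} needs at least {MIN_DRIVES[level]} drives")
    if level in ("JBOD", "RAID0"):
        return drives * drive_tb, 0
    if level == "RAID10":
        if drives % 2:
            raise ValueError("RAID10 needs an even number of drives")
        return drives // 2 * drive_tb, 1  # more only if failures hit different mirrors
    return (drives - PARITY[level]) * drive_tb, PARITY[level]


if __name__ == "__main__":
    for level in ("JBOD", "RAID0", "RAID10", "RAID5", "RAID6", "RAID50", "RAID60"):
        usable, survives = raid_summary(level, drives=12, drive_tb=4.0)
        print(f"{level:>6}: {usable:5.1f} TB usable, survives {survives} failure(s)")
```

The sketch covers capacity and fault tolerance only; it says nothing about performance, which depends heavily on the implementation.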
Hard Drives
•SATA HDD
  •there’s only one head carriage
  •seeks kill
  •decent performance on sequential IO
  •bit errors
  •cheap!
Hard Drives
•SAS HDD
  •there’s only one head carriage
  •seeks kill
  •bit errors
  •expensive!
  •faster RPMs may help a little with seek latency
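The part of access latency that spindle speed directly controls is rotational delay: on average the platter must turn half a revolution before the target sector reaches the head. A small sketch of that arithmetic follows; the RPM values are common drive classes chosen for illustration, and head movement time is separate and not modeled here.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency in milliseconds: the time for half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution


if __name__ == "__main__":
    for rpm in (5400, 7200, 10_000, 15_000):
        print(f"{rpm:>6} RPM: {avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

Even the fastest spindle still costs milliseconds per random access, which is why higher RPMs only help a little.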
Hard Drives
•SATA SSD
  •very low latency seeks
  •slightly lower sequential IO throughput
  •more expensive than SATA HDD
  •vendors might not want to sell them to you!
  •sometimes called “value series” or similar
  •Cassandra runs fine on consumer-grade SSDs
  •make sure your SATA/SAS bus and HBA are up to the task
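One way to check whether an HBA is up to the task is to compare the combined throughput of the attached SSDs with the bandwidth of the PCIe slot the HBA sits in. The sketch below uses approximate per-lane PCIe figures (about 500 MB/s for PCIe 2.0 and 985 MB/s for PCIe 3.0, per direction); the per-SSD throughput of 500 MB/s and the function name are assumptions for illustration, not measurements.

```python
# Approximate usable bandwidth per PCIe lane, per direction, in MB/s.
PCIE_MB_PER_LANE = {2: 500, 3: 985}


def hba_headroom_mb_s(ssd_count: int, ssd_mb_s: int, pcie_gen: int, lanes: int) -> int:
    """Slot bandwidth minus aggregate SSD throughput, in MB/s.

    A negative result means the slot, not the drives, is the bottleneck
    when all drives stream at once.
    """
    demand = ssd_count * ssd_mb_s
    supply = PCIE_MB_PER_LANE[pcie_gen] * lanes
    return supply - demand


if __name__ == "__main__":
    # Hypothetical node: 8 SATA SSDs, each assumed to sustain 500 MB/s.
    for gen, lanes in ((2, 4), (2, 8), (3, 8)):
        headroom = hba_headroom_mb_s(8, 500, gen, lanes)
        verdict = "ok" if headroom >= 0 else "bottleneck"
        print(f"PCIe {gen}.0 x{lanes}: headroom {headroom:+} MB/s ({verdict})")
```

The same comparison applies one level down: a SAS expander or a shared SATA port multiplier caps the drives behind it in the same way.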
Hard Drives
•Enterprise SSD
  •quite expensive
  •vendor supported
  •more reliable
  •often faster as well
Hard Drives
•PCIe SSD
  •e.g. FusionIO, ioSwitch
  •highest performance potential
  •not as expensive as you think
  •lots of new products entering the market
  •generally not hot-swappable
Networking
•you don’t need 10gig
•but it’s awesome
•Broadcom cards are common and commonly buggy
•Intel cards are expensive but a good bet
•Consider lesser-known add-in cards, e.g. Myricom
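A rough way to see what 10-gigabit buys is to estimate how long it takes to move a node's worth of data, for example when rebuilding a node. The sketch below assumes the link runs at full line rate with no protocol overhead and that nothing else is the bottleneck, so real transfers will be slower; the function name and data sizes are hypothetical.

```python
def transfer_seconds(data_tb: float, link_gbps: float) -> float:
    """Seconds to move data_tb decimal terabytes over a link_gbps link
    at full line rate (an optimistic lower bound)."""
    bits = data_tb * 8 * 10 ** 12
    return bits / (link_gbps * 10 ** 9)


if __name__ == "__main__":
    for data_tb in (1, 4):
        for link_gbps in (1, 10):
            hours = transfer_seconds(data_tb, link_gbps) / 3600
            print(f"{data_tb} TB over {link_gbps:>2} Gb/s: {hours:.1f} h")
```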
To the Cloud!
•Amazon, Google, etc. all use similar gear under the VM
•same constraints apply, but you only get a fraction of the box
•pass-through PCIe devices for the best performance
•Avoid EBS in EC2; go with ephemeral disks
•GCE PDs may need additional read/write threads
@AlTobey
Q & A
Everybody is hiring, including DataStax!
Open Source Mechanic, DataStax

Manulife - Insurer Innovation Award 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 

Cassandra Day SV 2014: Designing Commodity Storage in Apache Cassandra

  • 1. Designing Commodity Storage. @AlTobey, Open Source Mechanic @ DataStax. ©2014 DataStax Confidential. Do not distribute without consent.
  • 2. What is commodity storage? •software-defined storage •e.g. Cassandra, S3, GCE Persistent Disks •Intel/AMD x86_64 architecture Open Standards: •PCI-Express •Near-line SAS, Enterprise SATA, SATA SSD •1G/10G Ethernet
  • 3. Definitely NOT this. Designed to solve different problems from a different era.
  • 4. Not this either. Apart from SSDs, most “desktop” gear should be avoided in production deployments.
  • 7. Rack & Stack •Blades & 1U for high CPU with low storage density •2U for plenty of CPU & storage & air flow •3U-4U for high-latency / high-density storage •“racks” don’t have to be literal •blade chassis •separate network/power is key
  • 9. Choosing Server Components •CPU •Memory •Motherboards •Host Bus Adapters •Hard Drives •Network Interface Cards
  • 10. CPU Pricing (bar chart; axis: Dollars, 0 to 2200) •Models: E5-2620, E5-2630, E5-2650, E5-2670, E5-2687W, E5-2690 •Spec labels: 6 cores 2.6GHz 80W; 6 cores 2.1GHz 80W; 8 cores 2.6GHz 95W; 10 cores 2.5GHz 115W (3.3GHz turbo); 8 cores 3.4GHz 150W; 8 cores 2.9GHz 135W (3.8GHz turbo) •L3 cache labels: 15MB, 15MB, 20MB, 20MB, 25MB, 25MB
  • 13. Memory •always get ECC! •~5 single-bit errors in 8 GB RAM per hour (top-end error rate) •unexplainable crashes •data corruption •8GB DIMMs are still the sweet spot •Registered Memory: match to your CPU/motherboard •Pretty much all server memory is ECC and Registered •Speed: match to fastest rating of CPU/motherboard
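To get a feel for what the top-end error rate on the Memory slide means for a whole node, here is a minimal sketch. It assumes the rate scales linearly with installed RAM and with time, which is a simplification; the 64 GB node size is a hypothetical example, not a figure from the talk.

```python
# Top-end rate quoted on the slide: ~5 single-bit errors per 8 GB per hour.
ERRORS_PER_HOUR_PER_8GB = 5.0


def expected_single_bit_errors(ram_gb: float, hours: float,
                               rate: float = ERRORS_PER_HOUR_PER_8GB) -> float:
    """Expected single-bit errors, assuming linear scaling (an assumption)."""
    return rate * (ram_gb / 8.0) * hours


if __name__ == "__main__":
    # Hypothetical node: 64 GB of RAM, observed for one day.
    per_day = expected_single_bit_errors(ram_gb=64, hours=24)
    print(f"64 GB node, 24 h: ~{per_day:.0f} single-bit errors at the top-end rate")
```

With ECC, single-bit errors like these are corrected in hardware; without it they surface as the crashes and corruption listed on the slide.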
  • 14. Motherboards •Largely out of your control •With Dell / HP / etc., you’re choosing a server model, e.g. DL380 •Supermicro: be very careful when picking your VAR •Features to watch for: •Socket count (NUMA) •IPMI •onboard SAS or SATA port speed/count •PCIe speed & layout •RAM capacity
  • 15. Storage Adapters •Serial Attached SCSI •Bit Error Rate: 1 in 10^16 bits, or 1 bit in 1,250 TB •Supports SATA drives over STP •Near-line SAS drives are SATA chassis with SAS boards •Always use SAS if you need an expander •Check out enclosure services in Linux •Serial ATA •Bit Error Rate: 1 in 10^15 bits, or 1 bit in 125 TB •Avoid expanders
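The bit-error-rate figures on the Storage Adapters slide convert to a data volume by dividing by 8 bits per byte. The sketch below does that arithmetic and prints the result in both decimal terabytes (TB) and binary tebibytes (TiB). It assumes the rate is read as "one bit error per N bits transferred"; the function names are illustrative.

```python
def bytes_per_bit_error(bits_per_error: float) -> float:
    """Bytes transferred, on average, per single bit error."""
    return bits_per_error / 8


def to_tb(n_bytes: float) -> float:
    """Decimal terabytes (10^12 bytes)."""
    return n_bytes / 10**12


def to_tib(n_bytes: float) -> float:
    """Binary tebibytes (2^40 bytes)."""
    return n_bytes / 2**40


if __name__ == "__main__":
    for name, bits in (("SAS", 10**16), ("SATA", 10**15)):
        volume = bytes_per_bit_error(bits)
        print(f"{name}: 1 bit error per {to_tb(volume):,.0f} TB "
              f"({to_tib(volume):,.1f} TiB)")
```

The order-of-magnitude gap is the point: at the same throughput, a SATA link is expected to see a bit error ten times as often as a SAS link.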
  • 16. Storage Adapters •JBOD •cheap •OS manages drives •drivers usually shipped with OS •CPU overhead is negligible •HW RAID is sometimes faster, usually comes with cache •write-through vs. write-back •write-back + BBU provides interesting performance options •driver + utilities management
  • 21. RAID •JBOD •mount every drive with individual filesystems •cheap •RAID0 •single drive failure means node rebuild •cheap •RAID10 •fast, protects against single disk failure •expensive
  • 22. RAID •RAID 5 / 6 (and beyond) •parity data protection •performance heavily dependent on implementation •cheapest option for drive failure protection •RAID 50 / 60 •stripe across multiple RAID[56] volumes •mostly useful with large number of drives •can provide decent performance esp. on HW RAID
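The cost trade-offs on the two RAID slides come down to how many drives hold data versus redundancy. The sketch below computes usable capacity and the number of drive failures each layout is guaranteed to survive, using the standard textbook formulas for each level. The `raid_layout` helper, the `RaidLayout` type, and the 8 x 4 TB example are illustrative assumptions, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidLayout:
    usable_tb: float
    guaranteed_failures: int  # drive failures survived in the worst case


def raid_layout(level: str, drives: int, drive_tb: float,
                groups: int = 1) -> RaidLayout:
    """Usable capacity and worst-case failure tolerance for one array.

    `groups` is the number of RAID 5/6 sub-arrays striped together and is
    only used for RAID50 / RAID60.
    """
    if level in ("JBOD", "RAID0"):
        # No redundancy: every drive holds data.
        data, survive = drives, 0
    elif level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID10 needs an even number of drives, >= 4")
        # Mirrored pairs: half the drives hold copies.
        data, survive = drives // 2, 1
    elif level in ("RAID5", "RAID6", "RAID50", "RAID60"):
        parity = 1 if level in ("RAID5", "RAID50") else 2
        if level in ("RAID5", "RAID6"):
            groups = 1
        elif groups < 2:
            raise ValueError(f"{level} needs at least 2 groups")
        if drives % groups or drives // groups < parity + 2:
            raise ValueError("too few drives per parity group")
        # Each group gives up `parity` drives' worth of space.
        data, survive = drives - parity * groups, parity
    else:
        raise ValueError(f"unknown level: {level}")
    return RaidLayout(usable_tb=data * drive_tb, guaranteed_failures=survive)


if __name__ == "__main__":
    # Hypothetical chassis: 8 drives of 4 TB each.
    for level, groups in (("JBOD", 1), ("RAID0", 1), ("RAID10", 1),
                          ("RAID5", 1), ("RAID6", 1), ("RAID50", 2)):
        layout = raid_layout(level, drives=8, drive_tb=4.0, groups=groups)
        print(f"{level:7s} usable={layout.usable_tb:5.1f} TB "
              f"survives>={layout.guaranteed_failures} failure(s)")
```

Note that JBOD and RAID0 both report zero here, but they fail differently: with JBOD a dead drive loses only that drive's filesystem, while with RAID0 it takes the whole volume and forces the node rebuild mentioned on the slide.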
  • 24. Hard Drives •SATA HDD •there’s only one head carriage •seeks kill performance •decent performance on sequential IO •bit errors •cheap!
  • 26. Hard Drives •SAS HDD •there’s only one head carriage •seeks kill performance •bit errors •expensive! •faster RPMs may help a little with seek latency
  • 28. Hard Drives •SATA SSD •very low latency seeks •slightly lower sequential IO throughput •more expensive than SATA HDD •vendors might not want to sell them to you! •sometimes called “value series” or similar •Cassandra runs fine on consumer-grade SSDs •make sure your SATA/SAS bus and HBA are up to the task
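One way to check whether the SATA/SAS bus and HBA are "up to the task" is a rough bandwidth budget: aggregate SSD throughput versus the HBA's PCIe link. The sketch below uses nominal link rates (SATA 6 Gb/s is about 600 MB/s per port after 8b/10b encoding; a PCIe 2.0 lane is about 500 MB/s and a PCIe 3.0 lane about 985 MB/s per direction). The drive count, per-SSD throughput, and the `hba_headroom_mbps` helper are hypothetical; real HBAs and expanders add their own limits.

```python
SATA3_PORT_MBPS = 600                 # SATA 6 Gb/s, after 8b/10b encoding
PCIE_LANE_MBPS = {2: 500, 3: 985}     # approx. usable MB/s per lane, per direction


def hba_headroom_mbps(ssd_count: int, ssd_mbps: int,
                      pcie_gen: int, pcie_lanes: int) -> int:
    """PCIe bandwidth left over (negative = oversubscribed), in MB/s."""
    # A drive cannot push more than its SATA port allows.
    per_drive = min(ssd_mbps, SATA3_PORT_MBPS)
    demand = ssd_count * per_drive
    supply = PCIE_LANE_MBPS[pcie_gen] * pcie_lanes
    return supply - demand


if __name__ == "__main__":
    # Hypothetical: 8 SSDs at 500 MB/s sequential each.
    for gen, lanes in ((2, 4), (2, 8), (3, 8)):
        headroom = hba_headroom_mbps(8, 500, gen, lanes)
        print(f"PCIe {gen}.0 x{lanes}: headroom {headroom:+d} MB/s")
```

A negative result means the drives can collectively outrun the adapter's slot, so adding SSDs behind that HBA will not add throughput.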
  • 29. Hard Drives •Enterprise SSD •quite expensive •vendor supported •more reliable •often faster as well
  • 31. Hard Drives •PCIe SSD •e.g. FusionIO, ioSwitch •highest performance potential •not as expensive as you think •lots of new products entering the market •generally not hot-swappable
  • 33. Networking •you don’t need 10gig •but it’s awesome •Broadcom cards are common and commonly buggy •Intel cards are expensive but a good bet •Consider lesser-known add-in cards, e.g. Myricom
  • 34. To the Cloud! •Amazon, Google, etc. all use similar gear under the VM •same constraints apply, but you only get a fraction of the box •pass-through PCIe devices for the best performance •Avoid EBS in EC2, go with ephemeral disks •GCE PDs may need additional read/write threads
  • 35. Q & A. Everybody is hiring, including DataStax! @AlTobey, Open Source Mechanic, DataStax