The design of Apache Cassandra allows applications to provide constant uptime. Peer-to-Peer technology ensures there are no single points of failure, and the Consistency guarantees allow applications to function correctly while some nodes are down. There is also a wealth of information provided by the JMX API and the system log. All of this means that when things go wrong you have the time, information and platform to resolve them without downtime. This presentation will cover some of the common, and not so common, performance issues, failures and management tasks observed in running clusters. Aaron will discuss how to gather information and how to act on it. Operators, Developers and Managers will all benefit from this exposition of Cassandra in the wild.
Cassandra Community Webinar | In Case of Emergency Break Glass
1. CASSANDRA COMMUNITY WEBINARS AUGUST 2013
IN CASE OF EMERGENCY,
BREAK GLASS
Aaron Morton
@aaronmorton
Co-Founder & Principal Consultant
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
2. About The Last Pickle
Work with clients to deliver and improve
Apache Cassandra based solutions.
Apache Cassandra Committer, DataStax MVP,
Hector Maintainer, 6+ years combined
Cassandra experience.
Based in New Zealand & Austin, TX.
39. ParNew GC Starting
{Heap before GC invocations=224115 (full 111):
par new generation total 873856K, used 717289K ...)
eden space 699136K, 100% used ...)
from space 174720K, 10% used ...)
to space 174720K, 0% used ...)
40. Tenuring Distribution
240217.053: [ParNew
Desired survivor size 89456640 bytes, new threshold 4 (max 4)
- age 1: 22575936 bytes, 22575936 total
- age 2: 350616 bytes, 22926552 total
- age 3: 4380888 bytes, 27307440 total
- age 4: 1155104 bytes, 28462544 total
41. ParNew GC Finishing
Heap after GC invocations=224116 (full 111):
par new generation total 873856K, used 31291K ...)
eden space 699136K, 0% used ...)
from space 174720K, 17% used ...)
to space 174720K, 0% used ...)
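The before and after snapshots on slides 39 and 41 can be compared mechanically. The following is a minimal sketch, assuming the "par new generation" lines come from HotSpot's -XX:+PrintHeapAtGC output; the regular expression and the function name par_new_used_kb are illustrative, not part of any Cassandra tooling.

```python
import re

# Matches the "par new generation" summary line printed before/after a GC.
PAR_NEW = re.compile(r"par new generation\s+total (\d+)K, used (\d+)K")

def par_new_used_kb(line):
    """Return (total_kb, used_kb) from a 'par new generation' line, or None."""
    match = PAR_NEW.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Lines copied from slides 39 and 41.
before = "par new generation total 873856K, used 717289K ...)"
after = "par new generation total 873856K, used 31291K ...)"

total_kb, used_before = par_new_used_kb(before)
_, used_after = par_new_used_kb(after)
freed_kb = used_before - used_after

print(f"ParNew freed {freed_kb}K of {used_before}K in use "
      f"({freed_kb / used_before:.1%})")
```

A collection that frees most of the young generation, as here, suggests that most objects die young and few are being promoted.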
42. nodetool info
Token : 0
Gossip active : true
Load : 130.64 GB
Generation No : 1369334297
Uptime (seconds) : 29438
Heap Memory (MB) : 3744.27 / 8025.38
Data Center : east
Rack : rack1
Exceptions : 0
Key Cache : size 104857584 (bytes), capacity 104857584
(bytes), 25364985 hits, 34874180 requests, 0.734 recent hit
rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 0...
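Two quick ratios fall out of the nodetool info output above: the lifetime key cache hit rate and the fraction of heap in use. This is a minimal sketch with the numbers copied from the slide; the variable names are illustrative.

```python
# Values copied from the nodetool info output on slide 42.
key_cache_hits = 25364985
key_cache_requests = 34874180
heap_used_mb = 3744.27
heap_max_mb = 8025.38

# Lifetime hit rate since the node started; the slide's "recent hit rate"
# covers a shorter window and can differ.
lifetime_hit_rate = key_cache_hits / key_cache_requests
heap_fraction = heap_used_mb / heap_max_mb

print(f"lifetime key cache hit rate: {lifetime_hit_rate:.3f}")
print(f"heap in use: {heap_fraction:.1%}")
```

Comparing the lifetime rate with the reported recent rate (0.734) gives a rough idea of whether the cache is getting more or less effective over time.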
43. nodetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Status State Load Owns Token
10.1.64.11 east rack1 Up Normal 130.64 GB 12.50% 0
10.1.65.8 west rack1 Up Normal 88.79 GB 0.00% 1
10.1.64.78 east rack1 Up Normal 52.66 GB 12.50% 212...216
10.1.65.181 west rack1 Up Normal 65.99 GB 0.00% 212...217
10.1.66.8 east rack1 Up Normal 64.38 GB 12.50% 425...432
10.1.65.178 west rack1 Up Normal 77.94 GB 0.00% 425...433
10.1.64.201 east rack1 Up Normal 56.42 GB 12.50% 638...648
10.1.65.59 west rack1 Up Normal 74.5 GB 0.00% 638...649
10.1.64.235 east rack1 Up Normal 79.68 GB 12.50% 850...864
10.1.65.16 west rack1 Up Normal 62.05 GB 0.00% 850...865
10.1.66.227 east rack1 Up Normal 106.73 GB 12.50% 106...080
10.1.65.226 west rack1 Up Normal 79.26 GB 0.00% 106...081
10.1.66.247 east rack1 Up Normal 66.68 GB 12.50% 127...295
10.1.65.19 west rack1 Up Normal 102.45 GB 0.00% 127...297
10.1.66.141 east rack1 Up Normal 53.72 GB 12.50% 148...512
10.1.65.253 west rack1 Up Normal 54.25 GB 0.00% 148...513
44. nodetool ring KS1
Address DC Rack Status State Load Effective-Ownership Token
10.1.64.11 east rack1 Up Normal 130.72 GB 12.50% 0
10.1.65.8 west rack1 Up Normal 88.81 GB 12.50% 1
10.1.64.78 east rack1 Up Normal 52.68 GB 12.50% 212...216
10.1.65.181 west rack1 Up Normal 66.01 GB 12.50% 212...217
10.1.66.8 east rack1 Up Normal 64.4 GB 12.50% 425...432
10.1.65.178 west rack1 Up Normal 77.96 GB 12.50% 425...433
10.1.64.201 east rack1 Up Normal 56.44 GB 12.50% 638...648
10.1.65.59 west rack1 Up Normal 74.57 GB 12.50% 638...649
10.1.64.235 east rack1 Up Normal 79.72 GB 12.50% 850...864
10.1.65.16 west rack1 Up Normal 62.12 GB 12.50% 850...865
10.1.66.227 east rack1 Up Normal 106.72 GB 12.50% 106...080
10.1.65.226 west rack1 Up Normal 79.28 GB 12.50% 106...081
10.1.66.247 east rack1 Up Normal 66.73 GB 12.50% 127...295
10.1.65.19 west rack1 Up Normal 102.47 GB 12.50% 127...297
10.1.66.141 east rack1 Up Normal 53.75 GB 12.50% 148...512
10.1.65.253 west rack1 Up Normal 54.24 GB 12.50% 148...513
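Every node above reports the same effective ownership, yet the load ranges from about 53 GB to about 131 GB. A minimal sketch for spotting nodes well above their data center's mean load follows; the rows are copied from slide 44, while the 1.5x threshold and the function name overloaded_nodes are assumptions chosen for illustration.

```python
from collections import defaultdict

# Rows copied from the nodetool ring KS1 output (tokens are elided on the slide).
RING_ROWS = """\
10.1.64.11 east rack1 Up Normal 130.72 GB 12.50% 0
10.1.65.8 west rack1 Up Normal 88.81 GB 12.50% 1
10.1.64.78 east rack1 Up Normal 52.68 GB 12.50% 212...216
10.1.65.181 west rack1 Up Normal 66.01 GB 12.50% 212...217
10.1.66.8 east rack1 Up Normal 64.4 GB 12.50% 425...432
10.1.65.178 west rack1 Up Normal 77.96 GB 12.50% 425...433
10.1.64.201 east rack1 Up Normal 56.44 GB 12.50% 638...648
10.1.65.59 west rack1 Up Normal 74.57 GB 12.50% 638...649
10.1.64.235 east rack1 Up Normal 79.72 GB 12.50% 850...864
10.1.65.16 west rack1 Up Normal 62.12 GB 12.50% 850...865
10.1.66.227 east rack1 Up Normal 106.72 GB 12.50% 106...080
10.1.65.226 west rack1 Up Normal 79.28 GB 12.50% 106...081
10.1.66.247 east rack1 Up Normal 66.73 GB 12.50% 127...295
10.1.65.19 west rack1 Up Normal 102.47 GB 12.50% 127...297
10.1.66.141 east rack1 Up Normal 53.75 GB 12.50% 148...512
10.1.65.253 west rack1 Up Normal 54.24 GB 12.50% 148...513
"""

def overloaded_nodes(rows, factor=1.5):
    """Return addresses whose load exceeds factor x the mean load of their DC."""
    loads = defaultdict(dict)
    for row in rows.strip().splitlines():
        # Columns: Address DC Rack Status State Load Unit ...
        address, dc, _rack, _status, _state, load, _unit = row.split()[:7]
        loads[dc][address] = float(load)  # assumes every load is in GB
    flagged = []
    for nodes in loads.values():
        mean = sum(nodes.values()) / len(nodes)
        flagged += [addr for addr, load in nodes.items() if load > factor * mean]
    return flagged

print(overloaded_nodes(RING_ROWS))
```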
45. nodetool status
$ nodetool status
Datacenter: ams01 (Replication Factor 3)
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.70.48.23 38.38 GB 256 19.0% 7c5fdfad-63c6-4f37-bb9f-a66271aa3423 RAC1
UN 10.70.6.78 58.13 GB 256 18.3% 94e7f48f-d902-4d4a-9b87-81ccd6aa9e65 RAC1
UN 10.70.47.126 53.89 GB 256 19.4% f36f1f8c-1956-4850-8040-b58273277d83 RAC1
Datacenter: wdc01 (Replication Factor 3)
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.24.116.66 65.81 GB 256 22.1% f9dba004-8c3d-4670-94a0-d301a9b775a8 RAC1
UN 10.55.104.90 63.31 GB 256 21.2% 4746f1bd-85e1-4071-ae5e-9c5baac79469 RAC1
UN 10.55.104.27 62.71 GB 256 21.2% 1a55cfd4-bb30-4250-b868-a9ae13d81ae1 RAC1
46. nodetool cfstats
Keyspace: KS1
Column Family: CF1
SSTable count: 11
Space used (live): 32769179336
Space used (total): 32769179336
Number of Keys (estimate): 73728
Memtable Columns Count: 1069137
Memtable Data Size: 216442624
Memtable Switch Count: 3
Read Count: 95
Read Latency: NaN ms.
Write Count: 1039417
Write Latency: 0.068 ms.
Bloom Filter False Postives: 345
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 230096
Compacted row minimum size: 150
Compacted row maximum size: 322381140
Compacted row mean size: 2072156
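The compacted row sizes are reported in bytes, which hides how wide these rows are. A minimal sketch that converts them and compares the maximum with the in_memory_compaction_limit_in_mb value of 32 used on slide 78; using that value as the comparison point is an assumption for illustration, not a statement about this cluster's configuration at the time.

```python
MIB = 1024 * 1024

# Values copied from the cfstats output on slide 46 (bytes).
row_max_bytes = 322381140
row_mean_bytes = 2072156

# Assumed limit, taken from the setting shown on slide 78.
in_memory_compaction_limit_mb = 32

row_max_mib = row_max_bytes / MIB
row_mean_mib = row_mean_bytes / MIB
exceeds_limit = row_max_mib > in_memory_compaction_limit_mb

print(f"max row: {row_max_mib:.1f} MiB, mean row: {row_mean_mib:.2f} MiB")
print(f"largest row exceeds the in-memory compaction limit: {exceeds_limit}")
```

A maximum row around 150 times the mean is the kind of wide-row signal to correlate with GC pressure.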
56. Compaction Error
ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java (line 132) Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.RuntimeException: Last written key DecoratedKey(138024912283272996716128964353306009224, 61386330356130622d616666362d376330612d666531662d373738616630636265396535) >= current key DecoratedKey(127065377405949402743383718901402082101, 64323962636163652d646561372d333039322d386166322d663064346132363963386131) writing into *-tmp-hf-7372-Data.db
    at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
    at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
    at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:160)
    at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
    at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
68. jmx-term
$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies
#bean is set to
org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies
$>get BloomFilterFalseRatio
#mbean =
org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies:
BloomFilterFalseRatio = 0.5693801541828607;
69. Back to cfstats
Column Family: page_views
Read Count: 270075
Bloom Filter False Positives: 131294
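As a rough cross-check of the JMX figure, the two cfstats counters above can be divided directly. This is only a sketch: false positives per read is not the same metric as BloomFilterFalseRatio, so the two numbers are not expected to match exactly.

```python
# Values copied from the cfstats output for page_views on slide 69.
read_count = 270075
bloom_false_positives = 131294

# Roughly one wasted SSTable lookup for every two reads.
false_positives_per_read = bloom_false_positives / read_count

print(f"false positives per read: {false_positives_per_read:.3f}")
```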
71. Fix
Changed read queries to select by column
name to limit SSTables per query.
Long term, migrate to Cassandra v1.2 for
off-heap Bloom Filters.
73. WARN
WARN [ScheduledTasks:1] 2013-03-29 18:40:48,158
GCInspector.java (line 145) Heap is 0.9355130159566108 full.
You may need to reduce memtable and/or cache sizes.
INFO [ScheduledTasks:1] 2013-03-26 16:36:06,383
GCInspector.java (line 122) GC for ConcurrentMarkSweep: 207 ms
for 1 collections, 10105891032 used; max is 13591642112
INFO [ScheduledTasks:1] 2013-03-28 22:18:17,113
GCInspector.java (line 122) GC for ParNew: 256 ms for 1
collections, 6504905688 used; max is 13591642112
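GCInspector lines like the ones above are easy to pull out of system.log and summarise. The following is a minimal sketch, assuming each entry sits on a single line in the real log (the slide wraps them); the GcEvent type and parse_gc_line function are illustrative names.

```python
import re
from typing import NamedTuple

class GcEvent(NamedTuple):
    collector: str
    pause_ms: int
    used_bytes: int
    max_bytes: int

# Matches the message part of a GCInspector INFO line.
GC_LINE = re.compile(
    r"GC for (\w+): (\d+) ms for \d+ collections, (\d+) used; max is (\d+)"
)

def parse_gc_line(line):
    """Return a GcEvent for a GCInspector line, or None if it does not match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    return GcEvent(m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4)))

# The two INFO entries from slide 73, joined onto single lines.
lines = [
    "INFO [ScheduledTasks:1] 2013-03-26 16:36:06,383 GCInspector.java (line 122) "
    "GC for ConcurrentMarkSweep: 207 ms for 1 collections, 10105891032 used; max is 13591642112",
    "INFO [ScheduledTasks:1] 2013-03-28 22:18:17,113 GCInspector.java (line 122) "
    "GC for ParNew: 256 ms for 1 collections, 6504905688 used; max is 13591642112",
]

events = [parse_gc_line(line) for line in lines]
for event in events:
    print(f"{event.collector}: {event.pause_ms} ms pause, "
          f"heap {event.used_bytes / event.max_bytes:.1%} full afterwards")
```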
74. Serious GC Problems
INFO [ScheduledTasks:1] 2013-04-30 23:21:11,959
GCInspector.java (line 122) GC for ParNew: 1115 ms for 1
collections, 9355247296 used; max is 12801015808
75. Flapping Node
INFO [GossipTasks:1] 2013-03-28 17:42:07,944 Gossiper.java
(line 830) InetAddress /10.1.20.144 is now dead.
INFO [GossipStage:1] 2013-03-28 17:42:54,740 Gossiper.java
(line 816) InetAddress /10.1.20.144 is now UP
INFO [GossipTasks:1] 2013-03-28 17:46:00,585 Gossiper.java
(line 830) InetAddress /10.1.20.144 is now dead.
INFO [GossipStage:1] 2013-03-28 17:46:13,855 Gossiper.java
(line 816) InetAddress /10.1.20.144 is now UP
INFO [GossipStage:1] 2013-03-28 17:48:48,966 Gossiper.java
(line 830) InetAddress /10.1.20.144 is now dead.
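Flapping shows up as repeated dead/UP transitions for the same address. A minimal sketch that counts those transitions per node, assuming one log entry per line (the slide wraps them); the regular expression and variable names are illustrative.

```python
import re
from collections import Counter

# Matches the Gossiper state-change message for a peer.
STATE_CHANGE = re.compile(r"InetAddress (/\S+) is now (dead|UP)")

# The five entries from slide 75, joined onto single lines.
log_lines = [
    "INFO [GossipTasks:1] 2013-03-28 17:42:07,944 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead.",
    "INFO [GossipStage:1] 2013-03-28 17:42:54,740 Gossiper.java (line 816) InetAddress /10.1.20.144 is now UP",
    "INFO [GossipTasks:1] 2013-03-28 17:46:00,585 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead.",
    "INFO [GossipStage:1] 2013-03-28 17:46:13,855 Gossiper.java (line 816) InetAddress /10.1.20.144 is now UP",
    "INFO [GossipStage:1] 2013-03-28 17:48:48,966 Gossiper.java (line 830) InetAddress /10.1.20.144 is now dead.",
]

transitions = Counter()
for line in log_lines:
    match = STATE_CHANGE.search(line)
    if match:
        transitions[match.group(1)] += 1

for address, count in transitions.most_common():
    print(f"{address}: {count} state changes")
```

Several transitions for one address within a few minutes, as here, point at a node that is pausing (often for GC) rather than one that is actually down.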
76. “GC Problems are the result
of workload and
configuration.”
Aaron Morton, Just Now.
77. Workload Correlation?
Look for wide rows, large
writes, wide reads, and
unbounded multi-row reads
or writes.
78. Compaction Correlation?
Slow down Compaction to improve stability.
concurrent_compactors: 2
compaction_throughput_mb_per_sec: 8
in_memory_compaction_limit_in_mb: 32
(Monitor and reverse when resolved.)
79. GC Logging Insights
Slow down rate of tenuring and enable full
GC logging.
HEAP_NEWSIZE="1200M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"
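The settings above determine how the young generation is divided. A minimal sketch of the arithmetic, assuming HotSpot's usual layout (one eden plus two survivor spaces, with SurvivorRatio being eden divided by one survivor space) and the default TargetSurvivorRatio of 50; the function name young_gen_layout is illustrative, and real JVMs may round sizes for alignment.

```python
def young_gen_layout(new_size_mb, survivor_ratio, target_survivor_ratio=50):
    """Return (eden_mb, survivor_mb, desired_survivor_bytes).

    Young generation = eden + 2 survivor spaces, eden = survivor_ratio * survivor.
    """
    survivor_mb = new_size_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    desired_survivor_bytes = int(
        survivor_mb * 1024 * 1024 * target_survivor_ratio / 100
    )
    return eden_mb, survivor_mb, desired_survivor_bytes

# HEAP_NEWSIZE="1200M" with -XX:SurvivorRatio=4
eden_mb, survivor_mb, desired = young_gen_layout(1200, 4)
print(f"eden {eden_mb:.0f}M, each survivor space {survivor_mb:.0f}M, "
      f"desired survivor size {desired} bytes")
```

With these inputs the sketch gives an 800M eden and 200M survivor spaces, which lines up with the "eden space 819200K", "from space 204800K" and "Desired survivor size 104857600 bytes" values in the ParNew output on slide 80.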
80. GC’ing Objects in ParNew
{Heap before GC invocations=7937 (full 205):
par new generation total 1024000K, used 830755K ...)
eden space 819200K, 100% used ...)
from space 204800K, 5% used ...)
to space 204800K, 0% used ...)
Desired survivor size 104857600 bytes, new threshold 4 (max 4)
- age 1: 8090240 bytes, 8090240 total
- age 2: 565016 bytes, 8655256 total
- age 3: 330152 bytes, 8985408 total
- age 4: 657840 bytes, 9643248 total
81. GC’ing Objects in ParNew
{Heap before GC invocations=7938 (full 205):
par new generation total 1024000K, used 835015K ...)
eden space 819200K, 100% used ...)
from space 204800K, 7% used ...)
to space 204800K, 0% used ...)
Desired survivor size 104857600 bytes, new threshold 4 (max 4)
- age 1: 1315072 bytes, 1315072 total
- age 2: 541072 bytes, 1856144 total
- age 3: 499432 bytes, 2355576 total
- age 4: 316808 bytes, 2672384 total
82. Cause
Nodes had wide rows, 1.3+
billion rows and 3+ GB of
Bloom Filters.
(Using older bloom_filter_fp_chance of 0.000744.)
83. Fix
Increased bloom_filter_fp_chance to 0.1 on
one CF and 0.01 on the others.
(One CF reduced from 770MB to 170MB of Bloom Filters.)
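Why raising the false positive chance shrinks the filters can be seen from the textbook Bloom filter sizing formula, bits per key = -ln(p) / (ln 2)^2. This is a minimal sketch under that assumption: Cassandra's own implementation chooses its parameters differently, so the real sizes (such as the 770MB to 170MB change above) will not match this estimate exactly. The function names are illustrative.

```python
import math

def bloom_bits_per_key(fp_chance):
    """Textbook optimum for a Bloom filter: -ln(p) / (ln 2)^2 bits per key."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

def bloom_size_mib(keys, fp_chance):
    """Estimated filter size in MiB for the given number of keys."""
    return keys * bloom_bits_per_key(fp_chance) / 8 / (1024 ** 2)

# 1.3 billion rows, as on slide 82, at the old and new fp chances.
for p in (0.000744, 0.01, 0.1):
    print(f"fp_chance={p}: {bloom_bits_per_key(p):.1f} bits/key, "
          f"{bloom_size_mib(1_300_000_000, p):,.0f} MiB for 1.3 billion keys")
```

The trade-off is more false positives, and so more wasted SSTable reads, in exchange for less heap held by the filters.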
92. Changing the Snitch
Do not change the DC or
Rack of an existing node.
(Cassandra will not be able to find your data.)
93. Moving to the GossipingPropertyFileSnitch
Update cassandra-topology.properties
on existing nodes with the existing DC/Rack
settings for all existing nodes.
Set default to new DC.
94. Moving to the GossipingPropertyFileSnitch
Update cassandra-rackdc.properties
on existing nodes with the existing DC/Rack for
the node.
95. Moving to the GossipingPropertyFileSnitch
Use a rolling restart to switch existing nodes
to GossipingPropertyFileSnitch.
96. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
97. Got NTS?
Must use
NetworkTopologyStrategy
for Multi DC deployments.
100. NetworkTopologyStrategy
Order Token Ranges in the DC.
Start with range that contains the Row Key.
Add first unselected Token Range from each
Rack.
Repeat until RF selected.
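The selection steps above can be sketched for a single data center as follows. This is a simplified illustration of the slide's description, not Cassandra's actual NetworkTopologyStrategy code; the ring layout, node names and the function name nts_replicas are hypothetical.

```python
from bisect import bisect_left

def nts_replicas(ring, key_token, rf):
    """Pick rf replicas for key_token within one DC.

    ring is a list of (token, node, rack) tuples for that DC, sorted by token.
    A node with token T is taken to own the range ending at T.
    """
    tokens = [token for token, _, _ in ring]
    # Start with the range that contains the key, wrapping around the ring.
    start = bisect_left(tokens, key_token) % len(ring)
    ordered = ring[start:] + ring[:start]

    replicas = []
    while len(replicas) < min(rf, len(ordered)):
        racks_this_pass = set()
        # Add the first unselected node from each rack, in ring order.
        for _, node, rack in ordered:
            if node in replicas or rack in racks_this_pass:
                continue
            racks_this_pass.add(rack)
            replicas.append(node)
            if len(replicas) == rf:
                break
    return replicas

# Hypothetical four-node DC with three racks.
ring = [(0, "n1", "r1"), (10, "n2", "r1"), (20, "n3", "r2"), (30, "n4", "r3")]
print(nts_replicas(ring, key_token=5, rf=3))   # starts at n2
print(nts_replicas(ring, key_token=35, rf=3))  # wraps to n1, skips n2 (same rack)
```

Note how the second call skips n2 because its rack is already represented; this rack awareness is why changing rack assignments on a live cluster changes where replicas are expected to be.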
103. Changing the Replication Strategy
Be Careful if existing
configuration has multiple
Racks.
(Cassandra may not be able to find your data.)
104. Changing the Replication Strategy
Update Keyspace configuration to use
NetworkTopologyStrategy with
datacenter1:3 and new_dc:0.
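One way to express this change is a CQL 3 ALTER KEYSPACE statement. The sketch below only builds the statement text; it assumes CQL 3 syntax is available on the cluster and uses KS1 as a stand-in keyspace name. The helper alter_keyspace_cql is hypothetical, and clusters managed through cassandra-cli would use its own update keyspace command instead.

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy.

    dc_replication maps data center name -> replication factor.
    """
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {rf}" for dc, rf in dc_replication.items()]
    return 'ALTER KEYSPACE "%s" WITH replication = {%s};' % (
        keyspace, ", ".join(parts)
    )

# Keep all replicas in the existing DC until the new DC has nodes.
statement = alter_keyspace_cql("KS1", {"datacenter1": 3, "new_dc": 0})
print(statement)
```

The same helper can produce the later step of the procedure by passing a non-zero factor for the new data center.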
105. Preparing The Client
Disable auto node discovery or use DC
aware methods.
Use LOCAL_QUORUM or EACH_QUORUM.
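The difference between the two consistency levels is which data centers must acknowledge a request. A minimal sketch, assuming the usual quorum definition of floor(RF / 2) + 1 per data center; the function names and the replication map are illustrative.

```python
def quorum(replication_factor):
    """Number of replicas that make up a quorum for the given RF."""
    return replication_factor // 2 + 1

def required_acks(level, replication, local_dc):
    """Return {dc: replicas that must respond} for a DC-aware consistency level.

    replication maps data center name -> replication factor.
    """
    if level == "LOCAL_QUORUM":
        # Only the coordinator's own data center is waited on.
        return {local_dc: quorum(replication[local_dc])}
    if level == "EACH_QUORUM":
        # Every data center holding replicas must reach its own quorum.
        return {dc: quorum(rf) for dc, rf in replication.items() if rf > 0}
    raise ValueError(f"unsupported consistency level: {level}")

replication = {"datacenter1": 3, "new_dc": 3}
print(required_acks("LOCAL_QUORUM", replication, local_dc="datacenter1"))
print(required_acks("EACH_QUORUM", replication, local_dc="datacenter1"))
```

With LOCAL_QUORUM, requests from clients in the original data center do not wait on the new one, which matters while the new nodes are still empty.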
106. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
107. Configuring New Nodes
Add auto_bootstrap: false to
cassandra.yaml.
Use GossipingPropertyFileSnitch.
Three Seeds from each DC.
(Use cluster_name as a safety.)
108. Configuring New Nodes
Update cassandra-rackdc.properties
on new nodes with the new DC/Rack for the
node.
(Ignore cassandra-topology.properties)
109. Start The New Nodes
New nodes join the ring in the
new DC without data or
traffic.
110. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
111. Change the Replication Factor
Update Keyspace configuration to use
NetworkTopologyStrategy with
datacenter1:3 and new_dc:3.
112. Change the Replication Factor
New DC nodes will start
receiving writes from old DC
coordinators.
113. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
114. Y U No Bootstrap?
DC 1 DC 2
119. Aaron Morton
@aaronmorton
Co-Founder & Principal Consultant
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License