Cassandra troubleshooting:
out of the shadows

Benjamin Black, b@b3k.us
Introducing: This Guy
The Allegory of the Cave
Most people start
troubleshooting problems
interpreting shadows on the
wall.
Common shadows.
Paths out of the cave.
Combination of
basic system tools
&
nodetool/JMX
I’m using RP.
My ring is very unbalanced.
WTF?
nodetool ring
Address         Status  Load      Range                                      Ring
                                  148873535527910577765226390751398592511
10.248.54.192   Up      5.59 GB   0                                          |<--|
10.248.254.15   Up      10.58 GB  42535295865117307932921825928971026431     |   ^
10.248.135.239  Up      11.01 GB  85070591730234615865843651857942052863     v   |
10.248.223.191  Up      5.42 GB   106338239662793269832304564822427566079    |   ^
10.248.122.240  Up      5.51 GB   127605887595351923798765477786913079295    v   |
10.248.34.80    Up      5.45 GB   148873535527910577765226390751398592511    |-->|
Autobootstrap
+
Automatic token assignment
Automatic token algorithm:

Assign a token that will give me
half the range of
the most loaded node.
32
16 16
8 8 16
8 8 8 8
4 4 8 8 8
4 4 4 4 8 8
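The split sequence above can be reproduced with a small simulation. This is a minimal sketch, assuming each node's load is proportional to the width of its range and that a joining node always halves the first widest range. The method names `bootstrap` and `grow` are made up for illustration and are not Cassandra code.

```ruby
# Simplified model: each array entry is the width of the range owned by one node.

# A joining node takes half of the first widest range.
def bootstrap(ranges)
  widest = ranges.index(ranges.max)
  half = ranges[widest] / 2
  ranges[0...widest] + [half, ranges[widest] - half] + ranges[(widest + 1)..-1]
end

# Start with one node owning `initial` units and let `joins` nodes bootstrap.
def grow(initial, joins)
  history = [[initial]]
  joins.times { history << bootstrap(history.last) }
  history
end

# One line per cluster size, from 1 node up to 6 nodes.
grow(32, 5).each { |ranges| puts ranges.join(' ') }
```

At six nodes, two nodes still own twice the range of the other four, which lines up with the two nodes holding roughly 10 GB in the nodetool ring output.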
nodetool move
+
Manual token assignment
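Token range: 0 to 2**127 - 1

# Print one evenly spaced token per node across the token range.
def tokens(nodes)
  0.upto(nodes - 1) do |n|
    # Integer division keeps each token a whole number.
    puts n * (2**127 - 1) / nodes
  end
end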
=> tokens(6)
0
28356863910078205288614550619314017621
56713727820156410577229101238628035242
85070591730234615865843651857942052863
113427455640312821154458202477256070484
141784319550391026443072753096570088105
YES:

This means you need to change tokens on
most of the nodes in your cluster whenever
you add a node.
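How many tokens change can be checked with a short sketch. It assumes the evenly spaced layout produced by the tokens method above and counts existing tokens that have no place in the larger ring. `RING_MAX`, `balanced_tokens` and `moves_needed` are hypothetical names introduced here.

```ruby
# Upper bound of the token range, as used by the tokens method in the slides.
RING_MAX = 2**127 - 1

# Evenly spaced tokens for a cluster of `nodes` nodes.
def balanced_tokens(nodes)
  (0...nodes).map { |n| n * RING_MAX / nodes }
end

# Number of existing nodes whose token does not appear in the balanced
# layout for the larger cluster, i.e. nodes that would need nodetool move.
def moves_needed(old_count, new_count)
  new_tokens = balanced_tokens(new_count)
  balanced_tokens(old_count).count { |token| !new_tokens.include?(token) }
end

puts moves_needed(6, 7)
puts moves_needed(6, 12)
```

Going from 6 to 7 nodes moves every token except 0. Doubling the cluster is the exception: the old tokens are a subset of the new ones, so only the new nodes need tokens assigned.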
Writes are fast.
Reads keep getting slower.
WTF?
iostat -x
look at %util
nodetool tpstats
Pool Name                    Active   Pending      Completed
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0         516280
ROW-READ-STAGE                    8      4096        1164326
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         1    682008        1818682
GMFD                              0         0           6467
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0         661477
ROW-MUTATION-STAGE                0         0         998780
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0              4
FLUSH-WRITER-POOL                 0         0              4
AE-SERVICE-STAGE                  0         0              0
HINTED-HANDOFF-POOL               0         0              3
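The rows worth reading in that output are the ones with a non-zero Pending count. A minimal sketch of picking them out, assuming the four-column layout shown above (newer Cassandra versions print different pool names and columns); `SAMPLE` holds a few rows copied from the output above, and `backlogged_pools` is a hypothetical helper.

```ruby
# A few rows copied from the tpstats output in the slides.
SAMPLE = <<~OUT
  Pool Name                    Active   Pending      Completed
  RESPONSE-STAGE                    0         0         516280
  ROW-READ-STAGE                    8      4096        1164326
  MESSAGE-DESERIALIZER-POOL         1    682008        1818682
  ROW-MUTATION-STAGE                0         0         998780
OUT

# Return [pool, pending] pairs for pools with queued work, largest backlog first.
def backlogged_pools(tpstats_text)
  tpstats_text.lines.drop(1)                    # skip the header row
              .map(&:split)
              .select { |fields| fields.size == 4 }
              .map { |name, _active, pending, _completed| [name, pending.to_i] }
              .select { |_name, pending| pending > 0 }
              .sort_by { |_name, pending| -pending }
end

backlogged_pools(SAMPLE).each { |name, pending| puts "#{name}: #{pending} pending" }
```

Here ROW-READ-STAGE and MESSAGE-DESERIALIZER-POOL are backed up while ROW-MUTATION-STAGE is idle, which fits the symptom: writes are fast, reads are queuing.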
YOU ARE OUT OF
DISK BANDWIDTH
You can:
Throttle reads at clients
Adjust memtable settings (size/ops/time)
Less frequent memtable flush
Less frequent compaction
Less disk bandwidth demand
Add more nodes
Add more spindles per node
Switch to SSDs
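Of the options above, throttling reads at clients is the only one that needs no cluster change. One way to do it is a token bucket in the client. This is a sketch under assumptions, not something from the talk: `ReadThrottle` is a hypothetical class, and the injectable `clock` exists only to make the behaviour easy to check.

```ruby
# Hypothetical client-side throttle: allows at most `rate` reads per second
# on average, with bursts of up to `rate` reads.
class ReadThrottle
  def initialize(rate, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @rate = rate.to_f
    @clock = clock
    @tokens = @rate        # start with a full bucket
    @last = @clock.call
  end

  # Returns true if a read may be sent now, false if the caller should back off.
  def allow?
    now = @clock.call
    # Refill in proportion to elapsed time, capped at the bucket size.
    @tokens = [@rate, @tokens + (now - @last) * @rate].min
    @last = now
    return false if @tokens < 1.0

    @tokens -= 1.0
    true
  end
end

throttle = ReadThrottle.new(100)
puts throttle.allow?
```

A client would call `allow?` before each read and sleep or queue the request when it returns false.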
I inserted a bunch of data.
Now my nodes are flapping.
WTF?
iostat -x
look at %util
vmstat
look at swap
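In vmstat output, the columns to watch are si and so (memory swapped in and out). A minimal sketch of extracting them, assuming the usual Linux procps column headers; the sample numbers below are made up for illustration and do not come from the talk, and `swap_activity` is a hypothetical helper.

```ruby
# Illustrative vmstat output (made-up numbers, not from the talk).
SAMPLE_VMSTAT = <<~OUT
  procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
   r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
   2  1 524288  10240   2048 900000  120  340  5100   800  900 1500 10  5 60 25  0
OUT

# Return one { swap_in:, swap_out: } hash per data row of vmstat output.
def swap_activity(vmstat_text)
  rows = vmstat_text.lines.map(&:split)
  # Locate the column header row by the names of the swap columns.
  header = rows.find { |fields| fields.include?('si') && fields.include?('so') }
  return [] unless header

  si = header.index('si')
  so = header.index('so')
  rows.select { |fields| fields.size == header.size && fields.all? { |v| v.match?(/\A\d+\z/) } }
      .map { |fields| { swap_in: fields[si].to_i, swap_out: fields[so].to_i } }
end

swap_activity(SAMPLE_VMSTAT).each do |sample|
  puts "si=#{sample[:swap_in]} so=#{sample[:swap_out]}"
end
```

Sustained non-zero values in either column mean the machine is actively swapping.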
INFO 13:27:35,309 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
mmap() in Cassandra
consumes up to 2GB.
Per segment.
NOT tracked as JVM heap.
JVM heap not locked in memory.

*See: https://issues.apache.org/jira/browse/CASSANDRA-1214
When your data set exceeds
memory,
swapping is likely.
Swapping can delay gossip
long enough to cause a
node to be marked down.
<DiskAccessMode>mmap_index_only</DiskAccessMode>
or
disk_access_mode: mmap_index_only
On Linux: vm.swappiness=0
INFO 13:27:35,309 DiskAccessMode is standard, indexAccessMode is mmap
Most people start
troubleshooting problems
interpreting shadows on the
wall.
You can now see the path
and the sunlight outside.
YOU CAN HELP!
What things have confused
you?
What problems have you
solved?
What tools have you used to
solve them?
GET INVOLVED!
http://wiki.apache.org/cassandra
#cassandra on freenode

Cassandra Summit 2010 - Operations & Troubleshooting Intro
