Demystifying Flink Memory Allocation & Tuning
Flink Forward, Berlin 10/2019
Roshan Naik, Streaming Analytics Platform
Why Tune?
• It is important to know how much data can be stored in the chosen state backend
• This also dictates the parallelism of stateful operators
• Under-allocating leads to the job crashing with OOM (out-of-memory) errors
• Over-allocating (via more parallelism or a larger container size) wastes $$$
• The tuning discussion here is centered on
  • Streaming jobs
  • Yarn containers
TaskMgr Container Memory Layout
[Diagram: the Yarn container, split into “Cut Off” space and memory available to Flink. Regions shown: “Cut Off” space, Java metaspace, Flink network buffers, TaskMgr managed memory, JVM heap.]
Cut Off + Available ≈ Container Size
For now, ignore the JVM metaspace size
“Cut Off” Space
“Cut Off” Space:
• Safety zone: If the JVM tries to exceed the container limit, it will be killed. By “cutting off” some memory, Flink can operate in a slightly smaller space without fear of being externally terminated.
• Parent and peer processes: Used by the scripts that launch the Flink JVM and by any other peer processes in the container.
• Native allocations: Allocations from native (C/C++) libraries invoked by Flink (e.g. RocksDB).
On or Outside JVM Heap
Cut Off space: outside the JVM heap; native memory allocations.
Network buffers: outside the JVM heap; Java direct memory allocation.
TM managed memory: configurable to be on the JVM heap or outside it (via direct memory allocation). This memory is not used in streaming mode, but it also cannot be sized to 0 bytes.
Configs & Formulas
containerized.heap-cutoff-ratio: fraction of container memory to set aside as Cut Off space.
taskmanager.network.memory.fraction: fraction of the memory available to Flink (container size minus Cut Off) used for network buffers. It is divided into 32 KB segments by default.
taskmanager.memory.fraction: fraction of (Available – Netw Buff) = TM managed memory size.
taskmanager.memory.off-heap: true/false. Chooses whether TM managed memory goes on the JVM heap or outside it.
taskmanager.memory.preallocate: true/false. Chooses whether TM managed memory is allocated at startup or lazily.
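The formulas above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not Flink's actual implementation: the function name `tm_memory_layout` and the dictionary keys are hypothetical, all sizes are in MB, the network fraction is applied to the memory available to Flink (which reproduces the 153 MB figure quoted for a 4 GB container in the cheat sheet), and any minimum/maximum bounds Flink may apply, as well as the JVM metaspace, are ignored.

```python
SEGMENT_KB = 32  # default network buffer segment size quoted on the slide


def tm_memory_layout(container_mb, cutoff_ratio, network_fraction, managed_fraction):
    """Apply the slide's formulas to one TaskManager container (sizes in MB).

    Assumption: no min/max clamping and no metaspace accounting.
    """
    cut_off = container_mb * cutoff_ratio      # set aside, outside the JVM heap
    available = container_mb - cut_off         # memory available to Flink
    network = available * network_fraction     # network buffers (direct memory)
    heap = available - network                 # JVM heap
    managed = heap * managed_fraction          # fraction of (Available - Netw Buff)
    segments = round(network * 1024 / SEGMENT_KB)  # approximate segment count
    return {
        "cut_off_mb": cut_off,
        "available_mb": available,
        "network_mb": network,
        "heap_mb": heap,
        "managed_mb": managed,
        "network_segments": segments,
    }


# Example: a 4 GB (4000 MB) container with the Memory/FS cheat-sheet settings.
layout = tm_memory_layout(4000, 0.15, 0.045, 0.015)
for name, value in layout.items():
    print(f"{name}: {value:.1f}")
```

With off-heap disabled, the managed memory computed here is a slice of the JVM heap rather than an extra region, so it does not reduce the heap figure.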
Hints to Simplify Calculations
TM Managed Memory
- Place it on the JVM heap
- Keep it very small (but larger than 0)
- Disable pre-allocation for it
Java Metaspace
- You may be able to get away with ignoring Java Metaspace, but it is a good idea to check its size.
- Prior to Java 8 it was called PermGen space, which defaulted to less than 100 MB.
Hints to Simplify Calculations
• taskmanager.memory.off-heap = false
• taskmanager.memory.preallocate = false
• taskmanager.memory.fraction = a small non-zero value
• Therefore, intuitively, the available main memory is:
  • For the RocksDB backend ≈ Cut Off
  • For the Memory/FS state backend ≈ JVM Heap = (ContainerSz – Cut Off – NetwBuff)
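The rule of thumb above can be written as a small helper. This is an illustrative sketch only: the function name `main_memory_mb` and the backend labels `"rocksdb"`, `"memory"` and `"filesystem"` are hypothetical names chosen for this example, and the sizes are assumed to be already known in MB.

```python
def main_memory_mb(backend, container_mb, cut_off_mb, network_mb):
    """Approximate memory that holds state, following the slide's rule of thumb."""
    if backend == "rocksdb":
        # RocksDB allocates natively, so its budget is roughly the Cut Off space.
        return cut_off_mb
    if backend in ("memory", "filesystem"):
        # Heap-based backends keep state on the JVM heap.
        return container_mb - cut_off_mb - network_mb
    raise ValueError(f"unknown backend: {backend}")


print(main_memory_mb("rocksdb", 10000, 7600, 240))
print(main_memory_mb("filesystem", 4000, 600, 153))
```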
Use Cases
• Typical
  • Large JVM heap: Memory/FS state backend
  • Large Cut Off: RocksDB backend
• Rarer
  • Balancing JVM heap and Cut Off: some operators rely on the RocksDB backend to store state while other operators cache data temporarily in memory using Java Maps/Trees (i.e. not in the state backend).
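For the rarer balanced case, one way to reason about the split is to start from the native memory RocksDB needs and check whether the remaining JVM heap still covers the in-memory caches. The sketch below is an assumption-laden illustration: `mixed_job_split` and its parameters are hypothetical names, sizes are in MB, and minimum/maximum bounds and metaspace are ignored.

```python
def mixed_job_split(container_mb, native_mb, heap_needed_mb, network_fraction):
    """Return (cutoff_ratio, heap_mb) for a job that needs both native memory and heap.

    Raises ValueError when the container is too small for both requirements.
    """
    cutoff_ratio = native_mb / container_mb      # Cut Off sized for native use
    available = container_mb - native_mb         # what is left for Flink
    heap = available * (1 - network_fraction)    # heap after network buffers
    if heap < heap_needed_mb:
        raise ValueError(
            f"heap would be {heap:.0f} MB, but {heap_needed_mb} MB is needed; "
            "use a larger container or lower the native budget"
        )
    return cutoff_ratio, heap


ratio, heap = mixed_job_split(16000, 8000, 6000, 0.1)
print(f"cutoff ratio {ratio:.2f}, JVM heap {heap:.0f} MB")
```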
Cheat Sheet – Memory/FS state backend
Setting                             | 4 GB container   | 8 GB container   | 10 GB container  | 16 GB container
containerized.heap-cutoff-ratio     | 0.15 (= 600 MB)  | 0.15 (= 1.2 GB)  | 0.13 (= 1.3 GB)  | 0.09 (= 1.44 GB)
taskmanager.network.memory.fraction | 0.045 (= 153 MB) | 0.045 (= 306 MB) | 0.045 (= 380 MB) | 0.03 (= 437 MB)
taskmanager.memory.fraction         | 0.015            | 0.015            | 0.015            | 0.01
taskmanager.memory.off-heap         | false            | false            | false            | false
taskmanager.memory.preallocate      | false            | false            | false            | false
JVM Heap                            | 3.25 GB          | 6.5 GB           | 8.31 GB          | 14.12 GB
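Going the other way, from a target JVM heap to a cutoff ratio, only needs the same formulas rearranged: heap = container × (1 − cutoff ratio) × (1 − network fraction). The helper below is a hedged sketch; `cutoff_ratio_for_heap` is a hypothetical name, sizes are in MB, and bounds and metaspace are again ignored.

```python
def cutoff_ratio_for_heap(container_mb, target_heap_mb, network_fraction):
    """Solve heap = container * (1 - ratio) * (1 - network_fraction) for ratio."""
    available_needed = target_heap_mb / (1 - network_fraction)
    if available_needed > container_mb:
        raise ValueError("target heap does not fit in the container")
    return 1 - available_needed / container_mb


# Example: which cutoff ratio yields a 3247 MB heap in a 4000 MB container?
print(round(cutoff_ratio_for_heap(4000, 3247, 0.045), 3))
```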
Cheat Sheet – RocksDB state backend
Setting                             | 10 GB container  | 16 GB container  | 32 GB container  | 48 GB container
containerized.heap-cutoff-ratio     | 0.76 (= 7.6 GB)  | 0.8 (= 12.8 GB)  | 0.86 (= 27.5 GB) | 0.9 (= 43.2 GB)
taskmanager.network.memory.fraction | 0.1 (= 0.24 GB)  | 0.15 (= 0.48 GB) | 0.2 (= 0.9 GB)   | 0.2 (= 0.96 GB)
taskmanager.memory.fraction         | 0.05             | 0.04             | 0.04             | 0.04
taskmanager.memory.off-heap         | false            | false            | false            | false
taskmanager.memory.preallocate      | false            | false            | false            | false
JVM Heap                            | 2.7 GB           | 2.88 GB          | 3.58 GB          | 3.84 GB
Available to RocksDB *              | ~ 7.6 GB         | ~ 12.8 GB        | ~ 27.52 GB       | ~ 43.2 GB
* = Cut Off. If your JVM metaspace size is significant, reduce this further by the metaspace size.
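The footnote's adjustment can be expressed directly. This is a rough sketch: `rocksdb_budget_gb` is a hypothetical helper, sizes are in GB, and the result is an upper estimate, since the Cut Off space also has to cover the safety zone and any parent or peer processes described earlier.

```python
def rocksdb_budget_gb(container_gb, cutoff_ratio, metaspace_gb=0.0):
    """Approximate native memory left for RocksDB: Cut Off minus JVM metaspace."""
    cut_off = container_gb * cutoff_ratio
    return max(cut_off - metaspace_gb, 0.0)


print(f"{rocksdb_budget_gb(10, 0.76):.2f} GB")        # metaspace ignored
print(f"{rocksdb_budget_gb(32, 0.86, 0.25):.2f} GB")  # assumed 0.25 GB metaspace
```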
Validating with TM Metrics
Need to Tweak It Yourself?
• Try this calculator (clone it for yourself)
  • https://docs.google.com/spreadsheets/d/1DMUnHXNdoK1BR9TpTTpqeZvbNqvXGO7PlNmTojtaStU/edit?usp=sharing_eil&ts=5d9d40ae
• The calculator may be useful for batch jobs as well
• If this was useful, let me know by liking this tweet: https://twitter.com/naikrosh/status/1180034347191005184
Email: roshan@uber.com Twitter: @naikrosh, @UberEng
UBER Engineering Blog: eng.uber.com