Resource Management:
Beancounters
Pavel Emelianov
xemul@openvz.org
Denis Lunev
den@openvz.org
Kirill Korotaev
dev@openvz.org
Agenda

Current state of resource management in the
Linux kernel

Beancounters overview

User memory management

I/O accounting

Kernel memory management

Network buffers accounting

Performance
Current state

Per-process accounting and limiting (rlimits)
− Manages individual processes
− Memory limits are mostly ignored by the kernel

Group-based management
− Absent

Global statistics
− Not suitable for group isolation
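For reference, the per-process limits mentioned on this slide are the ones exposed through getrlimit(2). A minimal sketch, assuming a Unix system where Python's standard `resource` module is available, reads two of them for the calling process. The helper name `process_limits` is made up for illustration; the point is only that each limit is attached to a single process, not to a group.

```python
import resource


def process_limits():
    """Return the (soft, hard) rlimit pairs of the calling process only."""
    names = ("RLIMIT_RSS", "RLIMIT_NOFILE")
    return {name: resource.getrlimit(getattr(resource, name)) for name in names}


if __name__ == "__main__":
    for name, (soft, hard) in process_limits().items():
        print(name, soft, hard)
```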
Operating system resources

Memory

CPU time

IO bandwidth

Networking bandwidth

Disk space
Beancounters basics

A beancounter manages a group of tasks

Resource counters parameters
− held – the current consumption level
− limit – the maximal allowed level of consumption
− barrier – the "shortage warn" line – each resource
controller may take some precautions
− fails – the number of allocation rejects

Beancounter is assigned once during process
lifetime
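The four counter parameters can be pictured with a small model. This is a sketch under assumptions, not the kernel code: the class name `ResourceCounter`, its methods, and the choice to reject only at the limit while merely reporting the barrier are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class ResourceCounter:
    barrier: int      # "shortage warn" line
    limit: int        # maximal allowed consumption
    held: int = 0     # current consumption
    fails: int = 0    # number of rejected charges

    def charge(self, amount: int) -> bool:
        """Account `amount` units; reject if the limit would be exceeded."""
        if self.held + amount > self.limit:
            self.fails += 1
            return False
        self.held += amount
        return True

    def uncharge(self, amount: int) -> None:
        """Return `amount` units to the counter."""
        self.held = max(0, self.held - amount)

    @property
    def over_barrier(self) -> bool:
        """True when a controller might start taking precautions."""
        return self.held > self.barrier
```

With `ResourceCounter(barrier=8, limit=10)`, charging 6 and then 3 units succeeds and crosses the barrier; a further charge of 2 is rejected and increments `fails`.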
Accounting details
Process
User space Kernel space
Beancounter
kernel object
Beancounters controlled resources

User memory
− Length of mappings
− RSS
− Locked pages

Dirty page cache

Kernel memory

Network buffers

Miscellaneous resources
− Number of tasks
− Number of files
− Number of sockets
− Number of file locks
− Number of PTYs
− Number of signals
− Active dentry cache
User memory management

VMA lengths accounting
− Graceful rejects of VM region allocation
− Take precautions against overcommitment

RSS accounting
− Real memory usage
− OOM killer priorities

Dirty page cache accounting
− IO statistics and scheduling
VMA lengths accounting

VMAs classification
− unreclaimable: private and anonymous
− reclaimable: shared file mappings

Pages classification
− unused: parts of mapped regions
− used: touched pages
[Figure: task address space; labels: unused pages, used pages, unreclaimable VMAs, reclaimable VMAs, “Lengths of mappings” resource, “RSS” resource]
VMA lengths accounting pros'n'cons

Pros
− A way to track the host commitment level
− Graceful rejection of address space growth

Cons
− Hard limiting of address space growth
RSS accounting
[Figure: “First touch” and “N touches” stages; labels: page, beancounter]
Drawbacks

Additional pointer on the struct page

Extra locking during page faults
Shared pages accounting

Account the page to the first beancounter
− Non uniform statistics for similar beancounters

Account a whole page for each beancounter
− The values accounted are not related to the actual
memory usage

Account fractions of the page to all beancounters
− The “middle” way used in the beancounters
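The three options can be compared on a toy example. The sketch below is illustrative only: the function name `account`, the policy names, and the equal 1/N split used for the "fraction" policy are assumptions, not the scheme actually used by the beancounters.

```python
from fractions import Fraction


def account(pages, policy):
    """Charge each page to beancounters under one of three policies.

    pages: one list per page, naming the distinct beancounters that map it,
    in first-touch order.
    """
    rss = {}
    for mappers in pages:
        if policy == "first":        # whole page to the first toucher
            charges = {mappers[0]: Fraction(1)}
        elif policy == "whole":      # whole page to every mapper
            charges = {bc: Fraction(1) for bc in mappers}
        elif policy == "fraction":   # equal share to every mapper (assumed)
            charges = {bc: Fraction(1, len(mappers)) for bc in mappers}
        else:
            raise ValueError(policy)
        for bc, share in charges.items():
            rss[bc] = rss.get(bc, Fraction(0)) + share
    return rss
```

For three pages, two of them shared by bc1 and bc2 and one private to bc2, "first" reports 2 and 1 pages, "whole" reports 2 and 3 (five pages in total for three real ones), and "fraction" reports 1 and 2, which sums to the three pages really in use.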
Page fractions accounting
[Figure: fractions of a shared page charged to beancounters BC1, BC2, BC3, BC4; values shown: 1, ½, ½, ¼, ¼, ¼, ¼]
Algorithm benefits

O(1) adding and removing

The sum of RSS over all beancounters equals the number of pages actually in use
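The slides do not spell out the algorithm. One way to get constant-time updates while keeping the shares of a page summing to one, offered purely as an illustration and not as the OpenVZ implementation, is to let a new beancounter take half of one existing holder's share, and to hand a departing beancounter's share to one remaining holder. The class and method names below are hypothetical.

```python
from fractions import Fraction


class SharedPage:
    """Toy model: per-beancounter shares of one page always sum to 1."""

    def __init__(self):
        self.shares = {}  # beancounter name -> Fraction of the page

    def add(self, bc):
        if bc in self.shares:
            raise ValueError(f"{bc} already maps this page")
        if not self.shares:
            self.shares[bc] = Fraction(1)
            return
        donor = next(iter(self.shares))       # any existing holder will do
        half = self.shares[donor] / 2
        self.shares[donor] = half             # only two entries are touched
        self.shares[bc] = half

    def remove(self, bc):
        share = self.shares.pop(bc)
        if self.shares:
            heir = next(iter(self.shares))    # any remaining holder will do
            self.shares[heir] += share

    def total(self):
        return sum(self.shares.values(), Fraction(0))
```

Each update modifies at most two entries regardless of how many beancounters share the page, and `total()` stays at 1 for as long as at least one beancounter maps it.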
Dirty page cache accounting
[Figure: page life cycle; labels: first touch, N touches, dirty, unmap, last unmap, clean, IO beancounter]
RSS accounting pros'n'cons

Pros
− Node memory utilization statistics
− Asynchronous IO scheduling
− Ground for fair page reclamation

Cons
− Performance issues
− Memory consumption by auxiliary data structures
Kernel memory management
Reason

Limited normal zone
− Mainly for 32-bit arches
Major problem

Object freeing context
− Reference counters
− RCU
Kernel MM data structures (pages)

Buddy page allocator
− Additional pointer on the struct page

Vmalloc
− 0th page's pointer
[Figure: struct vm_struct and its pages; labels: page, ...]
Kernel MM data structures (slab)

Array of pointers after the slab
[Figure: slab layout; labels in order: struct slab, kmem_bufctl_t[N], N objects, beancounters]
Kernel MM drawbacks

A slab can carry fewer objects

Slabs could become “offslab”
Slab name    # of objects      Offslab-ness
             Before  After     Before  After
Size-32      113     101       –       –
Size-64      59      56        –       –
Size-128     30      29        –       –
Size-256     15      15        –       –
Size-512     8       8         +       +
Size-1024    4       4         +       +
Size-2048    2       2         +       +
Size-4096    1       1         +       +
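The "before" and "after" object counts can be reproduced with simple arithmetic. The sketch below rests on assumptions that the slides do not state: a one-page (4096-byte) slab, a 28-byte struct slab header, a 4-byte kmem_bufctl_t and a 4-byte beancounter pointer per object, as on a 32-bit kernel. These sizes were chosen because they are consistent with the table; alignment and cache colouring are ignored.

```python
PAGE_SIZE = 4096    # assumed: one page per slab
SLAB_HEADER = 28    # assumed size of struct slab on a 32-bit kernel
BUFCTL = 4          # assumed sizeof(kmem_bufctl_t)
BC_POINTER = 4      # assumed size of the added per-object pointer


def objects_per_slab(obj_size, offslab, with_beancounters):
    """Estimate how many objects of obj_size bytes fit into one slab."""
    if offslab:
        # Management data lives outside the slab page, so only objects count.
        return PAGE_SIZE // obj_size
    per_object = obj_size + BUFCTL + (BC_POINTER if with_beancounters else 0)
    return (PAGE_SIZE - SLAB_HEADER) // per_object


if __name__ == "__main__":
    for size in (32, 64, 128, 256):
        print(size,
              objects_per_slab(size, False, False),
              objects_per_slab(size, False, True))
```

Under these assumptions the extra pointer costs twelve objects per slab in the Size-32 cache and nothing from Size-256 upwards, which matches the table.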
Kernel MM pros'n'cons

Pros
− Tracking of kernel memory usage

Cons
− None (all are already optimized out)
Network buffers accounting
Mainstream accounting shortcomings

slab overhead is not included
− up to 30% for usual Ethernet frames
− unpredictable difference for non-Ethernet MTU
− no way to recalculate skb->truesize
Implementation basics

Separate accounting for
− send and receive buffers
− TCP and all the other types of traffic

Implementation is straightforward:
− account actual memory usage for objects with
undefined or infinite lifetime

select(2) compatibility

Buffer space guarantees
Packets context handling
[Figure: labels: beancounter, process, network, socket, SKB, SKB]
Performance
Test name           No RSS   Full
Process creation    97%      91%
Execl Throughput    99%      91%
Pipe Throughput     100%     99%
Shell Scripts       96%      87%
File Read           99%      98%
File Write          101%     99%

RSS accounting – the bottleneck
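The cost attributed to RSS accounting can be read off the table by subtracting the two columns. This is a sketch that assumes the "No RSS" column is a run with RSS accounting switched off and "Full" a run with all accounting enabled; the slides do not define the columns further, and the names `RESULTS` and `rss_cost` are illustrative.

```python
# (No RSS %, Full %) per test, copied from the table on this slide.
RESULTS = {
    "Process creation": (97, 91),
    "Execl Throughput": (99, 91),
    "Pipe Throughput": (100, 99),
    "Shell Scripts": (96, 87),
    "File Read": (99, 98),
    "File Write": (101, 99),
}


def rss_cost(results):
    """Percentage points lost between the 'No RSS' and 'Full' runs."""
    return {test: no_rss - full for test, (no_rss, full) in results.items()}


if __name__ == "__main__":
    for test, cost in sorted(rss_cost(RESULTS).items(), key=lambda kv: -kv[1]):
        print(f"{test:18s} {cost} points")
```

The process-heavy tests (shell scripts, execl, process creation) lose 6 to 9 points, while the pipe and file tests lose 1 or 2.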
Main future directions

Optimization
− Pre-charging
    Kernel memory
    VMA lengths
− On-demand accounting
    Active dentry cache
    RSS

RSS limits
− Page reclamation

Better TCP window management
That's all folks

Questions?

Comments?
http://download.openvz.org/~xemul/

Resource management: beancounters

  • 1. Resource Management: Beancounters Pavel Emelianov xemul@openvz.org Denis Lunev den@openvz.org Kirill Korotaev dev@openvz.org
  • 2. Agenda  Current state of resource management in the Linux kernel  Beancounters overview  User memory management  I/O accounting  Kernel memory management  Network buffers accounting  Performance
  • 3. Current state  Per-process accounting and limiting (rlimits) − Manages individual processes − Memory limits are mostly ignored by the kernel  Group-based management − Absent  Global statistics − Not suitable for group isolation
  • 4. Operating system resources  Memory  CPU time  IO bandwidth  Networking bandwidth  Disk space
  • 5. Agenda  Current state of resource management in the Linux kernel  Beancounters overview  User memory management  I/O accounting  Kernel memory management  Network buffers accounting  Performance
  • 6. Beancounters basics  A beancounter manages a group of tasks  Resource counters parameters − held – the current consumption level − limit – the maximal allowed level of consumption − barrier – the "shortage warn" line – each resource controller may take some precautions − fails – the number of allocation rejects  Beancounter is assigned once during process lifetime
  • 7. Accounting details Process User space Kernel space Beancounter kernel object
  • 8. Beancounters controlled resources  User memory − Length of mappings − RSS − Locked pages  Dirty page cache  Kernel memory  Network buffers  Miscellaneous resources − Number of tasks − Number of files − Number of sockets − Number of file locks − Number of PTYs − Number of signals − Active dentry cache
  • 9. Agenda  Current state of resource management in the Linux kernel  Beancounters overview  User memory management  I/O accounting  Kernel memory management  Network buffers accounting  Performance
  • 10. User memory management  VMA lengths accounting − Graceful rejects of VM region allocation − Take precautions against overcommitment  RSS accounting − Real memory usage − OOM killer priorities  Dirty page cache accounting − IO statistics and scheduling
  • 11. VMA lengths accounting  VMAs classification − unreclaimable: private and anonymous − reclaimable: shared file mappings Unused pages Used pages Unreclaimable VMAsReclaimable VMAs “Lengths of mappings” resource “RSS” resource  Pages classification − unused: parts of mapped regions − used: touched pages Task address space
  • 12. VMA lengths accounting pros'n'cons  Pros − The way to track the host commitment level − Graceful rejects of address space growths  Cons − Hard limiting of address space growth
  • 13. RSS accounting First touch N Touches Drawbacks  Additional pointer on the struct page  Extra locking during page faults page page beancounter beancounter
  • 14. Shared pages accounting  Account the page to the first beancounter − Non uniform statistics for similar beancounters  Account a whole page for each beancounter − The values accounted are not related to the actual memory usage  Account page's fractions the all beancounters − The “middle” way used in the beancounters
  • 15. Page fractions accounting BC1 BC2 BC3BC4 1½ ½¼ ¼¼ ¼ Algorithm benefits  O(1) algorithm of adding and removing  The sum of RSS on all beancounters is an amount of all actually used pages
  • 16. Agenda  Current state of resource management in the Linux kernel  Beancounters overview  User memory management  I/O accounting  Kernel memory management  Network buffers accounting  Performance
  • 17. Dirty page cache accounting First touch N Touches Dirty Unmap Last unmap Clean IO beancounter
  • 18. RSS accounting pros'n'cons  Pros − Node memory utilization statistics − Asynchronous IO scheduling − Ground for fair page reclamation  Cons − Performance issues − Memory consumption by auxiliary data structures
  • 19. Agenda  Current state of resource management in the Linux kernel  Beancounters overview  User memory management  I/O accounting  Kernel memory management  Network buffers accounting  Performance
  • 20. Kernel memory management Reason  Limited normal zone − Mainly for 32-bit arches Major problem  Object freeing context − Reference counters − RCU
  • 21. Kernel MM data structures (pages)  Buddy page allocator − Additional pointer on the struct page  Vmalloc − 0th page's pointer ... page struct vm_struct
  • 22. Kernel MM data structures (slab)  Array of pointers after the slab struct slab kmem_bufctl_t[N] ... ... N objects ... beancounters
  • 23. Kernel MM drawbacks  A slab can carry less objects  Slabs could become “offslab” Slab name # of objects Offslab-ness Before After Before After Size-32 113 101 – – Size-64 59 56 – – Size-128 30 29 – – Size-256 15 15 – – Size-512 8 8 + + Size-1024 4 4 + + Size-2048 2 2 + + Size-4096 1 1 + +
  • 24. Kernel MM pros'n'cons  Pros − Tracking of kernel memory usage  Cons − No (all are already optimized out)
  • 25. Agenda  Current state of resource management in the Linux kernel  Beancounters overview  User memory management  I/O accounting  Kernel memory management  Network buffers accounting  Performance
  • 26. Network buffers accounting Mainstream accounting shortcomings  slab overhead is not included − up to 30% for usual Ethernet frames − unpredictable difference for non-ethernet MTU − no way to recalculate skb->truesize
  • 27. Implementation basics  Separate accounting for − send and receive buffers − TCP and all the other types of traffic  Implementation is straightforward: − account actual memory usage for objects with undefined or infinite lifetime  select(2) compatibility  Buffer space guarantees
  • 29. Agenda  Current state of resource management in the Linux kernel  Beancounters overview  User memory management  I/O accounting  Kernel memory management  Network buffers accounting  Performance
  • 30. Performance Test name No RSS Full % % Process creation 97% 91% Execl Throughtput 99% 91% Pipe Throughtput 100% 99% Shell Scripts 96% 87% File Read 99% 98% File Write 101% 99%  RSS accounting – the bottleneck
  • 31. Main future directions  Optimization − Pre-charging  Kernel memory  VMAs lengths − On-demand accounting  Active dentry cache  RSS  RSS limits − Page reclamation  Better TCP window management

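Pre-charging, listed on slide 31 and described in the speaker notes as a task reserving some amount of a resource from its beancounter at creation, could look roughly like the sketch below. The class names, the fixed-size reserve and the fallback policy are hypothetical and are not taken from the OpenVZ code; only the "held", "limit" and "fails" counters come from the talk.

```python
class Beancounter:
    """One resource counter with fields named as in the talk."""

    def __init__(self, limit):
        self.held = 0
        self.limit = limit
        self.fails = 0

    def charge(self, amount):
        """Add `amount` to `held`, or count a failure if the limit would be exceeded."""
        if self.held + amount > self.limit:
            self.fails += 1
            return False
        self.held += amount
        return True

    def uncharge(self, amount):
        self.held -= amount


class Task:
    """A task that pre-charges a reserve when it is created (hypothetical policy)."""

    def __init__(self, bc, reserve):
        self.bc = bc
        # Reserve a fixed amount up front; start with nothing if it does not fit.
        self.reserve = reserve if bc.charge(reserve) else 0

    def alloc(self, amount):
        """Serve from the local reserve first, then fall back to the beancounter."""
        if amount <= self.reserve:
            self.reserve -= amount     # fast path: the shared counter is untouched
            return True
        return self.bc.charge(amount)  # slow path: charge the shared counter

    def exit(self):
        """Return the unused part of the reserve to the beancounter."""
        self.bc.uncharge(self.reserve)
        self.reserve = 0
```

The point of the fast path is that small allocations do not touch the shared counter at all, which is where contention would otherwise arise.
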
Editor's notes

  1. Hi, my name is Pavel. My talk is about resource management in the kernel and the way we do it in OpenVZ.
  2. In the coming half hour we'll talk about the current state of resource management in the Linux kernel and outline some of its shortcomings. After that I will introduce our resource management subsystem, the beancounters. I will cover the main part of this subsystem, memory management. This includes user and kernel memory management, input/output accounting and network buffers accounting. At the very end, of course, I will show the impact of the beancounters on the kernel and tell you what we're planning to do about it.
  3. OK. Resource management can occur at three levels. First, processes can be tracked individually, and the Linux kernel has some tools for this: rlimits are intended to help with per-process resource management. Their disadvantages are obvious: they work on individual processes only and merely protect the system from accidents. Not to mention that the memory limits (e.g. RLIMIT_CORE, RLIMIT_RSS) are mostly ignored by the kernel. The next level is group-based accounting, which is completely missing from the kernel. The "user" notion is used at the VFS layer only. So this level of accounting is needed in the kernel rather badly. At the top is global management. It is the most thoroughly developed kind of management in the kernel, but it is not suitable for group isolation at all.
  4. The operating system provides numerous resources to executing processes. The main ones are memory, CPU time, I/O and network bandwidth, and disk space. <click> This talk concerns only the most crucial resource, memory. As we'll see later, "memory" itself stands at the top of a deep hierarchy of resources.
  5. Now let's look at some basics of the beancounters. The best analogy for a beancounter is the "nsproxy", which has recently appeared in the kernel. A beancounter denotes a process group. It is assigned to a task once during the task's lifetime and accounts for all the resource allocations made by the task. The beancounters account for many resources, each of which is characterized by the "held" value, the current consumption level; the "limit", the maximal allowed level of resource consumption; and the "barrier", the value that tells the controller that the resource is about to run out and it is time to take some precautions. Finally, the "fails" value shows the number of rejected resource allocations.
  6. Let me give some details on the accounting. Let's start with a single process (a purple circle). Whenever it wants, it can <click> attach itself to a beancounter. After this the process, all its <click> children and even <click> grandchildren live with this beancounter and cannot disown it. Almost every kernel object created by a resource allocation request <click> from the task is accounted to that task's beancounter, i.e. its "weight" or "size" is added to the "held" value of the appropriate resource on the beancounter. The object may be of almost any kind: a file, a dentry, an iptables rule, a virtual memory region, anything.
  7. The beancounters work with the following memory-related resources. First, user memory, which includes such resources as the lengths of mappings and the resident set size (RSS). The locked page set is also accounted, but it is omitted from this talk as there is nothing interesting in its design and implementation. Dirty page cache is another kind of resource and is very interesting, as we'll see later. The most crucial memory type, kernel memory, is also accounted. Finally, if we have time, we'll talk about network buffers management. It has some specifics. <click> However, the full list of beancounter-controlled resources also includes the numbers of tasks, files, sockets, etc., the active dentry cache and so on and so forth...
  8. Now let's see the details of user memory management. As I have said, there are three types of resources here: the lengths of mappings, the RSS and the dirty page cache. Accounting for the first resource gives us the ability to reject virtual memory extension gracefully, with an error returned from the mmap or brk system call. Besides, this allows us to take some precautions against node overcommitment. RSS accounting gives us the real memory usage (we'll see a bit later how it works) and provides good group priorities for the out-of-memory killer. Dirty page cache accounting is mainly used for gathering I/O statistics and for scheduling asynchronous I/O (output disk traffic).
  9. OK, here's the first resource, the lengths of mappings. It works with the task address space. All the VM areas it may have are split <click> into two classes: the unreclaimable areas <click>, those that are not backed by any disk file and thus will go to swap on memory shortage; and <click> the reclaimable ones, those that are backed by a disk file and whose reclamation almost always succeeds. Next, we distinguish two <click> page types in the areas. The "unused" page <click> is a "hole" in the VM area: the page's place is reserved, but the physical page is not yet created. This page becomes "used" <click> on a page fault. A used page is a synonym for a physical page. In these terms, the "lengths of mappings" resource is <click> the sum of used pages and unused ones in the unreclaimable areas. The RSS resource is <click> merely the number of used pages.
  10. What are the pros and cons? The pros are that we have a way to track node overcommitment and may reject address space extension gracefully (that is, in the way the application expects during its normal operation). The cons are that once the group hits the limit it cannot proceed unless it releases some of its mappings.
  11. Now the RSS resource. This works with pages that are <click> touched and thus acquired by the process. Page ownership is established during page faults by attaching a page beancounter to the page. The page beancounter (a blue box) is a "tag" that is attached to a page and points to the beancounter (a green box) that owns the page. We do not point to the beancounter directly because, after many touches from <click> different beancounters, the page has to point to many beancounters. To track this we attach a circular list of page beancounters, each one pointing to the appropriate beancounter. The page beancounter is also responsible for the per-beancounter mapcount of the page. The drawbacks of this approach are <click> the following. First, we have an extra pointer on the struct page. Second, we introduce extra locking in the page faults. What this results in is shown at the end of my talk.
  12. The most interesting part of RSS accounting is how a shared page is accounted among the owning beancounters. Let's start with a single page. When the first beancounter touches it <click> it gets the whole page into its RSS resource. Then comes the second one <click>. We take one half of the page from the first beancounter and move it to the newcomer. The third one will <click> steal a quarter from the second one, without bothering the first. The fourth <click> beancounter gets a quarter from the first one and doesn't touch the other owners. That is, we account the pages in halves. Then we'll have eighths and so on. The benefits <click> of this algorithm are the following. The first is that we have a constant-time algorithm for adding and removing page owners: on each touch we work with only one of the existing owners and do not bother the others. The second is that when we sum up the RSS values from all the beancounters we get the real physical memory consumption by user space.
  13. Now let's see how we track the dirty page cache. Let's start with the familiar scheme of a page <click> owned by several beancounters. When one of them writes to the page <click> and thus makes it dirty, we attach an extra tag <click> to it, the IO beancounter, which points to the page's ownership list of page beancounters and holds the beancounter that made the page dirty. Even if the page gets unmapped from its dirtier <click>, and even if it gets completely unmapped from user space <click>, it holds its IO beancounter until it is flushed to disk <click> and becomes clean. Note that when the page is written out we do know the beancounter that is responsible for this write.
  14. The good points of such RSS tracking are obvious. We have node memory utilisation statistics, we have support for asynchronous IO scheduling and we have the groundwork for page reclamation. The price we pay for it is the performance issues, which will be covered in detail later, and the extra memory required to store all the page and IO beancounters seen so far.
  15. Now the kernel memory management. The reason we have to control it is that the normal zone of the kernel is limited. This zone is the only place where the kernel can hold its objects, and if it gets exhausted the system can stop. This problem concerns mainly 32-bit architectures, but even on 64-bit ones, the page tables of a single process can eat hundreds of megabytes from this zone. The major (and actually the only) problem of kernel object tracking is their freeing context. Reference counting and RCU allow kernel objects to be freed in almost any context, so the beancounter the current task belongs to is most often not the one that brought the object to life.
  16. The objects that are allocated with vmalloc and the buddy page allocator can be tracked easily. We just use the (already seen) extra pointer <click> on the struct page. Vmalloc objects are considered to be owned by the <click> zeroth page's owner. This is simple and doesn't produce any noticeable inconvenience. From the kernel API point of view we just have an additional GFP flag that tells the buddy allocator that we would like to account the page allocation to the beancounters.
  17. Slab objects are different. Since one page may carry objects belonging to different beancounters, we cannot treat the page's owner as the object's owner. To solve this we <click> place the appropriate number of pointers behind the struct slab and all its bufctls. Each object on the slab is owned by the beancounter referenced by the pointer with the same index.
  18. The described slab accounting may introduce some problems. First, slabs may hold fewer objects as we steal some space for our pointers. Next, slabs may become "offslab". That's the theory. The reality <click> differs. The size-32 cache loses 10 percent of its objects, but the other caches look much better. Look, we lose only 5 percent of objects on the size-64 cache and even less on the others. And no slabs become offslab.
  19. The advantages are obvious: we have total control over the consumption of the most crucial memory resource, the kernel memory. All the disadvantages have actually been optimized out already. We have no performance hit and negligible extra memory consumption.
  20. OK, let's move on to network accounting. First, let me explain why the existing accounting facilities are not that good. The first thing is that the slab overhead is not taken into account. For example, Ethernet frames are mainly allocated from size-2048 and occupy up to 30 percent more space than they really need. The second thing is that incoming traffic is not accounted at all, and finally the existing accounting is not strict: the limits set can easily be exceeded.
  21. The network accounting basics are the following. We distinguish incoming and outgoing traffic; this is natural. But we also differentiate between TCP and all other traffic. Thus we have four kinds of resources. These four kinds share some fundamentals. We account the actual memory usage, with all the overheads, for all the objects that have undefined or infinite lifetime. For example, TCP clones that are sent to the card are not accounted as they live for a very limited time. Then we have select compatibility: if select says that the socket is writable, the write or send system call won't be rejected by the beancounter. And one more thing is that we provide some minimal guaranteed space for each kind of resource to allow slow progress for each socket.
  22. However, there are some specifics for different resources. For example, TCP outgoing paths wait for beancounter resource availability when the limit is hit. For simple protocols like UDP we drop incoming packets when the limit is hit. This is OK as the protocol itself does not provide any packet delivery guarantees. Netlink traffic is always accounted on the user-side socket regardless of the traffic source. This is done because the kernel produces traffic mainly in response to the user.
  23. Now we're done with the accounting. Let's talk about performance. The major bottleneck is the RSS accounting with its extra locking. As you can see from the table, the beancounters without this part produce very small overhead (if any) on many unixbench tests. The RSS accounting mainly hits the speed of fork() and exec() (and thus the shell test). However, we have some ideas on how to improve this.
  24. The beancounters are continuously evolving and even now we have something to work on. The major question is performance. The main techniques we're exploiting are pre-charging and on-demand accounting. Pre-charging means that each task reserves some amount of a resource from the beancounter at creation and draws on it for future object allocations. We have this working for kernel memory, files and network buffers and plan to account the VMA lengths with it. On-demand accounting is a bit more tricky. The main idea is that we need to account precise values only when we're near the limit. When the beancounter consumes little of a resource we can make do with estimates of the consumption level. We have this working for the active dentry cache and intend to implement it for RSS accounting. Another important question is implementing hard RSS limits with page reclamation. We have sent some proof-of-concept patches to lkml. Soon this will appear in official OpenVZ kernels. And the last interesting question is TCP window management based on beancounter resource availability. This will allow for better management of TCP traffic.
  25. Well, that's all. Thank you for your attention. Now I'm ready to answer your questions.
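
The share-splitting scheme from note 12 can be sketched in a few lines. This is an illustration under stated assumptions, not the OpenVZ implementation: the function name `add_owner` is hypothetical, the donor is chosen as the most recent owner among those holding the largest share (an assumption that happens to reproduce the order described in the note), and the linear scan is used only for clarity, whereas the talk describes the real algorithm as constant-time. Removing an owner is not modeled.

```python
from fractions import Fraction


def add_owner(shares, newcomer):
    """Give `newcomer` half of the share held by one existing owner.

    `shares` is a list of (owner, share) pairs in arrival order and is
    modified in place. Only the chosen donor is touched.
    """
    if not shares:
        shares.append((newcomer, Fraction(1)))  # first toucher owns the whole page
        return
    largest = max(share for _, share in shares)
    # Assumed tie-break: the latest arrival among the largest shareholders donates.
    donor = max(i for i, (_, share) in enumerate(shares) if share == largest)
    owner, share = shares[donor]
    shares[donor] = (owner, share / 2)
    shares.append((newcomer, share / 2))


page = []
for name in ("bc1", "bc2", "bc3", "bc4"):
    add_owner(page, name)
    print(name, [str(share) for _, share in page])
```

Because every step only halves one existing share, the shares of a page always sum to exactly one page, which is why summing RSS over all beancounters gives the real physical memory consumption.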