Slab allocation is a memory management mechanism for efficiently allocating kernel objects. Its main benefit is that it largely eliminates the fragmentation caused by repeated allocations and deallocations.
3. Kernel Memory Allocated in Early Linux Kernel
In early kernel versions, the kernel created 13 geometrically
distributed lists of free memory areas, with sizes ranging from
2^5 to 2^17 bytes (32 bytes to 128 KB). This was called the general kernel memory system.
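As a small illustration of the geometric size classes described above, the sketch below maps a request size to one of the 13 lists. The function name `size_class_index` and the macros are hypothetical and only serve this example; they are not taken from any kernel source.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SHIFT 5    /* smallest area: 2^5  = 32 bytes     */
#define MAX_SHIFT 17   /* largest area:  2^17 = 131072 bytes */
#define NUM_LISTS (MAX_SHIFT - MIN_SHIFT + 1)   /* 13 lists */

/*
 * Return the index (0..12) of the smallest list whose areas can hold
 * `size` bytes, or -1 if the request exceeds the largest area.
 * Hypothetical helper, for illustration only.
 */
int size_class_index(size_t size)
{
    int shift;

    assert(size > 0);   /* a zero-byte request is not meaningful here */
    for (shift = MIN_SHIFT; shift <= MAX_SHIFT; shift++) {
        if (size <= ((size_t)1 << shift))
            return shift - MIN_SHIFT;
    }
    return -1;
}
```

A 33-byte request lands in the 64-byte list (index 1), which already hints at the internal fragmentation issue of this scheme.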
Marvell
4. Example of kmalloc
    struct foo {
        struct mutex foo_lock;
        struct bar *foo_barlist;
        int foo_refcnt;
    };

    /* allocate raw memory, then construct the object by hand */
    foo = kmem_alloc(sizeof (struct foo), KM_SLEEP);
    mutex_init(&foo->foo_lock);
    foo->foo_barlist = NULL;
    foo->foo_refcnt = 0;

    /* use foo */

    /* destruct the object by hand, then release the memory */
    ASSERT(foo->foo_barlist == NULL);
    ASSERT(foo->foo_refcnt == 0);
    mutex_destroy(&foo->foo_lock);
    kmem_free(foo);
The cost of constructing an object can be significantly higher than
the cost of allocating memory for it [1]:
construct + destruct: 23.6 µs
memory allocation: 9.4 µs
Because every request is rounded up to a power-of-two area, up to 1/2 of an
area can be wasted, so internal fragmentation is still bad.
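To make the "at most 1/2" figure concrete, here is a minimal sketch of the rounding arithmetic, assuming areas are powers of two between 32 bytes and 128 KB. The helpers `area_for` and `wasted_bytes` are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_AREA 32u       /* 2^5 bytes  */
#define MAX_AREA 131072u   /* 2^17 bytes */

/* Smallest power-of-two area that can hold `size` bytes. */
size_t area_for(size_t size)
{
    size_t area = MIN_AREA;

    assert(size > 0 && size <= MAX_AREA);
    while (area < size)
        area <<= 1;
    return area;
}

/* Bytes lost to internal fragmentation for one request. */
size_t wasted_bytes(size_t size)
{
    return area_for(size) - size;
}
```

A request of 1025 bytes is served from a 2048-byte area and wastes 1023 bytes, just under half of the area. Requests smaller than the minimum area can waste an even larger share of it.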
[1] Jeff Bonwick, The Slab Allocator: An Object-Caching Kernel Memory Allocator
5. Some Improvement
    foo_cache = kmem_cache_create("foo_cache", sizeof (struct foo), 0,
        foo_constructor, foo_destructor);

    foo = kmem_cache_alloc(foo_cache, KM_SLEEP);
    /* use foo; */
    kmem_cache_free(foo_cache, foo);

    /* ---------------------------------------- */

    /* runs once when the object is first created in the cache */
    void foo_constructor(void *buf, int size)
    {
        struct foo *foo = buf;

        mutex_init(&foo->foo_lock);
        foo->foo_refcnt = 0;
        foo->foo_barlist = NULL;
    }

    /* runs once when the object's memory is finally reclaimed */
    void foo_destructor(void *buf, int size)
    {
        struct foo *foo = buf;

        ASSERT(foo->foo_barlist == NULL);
        ASSERT(foo->foo_refcnt == 0);
        mutex_destroy(&foo->foo_lock);
    }
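The idea behind the constructor/destructor interface above can be tried out in user space. The following is a minimal sketch, not the kernel implementation: it assumes `malloc` as the backing allocator and keeps freed objects on a small idle stack so that they stay constructed. All names (`obj_cache`, `demo_obj`, the counters) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define IDLE_MAX 16   /* arbitrary capacity of the idle stack */

struct obj_cache {
    size_t size;
    void (*ctor)(void *buf);
    void (*dtor)(void *buf);
    void *idle[IDLE_MAX];   /* constructed objects waiting for reuse */
    int nidle;
};

void obj_cache_init(struct obj_cache *c, size_t size,
                    void (*ctor)(void *), void (*dtor)(void *))
{
    c->size = size;
    c->ctor = ctor;
    c->dtor = dtor;
    c->nidle = 0;
}

void *obj_cache_alloc(struct obj_cache *c)
{
    void *buf;

    if (c->nidle > 0)
        return c->idle[--c->nidle];   /* already constructed */
    buf = malloc(c->size);
    if (buf != NULL && c->ctor != NULL)
        c->ctor(buf);                 /* construct only on first use */
    return buf;
}

void obj_cache_free(struct obj_cache *c, void *buf)
{
    if (c->nidle < IDLE_MAX) {
        c->idle[c->nidle++] = buf;    /* keep the constructed state */
        return;
    }
    if (c->dtor != NULL)
        c->dtor(buf);
    free(buf);
}

void obj_cache_destroy(struct obj_cache *c)
{
    while (c->nidle > 0) {
        void *buf = c->idle[--c->nidle];
        if (c->dtor != NULL)
            c->dtor(buf);
        free(buf);
    }
}

/* Demo object with counting hooks, to observe how often they run. */
struct demo_obj { int refcnt; };
int demo_ctor_calls, demo_dtor_calls;

void demo_ctor(void *buf)
{
    struct demo_obj *o = buf;
    o->refcnt = 0;
    demo_ctor_calls++;
}

void demo_dtor(void *buf)
{
    struct demo_obj *o = buf;
    assert(o->refcnt == 0);   /* object must be back in its idle state */
    demo_dtor_calls++;
}
```

Allocating, freeing, and allocating again returns the same buffer without a second constructor call, which is where the saving over the plain allocate-and-construct pattern comes from.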
6. What is SLAB
What is it?
SLAB is a kernel object pool built on top of the buddy subsystem. It consists of
many sub-pools (caches), each of which must be created explicitly.
SLAB may look like this: (figure not included)
7. Relation of SLAB & Buddy
SLAB API:
    /* Create a cache */
    struct kmem_cache *kmem_cache_create(const char *name, size_t size,
        size_t align, unsigned long flags, void (*ctor)(void *));
    /* Allocate an object */
    void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags);
    /* Deallocate an object */
    void kmem_cache_free(struct kmem_cache *cachep, void *objp);
    /* Delete a cache */
    void kmem_cache_destroy(struct kmem_cache *cachep);
9. Basic Things for Implementation
From the implementation point of view, the system has two levels.
The first level decides how many objects go into one slab and how many
pages one slab takes [struct kmem_cache];
The second level decides how objects are organized inside a slab [struct
slab];
10. Numbers for Slab and Pages
The algorithm for calculating how many pages one slab takes:
For order = 0 to MAX_ORDER, check:
1. At least one object fits in the slab.
2. Large slabs are bad for the buddy subsystem:
   if RAM > 32 MB, the limit is 2 pages; otherwise it is 1 page.
3. Internal fragmentation limit: left over < pages / 8.
Once the number of pages in one slab is fixed, the number of objects per slab
follows easily.
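The three rules above can be sketched as a short loop. This is a simplified model, not the kernel's code: it assumes a 4 KB page, ignores per-slab management overhead and alignment, and takes the rule 2 cap as an order (`break_order` 0 for 1 page, 1 for 2 pages). The names `toy_slab_order`, `TOY_PAGE_SIZE`, and `TOY_MAX_ORDER` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u   /* assumed page size */
#define TOY_MAX_ORDER 10      /* assumed upper bound on the order */

/*
 * Return the chosen order (slab size is TOY_PAGE_SIZE << order) and
 * store the number of objects per slab in *num.
 * Returns -1 if no order can hold even one object.
 */
int toy_slab_order(size_t obj_size, int break_order, size_t *num)
{
    int order, chosen = -1;

    assert(obj_size > 0);
    for (order = 0; order <= TOY_MAX_ORDER; order++) {
        size_t slab_bytes = (size_t)TOY_PAGE_SIZE << order;
        size_t n = slab_bytes / obj_size;
        size_t left_over = slab_bytes - n * obj_size;

        if (n == 0)
            continue;               /* rule 1: no object fits yet */
        chosen = order;
        *num = n;
        if (order >= break_order)
            break;                  /* rule 2: do not grow past the cap */
        if (left_over * 8 < slab_bytes)
            break;                  /* rule 3: fragmentation acceptable */
    }
    return chosen;
}
```

With `break_order` 1, a 100-byte object gets a one-page slab holding 40 objects, while a 3000-byte object leaves too much unused space in one page and moves to a two-page slab holding 2 objects.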
11. How to Organize objects of one slab
Use an array to emulate a linked list.
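A minimal sketch of an array-emulated free list follows, assuming a slab with four object slots. Each array entry stores the index of the next free object, so no pointers are stored in the objects themselves. The structure and function names are hypothetical and chosen for this example.

```c
#include <assert.h>

#define ARR_SLAB_OBJS 4      /* assumed number of objects per slab */
#define ARR_LIST_END  (-1)   /* marks the end of the free list */

struct arr_slab {
    int next[ARR_SLAB_OBJS]; /* next[i]: index of the free object after i */
    int free;                /* index of the first free object */
    int inuse;               /* number of objects handed out */
};

void arr_slab_init(struct arr_slab *s)
{
    int i;

    for (i = 0; i < ARR_SLAB_OBJS; i++)
        s->next[i] = (i + 1 < ARR_SLAB_OBJS) ? i + 1 : ARR_LIST_END;
    s->free = 0;
    s->inuse = 0;
}

/* Take the first free object; returns its index or ARR_LIST_END. */
int arr_slab_get(struct arr_slab *s)
{
    int idx = s->free;

    if (idx == ARR_LIST_END)
        return ARR_LIST_END;
    s->free = s->next[idx];
    s->inuse++;
    return idx;
}

/* Return object `idx` by pushing it on the head of the free list. */
void arr_slab_put(struct arr_slab *s, int idx)
{
    assert(idx >= 0 && idx < ARR_SLAB_OBJS);
    s->next[idx] = s->free;
    s->free = idx;
    s->inuse--;
}
```

Because a freed index goes to the head of the list, the most recently freed object is handed out first, which tends to keep it warm in the CPU cache.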
12. Optimization(1)–SLAB Color
Several years ago, L1 caches were small, and cache conflicts between
slabs were common.
To make the best use of the processor L1 cache, slab coloring was introduced.
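The slide does not spell out the mechanism, so the following is a sketch of the usual approach: the unused space left in a slab is divided into cache-line-sized steps, and each new slab starts its objects at the next step, wrapping around. The names `toy_colour`, `toy_colour_init`, and `toy_next_offset` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct toy_colour {
    size_t colour;       /* number of distinct colours (offsets) */
    size_t colour_off;   /* step between colours, e.g. the cache line size */
    size_t colour_next;  /* colour to use for the next slab */
};

void toy_colour_init(struct toy_colour *c, size_t left_over, size_t line_size)
{
    assert(line_size > 0);
    c->colour = left_over / line_size;  /* how many steps fit in the slack */
    c->colour_off = line_size;
    c->colour_next = 0;
}

/* Byte offset at which the first object of the next new slab is placed. */
size_t toy_next_offset(struct toy_colour *c)
{
    size_t off = c->colour_next * c->colour_off;

    c->colour_next++;
    if (c->colour_next >= c->colour)
        c->colour_next = 0;             /* wrap around */
    return off;
}
```

With 200 bytes of slack and 64-byte cache lines there are three colours, so successive slabs start their objects at offsets 0, 64, 128, and then 0 again. Objects at the same index in different slabs then map to different cache lines.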
13. Optimization(2)
OFF SLAB & IN SLAB [to reduce internal fragmentation]
If object size > PAGE_SIZE / 8, OFF SLAB is used.
Local array object limit & batchcount:
Object Size       Limit    batchcount
[128K, ∞)         1        1
[4K, 128K]        8        4
[1K, 4K]          24       12
[1/4 K, 1K]       54       27
[0, 1/4 K]        120      60
Free-object limit of the 3 lists = nodes * limit + number of objects per slab
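The table above can be written as a simple lookup. This is a sketch under stated assumptions: a 4 KB page, and boundary values (exactly 128K, 4K, 1K, 256 bytes) assigned to the smaller-size row, which the table itself leaves open. The batchcount column happens to equal (limit + 1) / 2 with integer division, which the sketch uses. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LA_PAGE_SIZE 4096u   /* assumed page size (the 4K row boundary) */

/* Maximum number of objects kept in the per-CPU local array. */
int la_limit(size_t obj_size)
{
    assert(obj_size > 0);
    if (obj_size > 131072)
        return 1;
    if (obj_size > LA_PAGE_SIZE)
        return 8;
    if (obj_size > 1024)
        return 24;
    if (obj_size > 256)
        return 54;
    return 120;
}

/* Number of objects moved in one refill or flush. */
int la_batchcount(size_t obj_size)
{
    return (la_limit(obj_size) + 1) / 2;
}

/* Free-object limit of the three lists, following the formula above. */
size_t la_free_limit(size_t nodes, size_t limit, size_t objs_per_slab)
{
    return nodes * limit + objs_per_slab;
}
```

Small objects get a deep local array because they are allocated often and cost little to keep around, while very large objects are barely cached at all.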
16. Allocation
if (an object is in the local array)
    Take one;
else if (a slab is in the partial list) {
    Transfer batchcount objects to the local array;
    Take one;
    If the slab becomes full, move it to the full list;
} else if (a slab is in the free list) {
    Transfer batchcount objects to the local array;
    Take one;
    Move the slab to the partial list;
} else {
    Allocate several pages from buddy for a new slab;
    Construct the objects;
    Transfer batchcount objects to the local array;
    Take one;
    Move the slab to the partial list;
}
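The allocation path above can be modelled with counters only. The sketch below is a toy simulation, not kernel code: it tracks how many objects each slab has handed out and which list the slab sits on, with assumed values of 4 objects per slab and a batchcount of 2. Unlike a real allocator, a refill draws from a single slab. All names are hypothetical.

```c
#include <assert.h>

#define SIM_OBJS_PER_SLAB 4   /* assumed slab capacity */
#define SIM_MAX_SLABS     8   /* assumed memory available from "buddy" */
#define SIM_BATCHCOUNT    2   /* assumed batchcount */

enum sim_list { SIM_UNUSED = 0, SIM_FREE, SIM_PARTIAL, SIM_FULL };

struct alloc_sim {
    int local_avail;                    /* objects in the local array */
    int inuse[SIM_MAX_SLABS];           /* objects taken from each slab */
    enum sim_list list[SIM_MAX_SLABS];  /* list each slab is on */
    int slabs_grown;                    /* slabs obtained from "buddy" */
};

static int sim_find(const struct alloc_sim *c, enum sim_list which)
{
    int i;

    for (i = 0; i < SIM_MAX_SLABS; i++)
        if (c->list[i] == which)
            return i;
    return -1;
}

/* Move up to SIM_BATCHCOUNT objects from slab s to the local array. */
static void sim_refill(struct alloc_sim *c, int s)
{
    int moved = 0;

    assert(c->inuse[s] < SIM_OBJS_PER_SLAB);
    while (moved < SIM_BATCHCOUNT && c->inuse[s] < SIM_OBJS_PER_SLAB) {
        c->inuse[s]++;
        c->local_avail++;
        moved++;
    }
    c->list[s] = (c->inuse[s] == SIM_OBJS_PER_SLAB) ? SIM_FULL : SIM_PARTIAL;
}

/* Returns 0 on success, -1 if no more slabs can be grown. */
int sim_alloc(struct alloc_sim *c)
{
    if (c->local_avail == 0) {
        int s = sim_find(c, SIM_PARTIAL);

        if (s < 0)
            s = sim_find(c, SIM_FREE);
        if (s < 0) {
            s = sim_find(c, SIM_UNUSED);   /* grow: pages from buddy */
            if (s < 0)
                return -1;
            c->inuse[s] = 0;               /* objects constructed here */
            c->slabs_grown++;
        }
        sim_refill(c, s);
    }
    c->local_avail--;                      /* take one */
    return 0;
}
```

Starting from an empty cache, the first allocation grows a slab, the third refill fills it up and moves it to the full list, and the fifth allocation has to grow a second slab.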
17. Free
Algorithm for freeing an object:
if (number of objects in the local array < limit)
    Append the object to the local array;
else {
    Release batchcount objects to their slabs; for each slab:
        if (the slab is entirely free)
            if (l3 free objects > limit)
                Release the entire slab to the buddy system;
            else
                Move the slab to the free list;
        else
            Move the slab to the partial list;
}
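The free path can be modelled the same way. This toy simulation assumes small constants (local array limit 2, batchcount 2, 2 objects per slab, l3 free limit 2) and records for each cached object which slab it belongs to. It also assumes that the object being freed is appended to the local array after a flush, which the pseudocode leaves implicit. All names are hypothetical.

```c
#include <assert.h>

#define FS_LIMIT         2   /* assumed local array limit */
#define FS_BATCHCOUNT    2   /* assumed batchcount */
#define FS_OBJS_PER_SLAB 2   /* assumed slab capacity */
#define FS_L3_FREE_LIMIT 2   /* assumed free-object limit of the lists */
#define FS_MAX_SLABS     4

enum fs_list { FS_GONE = 0, FS_ON_FREE, FS_ON_PARTIAL, FS_ON_FULL };

struct free_sim {
    int local[FS_LIMIT];              /* slab index of each cached object */
    int local_n;
    int inuse[FS_MAX_SLABS];          /* objects not yet back in the slab */
    enum fs_list list[FS_MAX_SLABS];
    int l3_free_objs;                 /* free objects held in listed slabs */
    int slabs_released;               /* slabs returned to "buddy" */
};

/* Give one object back to slab s and relink the slab. */
static void fs_release_to_slab(struct free_sim *c, int s)
{
    assert(c->inuse[s] > 0);
    c->inuse[s]--;
    c->l3_free_objs++;
    if (c->inuse[s] > 0) {
        c->list[s] = FS_ON_PARTIAL;
    } else if (c->l3_free_objs > FS_L3_FREE_LIMIT) {
        c->l3_free_objs -= FS_OBJS_PER_SLAB;   /* whole slab goes to buddy */
        c->list[s] = FS_GONE;
        c->slabs_released++;
    } else {
        c->list[s] = FS_ON_FREE;
    }
}

/* Free one object that belongs to slab s. */
void fs_free(struct free_sim *c, int s)
{
    int i;

    if (c->local_n >= FS_LIMIT) {
        /* local array is full: flush the oldest FS_BATCHCOUNT entries */
        for (i = 0; i < FS_BATCHCOUNT; i++)
            fs_release_to_slab(c, c->local[i]);
        for (i = FS_BATCHCOUNT; i < c->local_n; i++)
            c->local[i - FS_BATCHCOUNT] = c->local[i];
        c->local_n -= FS_BATCHCOUNT;
    }
    c->local[c->local_n++] = s;
}
```

In a run with three full slabs, the first flush empties slab 0 and parks it on the free list. The second flush empties slab 1, pushes the free-object count over the limit, and hands that slab back to the buddy system.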
19. SLUB
SLUB promises better performance and scalability by dropping
most of the queues and related overhead and simplifying the slab
structure in general, while retaining the current slab allocator
interface. A 5-10% performance increase has been reported [2].
SLUB features
The separate slab management structure is dropped. Its function
is transferred to struct page;
The full list and the free list are dropped;
On systems with 1k nodes/processors, SLAB ties up several gigabytes
just for storing references to objects in those queues;
A local slab is used instead of a local object array;
When a cache is created and an existing cache has a similar object size,
the existing one is reused, resulting in fewer caches
(a 50% reduction is claimed).
[2] Corbet, http://lwn.net/Articles/229984/
20. SLOB
The SLOB allocator is intended for tiny systems, especially
systems without an MMU.
SLOB features
Requests of 0~256 bytes use one slob list, 256~1024 bytes a second list,
and 1024~4096 bytes a third list;
If the request size > PAGE_SIZE, pages of order get_order(size) are
allocated from the buddy system directly;
One slob is one page. Free objects are found with a first-fit
scan;
Free objects are linked through the first 4 bytes of each
object;
There is no per-CPU local cache;
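Two of the SLOB features above, list selection by size and the first-fit scan, can be sketched as follows. This assumes a 4 KB page and assigns the boundary sizes 256 and 1024 to the larger list, which the ranges above leave open. The names `slob_pick_list` and `slob_first_fit` are hypothetical, and the scan runs over a plain array of free-block sizes rather than real in-page links.

```c
#include <assert.h>
#include <stddef.h>

#define SLOB_PAGE_SIZE 4096u   /* assumed page size */
#define SLOB_BREAK1    256u
#define SLOB_BREAK2    1024u

enum slob_target { SLOB_SMALL, SLOB_MEDIUM, SLOB_LARGE, SLOB_BUDDY };

/* Decide which list (or the buddy system) serves a request. */
enum slob_target slob_pick_list(size_t size)
{
    if (size > SLOB_PAGE_SIZE)
        return SLOB_BUDDY;     /* too big for one slob page */
    if (size < SLOB_BREAK1)
        return SLOB_SMALL;
    if (size < SLOB_BREAK2)
        return SLOB_MEDIUM;
    return SLOB_LARGE;
}

/*
 * First fit: return the index of the first free block that is large
 * enough for `want` bytes, or -1 if none is.
 */
int slob_first_fit(const size_t *free_sizes, int n, size_t want)
{
    int i;

    assert(free_sizes != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (free_sizes[i] >= want)
            return i;
    return -1;
}
```

First fit stops at the first block that is big enough, even when a tighter fit exists further along. That keeps the scan short, which suits the tiny systems SLOB targets.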