SlideShare a Scribd company logo
1 of 38
Download to read offline
#>LLPD
Ivan Matylitski
https://github.com/buffovich
http://www.epam.com/
http://www.epam.com/solutions/embedded.html
#>LLPDhttp://www.epam.com/
RAM usage from userspaceRAM usage from userspace
perspectiveperspective
#>LLPDhttp://www.epam.com/
RAM usage from userspace perspectiveRAM usage from userspace perspective
Modern physical memory architecture
➢ Basic ideas about how to use it effectively
Memory allocation in code
➢ Classic memory allocation models
➢ Static allocation (data segments)
➢ Allocation in stack (userspace stack characteristics)
➢ Dynamic allocation (jemalloc, tcmalloc, libc6 malloc)
➢ Object caching (SLAB)
➢ Original idea of SLAB
➢ Existing slab implementations
#>LLPDhttp://www.epam.com/
RAM-CPU relationships: historical retrospectiveRAM-CPU relationships: historical retrospective
Long time ago...
Processors and memory chips works with the same
frequency (actually, with slightly different speeds)
CPU RAM
#>LLPDhttp://www.epam.com/
RAM-CPU relationships: historical retrospectiveRAM-CPU relationships: historical retrospective
Long time ago...
Were two types of memory:
● Registers based on SRAM
● External memory based on DRAM
● There was ROM but... You know.
CPU DRAMREGEN
A B
#>LLPDhttp://www.epam.com/
From now... what about RAM technologies?From now... what about RAM technologies?
SRAM bit cell is based on cross-linked logical gates
with positive feedback.
YYES?
● Fast
● Shares the same manufacturing
technology with CPU
YNOT?
● High power consumption
● Large size
#>LLPDhttp://www.epam.com/
From now... what about RAM technologies?From now... what about RAM technologies?
DRAM bit cell is based on capacitor paired with
logic valve.
YYES?
● Lower power consumption
● Tiny cell size
YNOT?
● Needs recharging cycles
● Needs extra machinery serves
the needs of regeneration
● Has higher latency and lower
speed because of these cycles
#>LLPDhttp://www.epam.com/
From now... what about RAM technologies?From now... what about RAM technologies?
DRAM is more complicated beast than you might
think before...
That is how it works through
the time scale
#>LLPDhttp://www.epam.com/
From now... what about RAM technologies?From now... what about RAM technologies?
- OMG! It means that RAM has the same property as
spindel drive?
- Well... Each time you want to select new address you
need to set up RAM controller to the right position.
Random memory isn't so random...
#>LLPDhttp://www.epam.com/
New beast: spatial locality.New beast: spatial locality.
- You know... The fastest way to read a data from the external RAM
is to read it sequentially.
- I guess, it means that I should place data used together as close
as it possible to each other?
- Definitely.
Sincerely yours, Captain
#>LLPDhttp://www.epam.com/
New beast: spatial locality.New beast: spatial locality.
BTW, memory defragmentation isn't so weird idea. How would it be
nice to have self-defragmenting rope container, for example.
Constant-time concatenation and spatial-locality-awareness in the
same time...
rope<T, Alloc>
#>LLPDhttp://www.epam.com/
What about contemporary technologies?What about contemporary technologies?
So, guys, bad news for you... External RAM isn't rocks anymore...
“From 1986 to 2000, CPU speed improved at an annual rate of 55% while
memory speed only improved at 10%.” Wikipedia says. Hundreds of CPU
cycles can be wasted in the name of memory access. Moreover, recall
spatial and latency issues and you will come up with idea...
?
#>LLPDhttp://www.epam.com/
What about contemporary technologies?What about contemporary technologies?
We need more registers! They are based on faster SRAM!
- Well modern processors got ones with really ubiquitous names.
But not so much as you might think. And you should say now...
-Why?
?
#>LLPDhttp://www.epam.com/
What about contemporary technologies?What about contemporary technologies?
Awareness of code about the huge register file will
complicate the code and its pipeline processing inside
the CPU chip. Imagine 4KB of registers where each one
has its own name/index.
The actual answer for the previous question is
transparent CPU...
Cache
#>LLPDhttp://www.epam.com/
So... what about cache?So... what about cache?
It can be considered as transparent register file based on fast SRAM cells.
Actually, there are several levels of cache with different sizes (the more we
move away from the CPU core the more latency and register file size become).
It looks like this: Or even like this:
#>LLPDhttp://www.epam.com/
Cache is transparent?Cache is transparent?
Program code doesn't control cache directly. Cache
has its own algorithm coded in logical valves.
Here are the types:
● Fully-associative (extreme case)
● Direct mapped (extreme case)
● N-way associative (the one which if
frequently used in practice)
#>LLPDhttp://www.epam.com/
Fully-associative?Fully-associative?
Each entry from underlying external memory may be
written to each cache entry. Example: (I/D)TLB.
ADDRESS
TAG
TAG
TAG
TAG
DATA
DATA
DATA
DATA
#>LLPDhttp://www.epam.com/
Direct mapped?Direct mapped?
Each entry from underlying external memory may be
written to the only one corresponding cache entry.
Address is divided into three parts now.
TAG
TAG
TAG
TAG
DATA
DATA
DATA
DATA
ADDRESS
TAG INDEX OFFSET
#>LLPDhttp://www.epam.com/
N-associative?N-associative?
Each entry from underlying external memory may be written to one of
entries which is in the group of N entries. Combination of
fully-associative and direct mapped cache. Address is divided into three
parts as well. Examples: modern L1 and L2 data/instruction caches.
TAG
TAG
TAG
TAG
DATA
DATA
DATA
DATA
ADDRESS
TAG INDEX OFFSET
TAG DATA
TAG DATA
2-way associative
#>LLPDhttp://www.epam.com/
Basically, what does cache provide?Basically, what does cache provide?
It helps code if code maintains temporal locality. Its
new beast which is about re-using data which was used
earlier. Example: limit as exit condition in loop.
Moreover, it helps with spatial locality because cache
entry has particular size and entry can be filled in
background in read-ahead manner. In the meantime,
code in pipeline can proceed with its execution.
#>LLPDhttp://www.epam.com/
What is boiled down outcome about DRAM andWhat is boiled down outcome about DRAM and
cache?cache?
Read/write data sequentially. Re-use data which has been
just used. Help cache with read-ahead. Keep data pieces
which are used together as close as possible to each
other. Use in-lining for functions which performance is
crucial.
Homework: think about matrix multiplication
optimization.
#>LLPDhttp://www.epam.com/
May you tell us about multi-core and multi-processorMay you tell us about multi-core and multi-processor
configurations?configurations?
Yep.
#>LLPDhttp://www.epam.com/
#>LLPDhttp://www.epam.com/
Memory management from userspace.Memory management from userspace.
The simplest one:
static int I; //first
static buf[] = “123”; //second
static struct { int a; int b } str; //third
First and third will come from .bss. The second will
come from .data.
#>LLPDhttp://www.epam.com/
Memory management from userspace.Memory management from userspace.
More “dynamic” in nature (with stack):
void f() {
int I; //first
buf[] = “123”; //second
struct { int a; int b } str; //third
}
First and third will be uninitialized and nothing will be
done for their initialization but extending stack frame
(adding sizeof( … ) to SP). Second will be derived with
frame extending with MOV instruction. Be careful with
even userspace stack. For main thread it's just ~4MB
long by default.
#>LLPDhttp://www.epam.com/
Memory management from userspace.Memory management from userspace.
Even more “dynamic”:
void f() {
int *I = alloca( sizeof( int ) * 10 );
}
Expands memory frame further. Still think about
limited amount of available stack memory.
#>LLPDhttp://www.epam.com/
Memory management from userspace.Memory management from userspace.
But what if we can't predict amount of data and this
amount can be fairly big? We need heap allocator:
void f( size_t s ) {
int *I = malloc( sizeof( int ) * s );
}
Get block from the heap.
#>LLPDhttp://www.epam.com/
Memory management from userspace.Memory management from userspace.
- It's so obvious! Why do you even tell us about this stuff?
- Really? Then, please, answer, why several implementations of
dynamic allocator exists? What about allocation algorithms
(first-fit, best-fit, buddy-list).
- Oh. Which ones?
The most popular allocators are:
● tcalloc (the one for scalable high-load systems)
● je_malloc (scalable allocator as well as tcalloc; default
allocator in freebsd, netbsd)
● ptmalloc (scalable version of dlmalloc for SMP systems; libc6)
● dlmalloc (good at embedded/mobile systems or applications)
#>LLPDhttp://www.epam.com/
Memory management from userspace.Memory management from userspace.
Algorithms of oblivious memory allocators seems
heuristic. They don't have any idea about numerical
characteristics of particular usage case. How can we
hint allocator about our intents and preferences?
Moreover, how can we optimize and decrease
frequency of object initialization/destruction? We
don't need it each time we want to use object for
short time.
The answer was delivered by Jeff Bonwick and
introduced in the Solaris 2.4 kernel.
#>LLPDhttp://www.epam.com/
And...?And...?
Jeff Bonwick introduced SLAB allocator. Slab is
simple object cache where you can put to and get
from already allocated and appropriately initialized
structures.
The fact is that you know exact size of crucial data
structures in your code. If you need a lot of such
structures and need to allocate it dynamically you
will definitely extract benefit from usage of SLAB.
What are users SLAB in these latter days?...
It's Linux kernel!
#>LLPDhttp://www.epam.com/
Linux kernel...?Linux kernel...?
Here is main course:
struct kmem_cache *
kmem_cache_create(const char *name, size_t
size, size_t align, unsigned long flags,
void (*ctor)(void *));
void * kmem_cache_alloc(struct kmem_cache
*cachep, gfp_t flags);
kmem_cache_free(task_struct_cachep, tsk);
kmem_cache_destroy(task_struct_cachep);
#>LLPDhttp://www.epam.com/
How kernel stuff relates to userspace?How kernel stuff relates to userspace?
Such approach can be useful in userspace as well.
Especially, it's true for object-oriented languages.
It was considered and there are several ready-for-use
implementations:
● Apache Runtime Library has slab-like machinery
● Glib has g_slice
● C++ Boost has pool
And...
I've written another one and named it as libmempool.
https://github.com/buffovich/libmempool
#>LLPDhttp://www.epam.com/
What? Another one? Who do you think you are? For whatWhat? Another one? Who do you think you are? For what
reason?reason?
Just few points:
● Fully statically configurable for big/small memory
footprint; for being thread-safe or not (if it's not then a
few bytes-per-cache will be saved and there will be no
lock/unlock overhead); for being colored or not (a few
bytes for slab can be saved)
● Project is not attended to particular
library/framework/environment; you can use it
absolutely separately;
● Thread-safety
● Optional reference counting for block
● Statically configured allocation backends
(tcalloc/ptalloc/dlalloc/je_malloc)
● Constructors/destructors/”recyclers” support
#>LLPDhttp://www.epam.com/
Sounds tasty... Something else?Sounds tasty... Something else?
Yep:
● Ready-for-run makefile
● Doxygen documentation
#>LLPDhttp://www.epam.com/
What about maturity and unit-tests?What about maturity and unit-tests?
I knew you ask me that.
● Unfortunately, there are no unit-tests yet; it will be
implemented, of course; test plan is ready.
● Moreover, you may notice the absence of cache
coloring; it will be implemented also.
● In extreme cases (a huge amount of structures are
going to be allocated/deallocated), allocation backend
might commit noticeable percentage to the
performance graphs. Going to implement “raw”
backend based on POSIX mmap.
#>LLPDhttp://www.epam.com/
Thanks to ...Thanks to ...
Denis Pynkin (EPAM Systems)
Artem Sheremet (EPAM Systems)
#>LLPDhttp://www.epam.com/
Choose your destiny...Choose your destiny...
#>LLPDhttp://www.epam.com/
...Joking :-)...Joking :-)
Ask your questions, please!Ask your questions, please!

More Related Content

More from Minsk Linux User Group

Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSDАндрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSDMinsk Linux User Group
 
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
Vitaly  ̈_Vi ̈ Shukela - My FOSS projectsVitaly  ̈_Vi ̈ Shukela - My FOSS projects
Vitaly ̈_Vi ̈ Shukela - My FOSS projectsMinsk Linux User Group
 
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизниAlexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизниMinsk Linux User Group
 
Vikentsi Lapa — How does software testing become software development?
Vikentsi Lapa — How does software testing  become software development?Vikentsi Lapa — How does software testing  become software development?
Vikentsi Lapa — How does software testing become software development?Minsk Linux User Group
 
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? ПродолжениеМихаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? ПродолжениеMinsk Linux User Group
 
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPNМаксим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPNMinsk Linux User Group
 
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...Minsk Linux User Group
 
MajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного ДомаMajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного ДомаMinsk Linux User Group
 
Максим Салов - Отладочный монитор
Максим Салов - Отладочный мониторМаксим Салов - Отладочный монитор
Максим Салов - Отладочный мониторMinsk Linux User Group
 
Максим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overviewМаксим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overviewMinsk Linux User Group
 
Константин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о JenkinsКонстантин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о JenkinsMinsk Linux User Group
 
Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»Minsk Linux User Group
 
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...Minsk Linux User Group
 
Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?Minsk Linux User Group
 
Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...
Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...
Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...Minsk Linux User Group
 
Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...
Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...
Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...Minsk Linux User Group
 
Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной сб...
Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной   сб...Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной   сб...
Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной сб...Minsk Linux User Group
 
Ирина Шубина - Обзор базовых лицензий свободного ПО
Ирина Шубина - Обзор базовых лицензий свободного ПОИрина Шубина - Обзор базовых лицензий свободного ПО
Ирина Шубина - Обзор базовых лицензий свободного ПОMinsk Linux User Group
 

More from Minsk Linux User Group (20)

Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSDАндрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
Андрэй Захарэвіч - Як мы ставілі KDE пад FreeBSD
 
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
Vitaly  ̈_Vi ̈ Shukela - My FOSS projectsVitaly  ̈_Vi ̈ Shukela - My FOSS projects
Vitaly ̈_Vi ̈ Shukela - My FOSS projects
 
Vitaly ̈_Vi ̈ Shukela - Dive
Vitaly  ̈_Vi ̈ Shukela - DiveVitaly  ̈_Vi ̈ Shukela - Dive
Vitaly ̈_Vi ̈ Shukela - Dive
 
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизниAlexander Lomov - Cloud Foundry и BOSH: истории из жизни
Alexander Lomov - Cloud Foundry и BOSH: истории из жизни
 
Vikentsi Lapa — How does software testing become software development?
Vikentsi Lapa — How does software testing  become software development?Vikentsi Lapa — How does software testing  become software development?
Vikentsi Lapa — How does software testing become software development?
 
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? ПродолжениеМихаил Волчек — Свободные лицензии. быть или не быть? Продолжение
Михаил Волчек — Свободные лицензии. быть или не быть? Продолжение
 
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPNМаксим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
Максим Мельников — IPv6 at Home: NAT64, DNS64, OpenVPN
 
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...Слава Машканов — “Wubuntu”: Построение гетерогенной среды  Windows+Linux на н...
Слава Машканов — “Wubuntu”: Построение гетерогенной среды Windows+Linux на н...
 
MajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного ДомаMajorDoMo: Открытая платформа Умного Дома
MajorDoMo: Открытая платформа Умного Дома
 
Максим Салов - Отладочный монитор
Максим Салов - Отладочный мониторМаксим Салов - Отладочный монитор
Максим Салов - Отладочный монитор
 
Максим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overviewМаксим Мельников - FOSDEM 2014 overview
Максим Мельников - FOSDEM 2014 overview
 
Константин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о JenkinsКонстантин Шевцов - Пара слов о Jenkins
Константин Шевцов - Пара слов о Jenkins
 
Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»Ермакович Света - Операция «Пингвин»
Ермакович Света - Операция «Пингвин»
 
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
Михаил Волчек - Смогут ли беларусы вкусить плоды Творческих Общин? Creative C...
 
Vikentsi Lapa - Tools for testing
Vikentsi Lapa - Tools for testingVikentsi Lapa - Tools for testing
Vikentsi Lapa - Tools for testing
 
Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?Алексей Туля - А нужен ли вам erlang?
Алексей Туля - А нужен ли вам erlang?
 
Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...
Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...
Виктор Сергейчик - Как пользоваться PGP безопасно и правильно. Вводная к Keys...
 
Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...
Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...
Дмитрий Костюк - Очаровательные серые кнопки: обзор эволюции виджет-тулкитовH...
 
Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной сб...
Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной   сб...Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной   сб...
Дмитрий Перлов openSUSE Build Server: tips & tricks кросс-дистрибутивной сб...
 
Ирина Шубина - Обзор базовых лицензий свободного ПО
Ирина Шубина - Обзор базовых лицензий свободного ПОИрина Шубина - Обзор базовых лицензий свободного ПО
Ирина Шубина - Обзор базовых лицензий свободного ПО
 

Recently uploaded

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 

Recently uploaded (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 

RAM usage from userspace perspective

  • 2. #>LLPDhttp://www.epam.com/ RAM usage from userspaceRAM usage from userspace perspectiveperspective
  • 3. #>LLPDhttp://www.epam.com/ RAM usage from userspace perspectiveRAM usage from userspace perspective Modern physical memory architecture ➢ Basic ideas about how to use it effectively Memory allocation in code ➢ Classic memory allocation models ➢ Static allocation (data segments) ➢ Allocation in stack (userspace stack characteristics) ➢ Dynamic allocation (jemalloc, tcmalloc, libc6 malloc) ➢ Object caching (SLAB) ➢ Original idea of SLAB ➢ Existing slab implementations
  • 4. #>LLPDhttp://www.epam.com/ RAM-CPU relationships: historical retrospectiveRAM-CPU relationships: historical retrospective Long time ago... Processors and memory chips works with the same frequency (actually, with slightly different speeds) CPU RAM
  • 5. #>LLPDhttp://www.epam.com/ RAM-CPU relationships: historical retrospectiveRAM-CPU relationships: historical retrospective Long time ago... Were two types of memory: ● Registers based on SRAM ● External memory based on DRAM ● There was ROM but... You know. CPU DRAMREGEN A B
  • 6. #>LLPDhttp://www.epam.com/ From now... what about RAM technologies?From now... what about RAM technologies? SRAM bit cell is based on cross-linked logical gates with positive feedback. YYES? ● Fast ● Shares the same manufacturing technology with CPU YNOT? ● High power consumption ● Large size
  • 7. #>LLPDhttp://www.epam.com/ From now... what about RAM technologies?From now... what about RAM technologies? DRAM bit cell is based on capacitor paired with logic valve. YYES? ● Lower power consumption ● Tiny cell size YNOT? ● Needs recharging cycles ● Needs extra machinery serves the needs of regeneration ● Has higher latency and lower speed because of these cycles
  • 8. #>LLPDhttp://www.epam.com/ From now... what about RAM technologies?From now... what about RAM technologies? DRAM is more complicated beast than you might think before... That is how it works through the time scale
  • 9. #>LLPDhttp://www.epam.com/ From now... what about RAM technologies?From now... what about RAM technologies? - OMG! It means that RAM has the same property as spindel drive? - Well... Each time you want to select new address you need to set up RAM controller to the right position. Random memory isn't so random...
  • 10. #>LLPDhttp://www.epam.com/ New beast: spatial locality.New beast: spatial locality. - You know... The fastest way to read a data from the external RAM is to read it sequentially. - I guess, it means that I should place data used together as close as it possible to each other? - Definitely. Sincerely yours, Captain
  • 11. #>LLPDhttp://www.epam.com/ New beast: spatial locality.New beast: spatial locality. BTW, memory defragmentation isn't so weird idea. How would it be nice to have self-defragmenting rope container, for example. Constant-time concatenation and spatial-locality-awareness in the same time... rope<T, Alloc>
  • 12. #>LLPDhttp://www.epam.com/ What about contemporary technologies?What about contemporary technologies? So, guys, bad news for you... External RAM isn't rocks anymore... “From 1986 to 2000, CPU speed improved at an annual rate of 55% while memory speed only improved at 10%.” Wikipedia says. Hundreds of CPU cycles can be wasted in the name of memory access. Moreover, recall spatial and latency issues and you will come up with idea... ?
  • 13. #>LLPDhttp://www.epam.com/ What about contemporary technologies?What about contemporary technologies? We need more registers! They are based on faster SRAM! - Well modern processors got ones with really ubiquitous names. But not so much as you might think. And you should say now... -Why? ?
  • 14. #>LLPDhttp://www.epam.com/ What about contemporary technologies?What about contemporary technologies? Awareness of code about the huge register file will complicate the code and its pipeline processing inside the CPU chip. Imagine 4KB of registers where each one has its own name/index. The actual answer for the previous question is transparent CPU... Cache
  • 15. #>LLPDhttp://www.epam.com/ So... what about cache?So... what about cache? It can be considered as transparent register file based on fast SRAM cells. Actually, there are several levels of cache with different sizes (the more we move away from the CPU core the more latency and register file size become). It looks like this: Or even like this:
  • 16. #>LLPDhttp://www.epam.com/ Cache is transparent?Cache is transparent? Program code doesn't control cache directly. Cache has its own algorithm coded in logical valves. Here are the types: ● Fully-associative (extreme case) ● Direct mapped (extreme case) ● N-way associative (the one which if frequently used in practice)
  • 17. #>LLPDhttp://www.epam.com/ Fully-associative?Fully-associative? Each entry from underlying external memory may be written to each cache entry. Example: (I/D)TLB. ADDRESS TAG TAG TAG TAG DATA DATA DATA DATA
  • 18. #>LLPDhttp://www.epam.com/ Direct mapped?Direct mapped? Each entry from underlying external memory may be written to the only one corresponding cache entry. Address is divided into three parts now. TAG TAG TAG TAG DATA DATA DATA DATA ADDRESS TAG INDEX OFFSET
  • 19. #>LLPDhttp://www.epam.com/ N-associative?N-associative? Each entry from underlying external memory may be written to one of entries which is in the group of N entries. Combination of fully-associative and direct mapped cache. Address is divided into three parts as well. Examples: modern L1 and L2 data/instruction caches. TAG TAG TAG TAG DATA DATA DATA DATA ADDRESS TAG INDEX OFFSET TAG DATA TAG DATA 2-way associative
  • 20. #>LLPDhttp://www.epam.com/ Basically, what does cache provide?Basically, what does cache provide? It helps code if code maintains temporal locality. Its new beast which is about re-using data which was used earlier. Example: limit as exit condition in loop. Moreover, it helps with spatial locality because cache entry has particular size and entry can be filled in background in read-ahead manner. In the meantime, code in pipeline can proceed with its execution.
  • 21. #>LLPDhttp://www.epam.com/ What is boiled down outcome about DRAM andWhat is boiled down outcome about DRAM and cache?cache? Read/write data sequentially. Re-use data which has been just used. Help cache with read-ahead. Keep data pieces which are used together as close as possible to each other. Use in-lining for functions which performance is crucial. Homework: think about matrix multiplication optimization.
  • 22. #>LLPDhttp://www.epam.com/ May you tell us about multi-core and multi-processorMay you tell us about multi-core and multi-processor configurations?configurations? Yep.
  • 24. #>LLPDhttp://www.epam.com/ Memory management from userspace.Memory management from userspace. The simplest one: static int I; //first static buf[] = “123”; //second static struct { int a; int b } str; //third First and third will come from .bss. The second will come from .data.
  • 25. #>LLPDhttp://www.epam.com/ Memory management from userspace.Memory management from userspace. More “dynamic” in nature (with stack): void f() { int I; //first buf[] = “123”; //second struct { int a; int b } str; //third } First and third will be uninitialized and nothing will be done for their initialization but extending stack frame (adding sizeof( … ) to SP). Second will be derived with frame extending with MOV instruction. Be careful with even userspace stack. For main thread it's just ~4MB long by default.
  • 26. #>LLPDhttp://www.epam.com/ Memory management from userspace.Memory management from userspace. Even more “dynamic”: void f() { int *I = alloca( sizeof( int ) * 10 ); } Expands memory frame further. Still think about limited amount of available stack memory.
  • 27. #>LLPDhttp://www.epam.com/ Memory management from userspace.Memory management from userspace. But what if we can't predict amount of data and this amount can be fairly big? We need heap allocator: void f( size_t s ) { int *I = malloc( sizeof( int ) * s ); } Get block from the heap.
  • 28. #>LLPDhttp://www.epam.com/ Memory management from userspace.Memory management from userspace. - It's so obvious! Why do you even tell us about this stuff? - Really? Then, please, answer, why several implementations of dynamic allocator exists? What about allocation algorithms (first-fit, best-fit, buddy-list). - Oh. Which ones? The most popular allocators are: ● tcalloc (the one for scalable high-load systems) ● je_malloc (scalable allocator as well as tcalloc; default allocator in freebsd, netbsd) ● ptmalloc (scalable version of dlmalloc for SMP systems; libc6) ● dlmalloc (good at embedded/mobile systems or applications)
  • 29. #>LLPDhttp://www.epam.com/ Memory management from userspace.Memory management from userspace. Algorithms of oblivious memory allocators seems heuristic. They don't have any idea about numerical characteristics of particular usage case. How can we hint allocator about our intents and preferences? Moreover, how can we optimize and decrease frequency of object initialization/destruction? We don't need it each time we want to use object for short time. The answer was delivered by Jeff Bonwick and introduced in the Solaris 2.4 kernel.
  • 30. #>LLPDhttp://www.epam.com/ And...?And...? Jeff Bonwick introduced SLAB allocator. Slab is simple object cache where you can put to and get from already allocated and appropriately initialized structures. The fact is that you know exact size of crucial data structures in your code. If you need a lot of such structures and need to allocate it dynamically you will definitely extract benefit from usage of SLAB. What are users SLAB in these latter days?... It's Linux kernel!
  • 31. #>LLPDhttp://www.epam.com/ Linux kernel...?Linux kernel...? Here is main course: struct kmem_cache * kmem_cache_create(const char *name, size_t size, size_t align, unsigned long flags, void (*ctor)(void *)); void * kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags); kmem_cache_free(task_struct_cachep, tsk); kmem_cache_destroy(task_struct_cachep);
  • 32. #>LLPDhttp://www.epam.com/ How kernel stuff relates to userspace?How kernel stuff relates to userspace? Such approach can be useful in userspace as well. Especially, it's true for object-oriented languages. It was considered and there are several ready-for-use implementations: ● Apache Runtime Library has slab-like machinery ● Glib has g_slice ● C++ Boost has pool And... I've written another one and named it as libmempool. https://github.com/buffovich/libmempool
  • 33. #>LLPDhttp://www.epam.com/ What? Another one? Who do you think you are? For whatWhat? Another one? Who do you think you are? For what reason?reason? Just few points: ● Fully statically configurable for big/small memory footprint; for being thread-safe or not (if it's not then a few bytes-per-cache will be saved and there will be no lock/unlock overhead); for being colored or not (a few bytes for slab can be saved) ● Project is not attended to particular library/framework/environment; you can use it absolutely separately; ● Thread-safety ● Optional reference counting for block ● Statically configured allocation backends (tcalloc/ptalloc/dlalloc/je_malloc) ● Constructors/destructors/”recyclers” support
  • 34. #>LLPDhttp://www.epam.com/ Sounds tasty... Something else?Sounds tasty... Something else? Yep: ● Ready-for-run makefile ● Doxygen documentation
  • 35. #>LLPDhttp://www.epam.com/ What about maturity and unit-tests?What about maturity and unit-tests? I knew you ask me that. ● Unfortunately, there are no unit-tests yet; it will be implemented, of course; test plan is ready. ● Moreover, you may notice the absence of cache coloring; it will be implemented also. ● In extreme cases (a huge amount of structures are going to be allocated/deallocated), allocation backend might commit noticeable percentage to the performance graphs. Going to implement “raw” backend based on POSIX mmap.
  • 36. #>LLPDhttp://www.epam.com/ Thanks to ...Thanks to ... Denis Pynkin (EPAM Systems) Artem Sheremet (EPAM Systems)
  • 38. #>LLPDhttp://www.epam.com/ ...Joking :-)...Joking :-) Ask your questions, please!Ask your questions, please!