RCU in 2019
(from my perspective related to
work I have been doing)
Credits
RCU is the great decades-long work of Paul McKenney and
others. I am relatively new on the scene (~1.5 years).
Who am I ; and how I got started with RCU?
● Worked on Linux for a decade or so.
● Heard only 5 people understand RCU. Opportunity !!!!
● Tired of reading RCU traces and logs that made no sense.
● Was running into RCU concepts but didn’t know their meaning.
Time to put all mysteries to an end…
● Turned out I actually love it and it is like puzzle solving.
● Helping community / company with RCU issues, concepts,
improvements, reviewing.
● Started asking RCU questions (~1.5 years ago) about why things could not
be done a certain way, to which PaulMck quickly reminded me that they are
“correct” but... RCU is complex since it has to “survive” in the real world.
Who am I ; and how I got started with RCU?
PaulMck says…
“Here is your nice elegant
little algorithm”
Who am I ; and how I got started with RCU?
PaulMck says…
“Here is your nice elegant
little algorithm equipped
to survive in the Linux
kernel”
Who am I ; and how I got started with RCU?
Agenda
● Introduction
● RCU Flavor consolidation
○ Performance
○ Scheduler Deadlock fixes
● List RCU API improvements (lockdep)
Extra material, covered if time permits:
○ kfree_rcu() batching
○ Dynticks and NOHZ_FULL fixes
Intro: Typical RCU workflow
● Say you have some data that you have to share between reader and writer
sections.
struct shared_data {
        int a;
        long b;
};

int reader(struct shared_data *sd)
{
        if (sd->a)
                return sd->b;
        return 0;
}

void writer(struct shared_data *sd)
{
        sd->b = 1;
        sd->a = 2;
}
Intro: Typical RCU workflow
● One way is to use a reader-writer lock.
int reader(struct shared_data *sd)
{
        int ret = 0;

        read_lock(&sd->rwlock);
        if (sd->a)
                ret = sd->b;
        read_unlock(&sd->rwlock);
        return ret;
}

void writer(struct shared_data *sd)
{
        write_lock(&sd->rwlock);
        sd->b = 1;
        sd->a = 2;
        write_unlock(&sd->rwlock);
}
Intro: Typical RCU workflow
● Or, use RCU:

int reader(void)
{
        struct shared_data *sd;
        int ret = 0;

        rcu_read_lock();
        sd = rcu_dereference(global_sd);        // READ
        if (sd->a)
                ret = sd->b;
        rcu_read_unlock();
        return ret;
}

void writer(void)
{
        struct shared_data *sd, *old_sd;

        spin_lock(&global_sd_lock);             /* writer-side lock; name illustrative */
        old_sd = rcu_dereference(global_sd);
        sd = kmalloc(sizeof(*sd), GFP_KERNEL);
        *sd = *old_sd;                          // COPY
        sd->a = 2;
        rcu_assign_pointer(global_sd, sd);      // UPDATE
        spin_unlock(&global_sd_lock);
        synchronize_rcu();
        kfree(old_sd);
}
Intro: Typical RCU workflow
● Writer makes a copy of the data and modifies it.
● Readers execute in parallel on the unmodified data.
● To know all prior readers are done, the writer:
○ Commits the modified copy (pointer update).
○ Waits for a grace period.
○ After the wait, destroys the old data.
○ The writer’s wait marks the start of a grace period.
● Quiescent state (QS) concept - a state that the CPU passes through which signifies
that the CPU is no longer in a read-side critical section (tick, context switch, idle
loop, user code, etc).
○ All CPUs must report passage through a QS.
○ Once all CPUs are accounted for, the grace period can end.
Intro: What’s RCU good for - summary
● Fastest read-mostly synchronization primitive:
○ Readers are almost free regardless of the number of concurrent ones. [Caveats]
○ Writes are costly, but the per-update cost is amortized.
● Thousands or millions of updates can share a grace period, so the cost of a
grace period cycle (updating shared state etc.) is divided.
○ Reader and writer sections execute in parallel.
[Caveats about “literally free”]
“Readers are almost free”: yes, but caveats do apply:
(1) CONFIG_PREEMPT=n is free, but =y has a small cost (especially unlock()). (2) Possibility of rcu_dereference()
affecting compiler optimizations. (3) Given that rcu_read_lock()
and rcu_read_unlock() now imply barrier(), they can also affect compiler optimizations.
Intro: What is RCU and what is it good for?
● More use cases:
○ Wait for completion primitive (used in deadlock free NMI registration, BPF maps)
○ Reference counter replacements
○ Per-cpu reference counting
○ Optimized reader-writer locking (percpu_rwsem)
■ Slow / Fast path switch (rcu-sync)
○ Go through the RCUUsage paper for many more usage patterns.
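As a concrete illustration of the reference-count replacement pattern, here is a minimal sketch (the struct, variable, and function names are made up for illustration; this is not code from the deck):

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Illustrative only: readers rely on RCU instead of taking a refcount
 * on every lookup, so the read side stays nearly free. */
struct item {
        int key;
        int value;
        struct rcu_head rh;
};

static struct item __rcu *cur_item;
static DEFINE_SPINLOCK(item_lock);

static int item_read_value(void)
{
        struct item *it;
        int val = -1;

        rcu_read_lock();
        it = rcu_dereference(cur_item);
        if (it)
                val = it->value;        /* RCU keeps 'it' alive here */
        rcu_read_unlock();
        return val;
}

static void item_replace(struct item *newit)
{
        struct item *old;

        spin_lock(&item_lock);
        old = rcu_dereference_protected(cur_item, lockdep_is_held(&item_lock));
        rcu_assign_pointer(cur_item, newit);
        spin_unlock(&item_lock);
        if (old)
                kfree_rcu(old, rh);     /* freed only after pre-existing readers finish */
}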
Toy #1 based on Classic RCU (Docs: WhatIsRCU.txt)
Classic RCU (works only on PREEMPT=n kernels):

#define rcu_dereference(p)       READ_ONCE(p)
#define rcu_assign_pointer(p, v) smp_store_release(&(p), (v))

void rcu_read_lock(void) { }    /* preemption is already disabled on PREEMPT=n */
void rcu_read_unlock(void) { }

void synchronize_rcu(void)
{
        int cpu;

        /* Running on every CPU forces a context switch on each of them,
         * so every pre-existing reader must have finished. */
        for_each_possible_cpu(cpu)
                run_on(cpu);
}
Only talking about TREE_RCU today
TREE_RCU is the most complex and widely used flavor of RCU.
If you are claiming that I am worrying unnecessarily, you are
probably right. But if I didn't worry unnecessarily, RCU wouldn't
work at all!
— Paul McKenney
There’s also TINY_RCU, but we are not discussing it today.
Intro: How TREE_RCU works?
TREE_RCU example: Initial state of the tree
        Root node:                                  qsmask: 1 1
        Leaf node (CPU 0, CPU 1), grpmask: 1 0      qsmask: 1 1
        Leaf node (CPU 2, CPU 3), grpmask: 0 1      qsmask: 1 1

TREE_RCU example: CPU 1 reports QS
        Root node:                                  qsmask: 1 1
        Leaf node (CPU 0, CPU 1), grpmask: 1 0      qsmask: 1 0
        Leaf node (CPU 2, CPU 3), grpmask: 0 1      qsmask: 1 1

TREE_RCU example: CPU 3 reports QS
(Notice that the 2 QS updates have proceeded without any synchronization needed)
        Root node:                                  qsmask: 1 1
        Leaf node (CPU 0, CPU 1), grpmask: 1 0      qsmask: 1 0
        Leaf node (CPU 2, CPU 3), grpmask: 0 1      qsmask: 1 0

TREE_RCU example: CPU 0 reports QS
(Now there has been an update at the root node)
        Root node:                                  qsmask: 0 1
        Leaf node (CPU 0, CPU 1), grpmask: 1 0      qsmask: 0 0
        Leaf node (CPU 2, CPU 3), grpmask: 0 1      qsmask: 1 0

TREE_RCU example: CPU 2 reports QS
(Another update at the root node to mark that the QS has globally completed -- notice that only 2 global updates
were needed instead of 4. On a system with 1000s of CPUs, this will be at most 64.)
        Root node:                                  qsmask: 0 0
        Leaf node (CPU 0, CPU 1), grpmask: 1 0      qsmask: 0 0
        Leaf node (CPU 2, CPU 3), grpmask: 0 1      qsmask: 0 0
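To make the QS propagation concrete, here is a toy sketch of reporting a quiescent state up a two-level combining tree (illustrative code, not the kernel's rcu_report_qs_rnp(); all names are made up):

#include <linux/spinlock.h>

struct toy_rcu_node {
        spinlock_t lock;
        unsigned long qsmask;           /* CPUs/children still owing a QS */
        unsigned long grpmask;          /* this node's bit in its parent */
        struct toy_rcu_node *parent;    /* NULL for the root */
};

/* Clear 'mask' in this node; if the node becomes empty, propagate one
 * level up. CPUs under different leaf nodes never touch the same lock
 * until their whole subtree has reported, which is what keeps the root
 * from becoming a contention point. */
static void toy_report_qs(struct toy_rcu_node *node, unsigned long mask)
{
        while (node) {
                spin_lock(&node->lock);
                node->qsmask &= ~mask;
                if (node->qsmask) {             /* siblings still pending */
                        spin_unlock(&node->lock);
                        return;
                }
                mask = node->grpmask;           /* propagate up */
                spin_unlock(&node->lock);
                node = node->parent;
        }
        /* Root qsmask reached zero: all CPUs have reported, the GP can end. */
}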
Intro: Components of TREE RCU (normal grace period)
● One GP kthread (rcu_preempt or rcu_sched).
● Per CPU (CPU 0 .. CPU 3): a scheduler-tick timer and the RCU softirq.
Intro: Life Cycle of a grace period
Writer (caller’s view -- this is the length of a GP as the caller sees it):
● synchronize_rcu() queues a wake-up callback (rcu_segcblist_enqueue),
requests a new GP (rcu_start_this_gp), and sleeps until the callback runs.
GP thread (rcu_preempt or rcu_sched):
● Waits for a new GP request.
● Propagates the start of the GP down the tree (rcu_gp_init).
● Runs the Force Quiescent State (FQS) loop (rcu_gp_fqs_loop), forcing QS
reports for idle CPUs.
● Are ALL QS marked? (root node qsmask == 0) If not, continue the FQS loop,
sleeping between passes.
● Marks and propagates the GP end down the tree (rcu_gp_cleanup sets the
gp_seq of rcu_state and of all nodes).
Each CPU (tick + softirq):
● Tick: is a GP in progress? If so, mark the CPU’s QS.
● Softirq: propagate the QS up the tree; when all CPUs are done, set the root
node qsmask = 0.
● Once the CPU notices the GP is done (rcu_pending() in the tick path:
rcu_seq_current(&rnp->gp_seq) != rdp->gp_seq), the softirq executes
callbacks (CB exec), which wakes up the writer.
Intro: Where/When is Quiescent state reported
● Transition to / from CPU idle and user mode (a.k.a dyntick-idle transitions)
● Context switch path (If read-lock nesting is 0) -- NOTE: reports a CPU QS but task still blocks GP.
● In the Scheduler tick:
○ Task is not running in IRQ/preempt disable (no more sched/bh readers)
○ Read-lock nesting is 0 (no more preemptible rcu readers)
● In the FQS loop
○ Forcing quiescent states for idle (nohz_idle), usermode (nohz_full), and offline CPUs.
○ Forcing quiescent states for CPUs that do dyntick-idle transitions.
○ Including dyntick idle transitions faked with rcu_momentary_dyntick_idle() such as by
cond_resched() on PREEMPT=n kernels.
○ Give soft hints to CPU’s scheduler tick to resched task (rdp.rcu_urgent_qs)
Intro: Which paths do QS reports happen from?
● CPU local update is cheap:
○ Tick path.
○ Context Switch.
○ Help provided from rcu_read_unlock() if needed.
○ Deferred QS reporting (when rcu_read_unlock() could not help).
● Global update of CPU QS -- Expensive, but REQUIRED for GP to end:
○ Softirq (once it sees CPU local QS from tick)
○ CPU going online (rcu_cpu_starting)
○ Grace period thread’s FQS loop:
■ dyntick-idle QS report (Expensive, involves using atomic instructions)
● dyntick-idle transitions (Kernel - > user ; Kernel -> idle)
● cond_resched() in PREEMPT=n kernels does a forced dyntick-idle transition (HACK).
Intro: What happens in softirq?
Context: a CPU (tick + RCU softirq) whose task is in kernel mode.
Per-CPU work:
● QS reporting for the CPU, propagated up the tree.
● Invoke any callbacks whose GP has completed.
Caveat about callbacks queued on offline CPUs -- PaulMck says:
> And yes, callbacks do migrate away from non-offloaded CPUs that go
> offline. But that is not the common case outside of things like
> rcutorture.
Intro: Grace Period has started, what’s RCU up to?
At around 100ms, while a CPU sits in an RCU read-side critical section:
● The GP thread sets the per-CPU urgent_qs flag.
● The sched tick sees the flag and sets the task’s need_resched flag.
● The CPU enters the scheduler and reports its QS.
(Note: scheduler entry can happen either in the next TICK or in the next preempt_enable())
Intro: Grace Period has started, what’s RCU up to?
At around 200ms, while a CPU sits in an RCU read-side critical section:
● The GP thread requests help from cond_resched() for PREEMPT=n kernels
(by setting the per-CPU need_heavy_qs flag).
● cond_resched() then reports the QS.
Intro: Grace Period has started, what’s RCU up to?
At around 300ms, for NOHZ_FULL CPUs still in an RCU read-side critical section:
● The GP thread sends IPIs to the CPUs that are still holding up the GP.
● The IPI handler turns the SCHED TICK back on.
● The sched tick then reports the QS.
Intro: Grace Period has started, what’s RCU up to?
At around 1 second after the start of the GP:
● The SCHED TICK sets the task’s need_qs flag.
● The QS is then reported.
Different RCU “flavors”
There are 3 main “flavors” of RCU in the kernel:
Entry into an RCU read-side critical section:
1. “sched” flavor:
a. rcu_read_lock_sched();
b. preempt_disable();
c. local_irq_disable();
d. IRQ entry.
2. “BH” flavor:
a. rcu_read_lock_bh();
b. local_bh_disable();
c. SoftIRQ entry.
3. “Preemptible” flavor (only in CONFIG_PREEMPT kernels):
a. rcu_read_lock();
RCU Flavor Consolidation: Why? Reduce APIs
Problem:
1. Too many APIs for synchronization. Confusion over which one to use!
a. For preempt flavor: call_rcu() and synchronize_rcu().
b. For sched: call_rcu_sched() and synchronize_rcu_sched().
c. For bh flavor: call_rcu_bh() and synchronize_rcu_bh().
2. Duplication of RCU state machine for each flavor …
Now after flavor consolidation: Just call_rcu() and synchronize_rcu().
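A small sketch of what this means for users (assuming a bh-flavor reader; illustrative code, not from the deck):

#include <linux/rcupdate.h>

static void reader_bh(void)
{
        rcu_read_lock_bh();
        /* ... access data that is also updated from softirq context ... */
        rcu_read_unlock_bh();
}

/* Update side before consolidation (v4.19 and earlier): had to use the
 * API matching the reader flavor. */
static void updater_before(void)
{
        synchronize_rcu_bh();           /* waited only for bh readers */
}

/* Update side now: one API waits for preempt, bh and sched readers alike. */
static void updater_now(void)
{
        synchronize_rcu();
}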
RCU Flavor Consolidation: Why? Changes to rcu_state
3 rcu_state structures before (v4.19)
● rcu_preempt
● rcu_sched
● rcu_bh
Now just 1 rcu_state structure to handle
all flavors:
● rcu_preempt (Also handles sched
and bh)
The number of GP threads is also reduced
from 3 to 1, since the state machines have
been merged.
● Fewer resources!
● Less code!
RCU Flavor Consolidation: Performance Changes
rcuperf test suite
● Starts N readers and N writers on N CPUs
● Each reader and writer affined to a CPU
● Readers just do rcu_read_lock/rcu_read_unlock in a loop
● Writers start and end grace periods by calling synchronize_rcu repeatedly.
● Writers measure time before/after calling synchronize_rcu.
Locally modified the test to continuously run, on a reserved CPU:
preempt_disable + busy_wait + preempt_enable in a loop (see the sketch below).
Note: this is an additional reader-section variant (sched-RCU) now running in
parallel with the existing rcu_read_lock readers (preemptible-RCU).
What could be the expected Results?
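A minimal sketch of what that locally added reader variant looks like (not the actual rcuperf patch; the busy-wait length and kthread wrapper are assumptions for illustration):

#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/preempt.h>
#include <linux/sched.h>

static int busy_preempt_off_thread(void *arg)
{
        while (!kthread_should_stop()) {
                preempt_disable();      /* enter a sched-RCU read-side section */
                udelay(5000);           /* long busy wait with preemption off */
                preempt_enable();       /* exit the reader */
                cond_resched();         /* let grace periods make progress */
        }
        return 0;
}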
RCU Flavor Consolidation
Performance Changes
This is still within the RCU specification!
Also note that disabling preemption for so
long would not be acceptable to most
people anyway.
RCU Flavor Consolidation
Notice that the synchronize_rcu() time was 2x the preempt_disable() time; that’s because:
synchronize_rcu Wait synchronize_rcu Wait
|<-------------------->| |--------------------|
v v v v
<----------> <----------> <----------> <---------> <--------->
GP GP GP GP GP
GP = long preempt disable duration
Consolidated RCU - The different cases to handle
Say RCU requested special help from the unlock of a reader section that is holding
up a GP for too long….
preempt_disable();
rcu_read_lock();
do_some_long_activity(); // TICK sets per-task ->need_qs bit
rcu_read_unlock(); // Now need special processing from
// rcu_read_unlock_special()
preempt_enable();
Consolidated RCU - The different cases to handle
RCU-preempt reader nested in RCU-sched
Before:
preempt_disable();
rcu_read_lock();
do_some_long_activity();
rcu_read_unlock()
  -> rcu_read_unlock_special(); // Report the QS
preempt_enable();
Now:
preempt_disable();
rcu_read_lock();
do_some_long_activity();
rcu_read_unlock();
  -> rcu_read_unlock_special(); // Defer the QS and set
                                // rcu_read_unlock_special.deferred_qs
                                // bit & set TIF_NEED_RESCHED
preempt_enable();               // Report the QS
Consolidated RCU - The different cases to handle
RCU-preempt reader nested in RCU-bh
Before:
local_bh_disable(); /* or softirq entry */
rcu_read_lock();
do_some_long_activity();
rcu_read_unlock()
  -> rcu_read_unlock_special(); // Report the QS
local_bh_enable(); /* softirq exit */
Now:
local_bh_disable(); /* or softirq entry */
rcu_read_lock();
do_some_long_activity();
rcu_read_unlock();
  -> rcu_read_unlock_special(); // Defer the QS and set
                                // rcu_read_unlock_special.deferred_qs
                                // bit & set TIF_NEED_RESCHED
local_bh_enable(); /* softirq exit */ // Report the QS
Consolidated RCU - The different cases to handle
RCU-preempt reader nested in local_irq_disable() (IRQs off)
(This is a special case where a previous reader requested
deferred special processing by setting the ->deferred_qs bit)
Before:
local_irq_disable();
rcu_read_lock();
rcu_read_unlock()
  -> rcu_read_unlock_special(); // Report the QS
local_irq_enable();
Now:
local_irq_disable();
rcu_read_lock();
rcu_read_unlock();
  -> rcu_read_unlock_special(); // Defer the QS and set
                                // rcu_read_unlock_special.deferred_qs
                                // bit & set TIF_NEED_RESCHED
local_irq_enable();             // CANNOT report the QS, still deferred.
Consolidated RCU - The different cases to handle
RCU-preempt reader nested in RCU-sched (IRQs off)
(This is a special case where a previous reader requested
deferred special processing by setting the ->deferred_qs bit)
Before:
/* hardirq entry */
rcu_read_lock();
rcu_read_unlock()
  -> rcu_read_unlock_special(); // Report the QS
/* hardirq exit */
Now:
/* hardirq entry */
rcu_read_lock();
do_some_long_activity();
rcu_read_unlock();
  -> rcu_read_unlock_special(); // Defer the QS and set
                                // rcu_read_unlock_special.deferred_qs
                                // bit & set TIF_NEED_RESCHED
/* hardirq exit */              // Report the QS
Consolidated RCU - The different cases to handle
RCU-bh reader nested in RCU-preempt reader
Before:
/* hardirq entry */
rcu_read_lock();
rcu_read_unlock()
  -> rcu_read_unlock_special(); // Report the QS
/* hardirq exit */
Now:
/* hardirq entry */
rcu_read_lock();
do_some_long_activity();
rcu_read_unlock();
  -> rcu_read_unlock_special(); // Defer the QS and set
                                // rcu_read_unlock_special.deferred_qs
                                // bit & set TIF_NEED_RESCHED
/* hardirq exit */              // Report the QS
Consolidated RCU - The different cases to handle
RCU-bh reader nested in RCU-preempt or RCU-sched
Before:
preempt_disable();
/* Interrupt arrives */
/* Raises softirq */
/* Interrupt exits */
__do_softirq();
  -> rcu_bh_qs(); /* Reports a BH QS */
preempt_enable();
Now:
preempt_disable();
/* Interrupt arrives */
/* Raises softirq */
/* Interrupt exits */
__do_softirq(); /* Do nothing -- preemption still disabled */
preempt_enable();

Consolidated RCU - The different cases to handle
Solution: in case of a denial-of-service attack, ksoftirqd’s loop will report the QS.
No reader sections are expected there.
See commit: d28139c4e967 ("rcu: Apply RCU-bh QSes to RCU-sched and
RCU-preempt when safe")
Consolidated RCU - Fixing scheduler deadlocks...
The forbidden scheduler rule… This is NOT allowed (https://lwn.net/Articles/453002/)
“Thou shall not hold RQ/PI locks across an rcu_read_unlock() if thou art not
holding them or disabling IRQs across both the rcu_read_lock() +
rcu_read_unlock()”
[Timeline diagram: an IRQ-disabled section (say due to the rq/pi lock) and an
rcu_read_lock section (IRQs off initially) overlapping in the forbidden way]
Consolidated RCU - Fixing scheduler deadlocks...
The forbidden scheduler rule… This IS allowed:
[Two timeline diagrams, each showing an IRQ-disabled section (say due to the
rq/pi lock) and an rcu_read_lock section arranged so that the rule above is
respected]
Consolidated RCU - Fixing scheduler deadlocks...
But we have a new problem… Consider the case where a future rcu_read_unlock_special() might be
called due to a previous one being deferred:

previous_reader()
{
        rcu_read_lock();
        do_something();      /* Preemption happened here (so we need help from rcu_read_unlock_special()). */
        local_irq_disable(); /* Cannot be the scheduler as we discussed! */
        do_something_else();
        rcu_read_unlock();   // As IRQs are off, defer the QS report but set the deferred_qs bit in rcu_read_unlock_special
        do_some_other_thing();
        local_irq_enable();
}

current_reader() /* QS from previous_reader() is still deferred. */
{
        local_irq_disable(); /* Might be the scheduler. */
        do_whatever();
        rcu_read_lock();
        do_whatever_else();
        rcu_read_unlock();   /* Must still defer reporting the QS once again, but safely! */
        do_whatever_comes_to_mind();
        local_irq_enable();
}
Consolidated RCU - Fixing scheduler deadlocks...
Fixed in commit: 23634eb (“rcu: Check for wakeup-safe conditions in
rcu_read_unlock_special()”)
Solution: introduce the rcu_read_unlock_special.b.deferred_qs bit (which is set in previous_reader() in the previous example).
Raise the softirq from _special() only when either of the following is true:
● in_irq() (later changed to in_interrupt()) - because a ksoftirqd wake-up is then impossible.
● deferred_qs is set, which happens in previous_reader() in the previous example.
This keeps the softirq raising from waking ksoftirqd, thus avoiding a scheduler deadlock.
A sketch of the check follows.
Detailed notes on scheduler deadlocks:
https://people.kernel.org/joelfernandes/making-sense-of-scheduler-deadlocks-in-rcu
https://lwn.net/Articles/453002/
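A sketch of that check (simplified, with illustrative names; not the exact code from the commit):

#include <linux/interrupt.h>
#include <linux/preempt.h>

static void rcu_unlock_special_sketch(bool irqs_were_disabled,
                                      bool prev_reader_deferred_qs)
{
        /* Only raise the softirq when doing so cannot require waking
         * ksoftirqd from an unsafe (rq/pi-lock holding) context. */
        if (irqs_were_disabled &&
            (in_irq() || prev_reader_deferred_qs)) {
                raise_softirq_irqoff(RCU_SOFTIRQ);
        } else {
                /* Otherwise just defer the QS report to a later, safe
                 * opportunity (e.g. the next scheduler tick). */
        }
}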
CONFIG_PROVE_RCU does a “built-in” deref check
rcu_read_lock();
// rcu_dereference() checks that rcu_read_lock() was called
x = rcu_dereference(y);
rcu_read_unlock();
However, the check can be misled...
spin_lock(&my_lock);
// Perfectly legal but still splats!
x = rcu_dereference(y);
spin_unlock(&my_lock);
So there’s an API to support such dereference:
spin_lock(&my_lock);
x = rcu_dereference_protected(y, lockdep_is_held(&my_lock));
spin_unlock(&my_lock);
List RCU usage - hardening
● List RCU APIs such as list_for_each_entry_rcu() don’t do any checking.
● This can hide subtle RCU-related bugs such as list corruption.
List RCU usage - example of issue, consider...
/*
 * NOTE: acpi_map_lookup() must be called with rcu_read_lock() or spinlock protection.
 * But nothing checks for that, so list corruption will go unnoticed...
 */
struct acpi_ioremap *acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
        struct acpi_ioremap *map;

        list_for_each_entry_rcu(map, &acpi_ioremaps, list)
                if (map->phys <= phys &&
                    phys + size <= map->phys + map->size)
                        return map;
        return NULL;
}
List RCU usage - hardening
● Option 1: introduce new APIs that do checking for each RCU “flavor”:
○ list_for_each_entry_rcu()
○ list_for_each_entry_rcu_bh()
○ list_for_each_entry_rcu_sched()
● And for when locking protection is used instead of reader protection:
○ list_for_each_entry_rcu_protected()
○ list_for_each_entry_rcu_bh_protected()
○ list_for_each_entry_rcu_sched_protected()
● Drawback:
○ Proliferation of APIs...
List RCU usage - hardening
● Option 2: we can do better since the RCU backend is now consolidated.
○ Checking whether ANY flavor of RCU reader is held is now sufficient in
list_for_each_entry_rcu():
■ sched
■ bh
■ preemptible RCU
○ And add an optional argument to list_for_each_entry_rcu() to pass a lockdep
expression, so there is no need for a list_for_each_entry_rcu_protected().
● One API to rule them all !!!!
List RCU usage - hardening
Example usage showing the optional expression when a lock is held: acpi_map_lookup():
#define acpi_ioremap_lock_held() lockdep_is_held(&acpi_ioremap_lock)
struct acpi_ioremap *acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
        struct acpi_ioremap *map;

        list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
                if (map->phys <= phys &&
                    phys + size <= map->phys + map->size)
                        return map;
        return NULL;
}
List RCU usage - hardening
Patches:
https://lore.kernel.org/lkml/20190716184656.GK14271@linux.ibm.com/T/#t
https://lore.kernel.org/patchwork/project/lkml/list/?series=401865
Next steps:
● Make CONFIG_PROVE_RCU_LIST the default and get some more testing
Future work
● More Torture testing on arm64 hardware
● Re-design the dynticks counters to keep them simple
● List RCU checking updates
● RCU scheduler deadlock checking
● Reducing grace periods due to kfree_rcu().
● Make it possible to not embed rcu_head in the object
● More RCU testing, experiment with modeling etc.
● More systematic study of __rcu sparse checking.
Thank you…
● For questions, please email the list: rcu@vger.kernel.org
● Follow us on Twitter:
○ paulmckrcu@ joel_linux@ boqun_feng@ srostedt@
kfree_rcu() improvements
Problem: Too many kfree_rcu() calls can overwhelm RCU
● Concerns around vhost driver calling kfree_rcu often:
○ https://lore.kernel.org/lkml/20190721131725.GR14271@linux.ibm.com/
○ https://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org/
● A similar bug was fixed by avoiding kfree_rcu() altogether in access()
○ https://lore.kernel.org/lkml/20190729190816.523626281@linuxfoundation.org/
Can we improve RCU to do better and avoid such issues in the future?
Current kfree_rcu() design (TODO: Add state diagram)
kfree_rcu() -> queue callback -> wait for GP -> run kfree() from RCU core
● Flooding kfree_rcu() can cause a lot of RCU work and grace periods.
● All kfree() happens from softirq, as RCU callbacks can execute from softirq
○ Unless callback offloading (rcu_nocbs) or nohz_full is used, of course; still, running more callbacks means other
RCU work also has to wait.
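For reference, typical kfree_rcu() usage looks like this (illustrative struct names); each such call queues an RCU callback and therefore needs a grace period before the memory is freed:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct foo {
        int data;
        struct rcu_head rh;     /* needed so RCU can defer the free */
};

static void foo_release(struct foo *f)
{
        kfree_rcu(f, rh);       /* kfree(f) runs only after a grace period */
}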
kfree_rcu() re-design - first pass
kfree_rcu() batching:
● Two per-CPU lists (call these A and B)
● Each kfree_rcu() request is queued on a per-CPU list (no RCU involved yet)
● Within KFREE_MAX_JIFFIES (using schedule_delayed_work()):
○ Dequeue all requests on A and queue them on B
○ Call queue_rcu_work() to run a worker that frees all B-objects after a GP
○ Meanwhile new requests can be queued on A
(A minimal sketch of this scheme follows.)
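A minimal sketch of the two-list scheme (names, locking, and initialization details are assumptions for illustration; this is not the kernel's actual kfree_rcu() batching code):

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

#define KFREE_MAX_JIFFIES (HZ / 50)     /* value assumed from the 1/50th-second retry on a later slide */

struct kfree_obj {                      /* illustrative batched object */
        struct list_head node;
        /* payload ... */
};

struct kfree_batch {
        spinlock_t lock;
        struct list_head pending;       /* list A: new requests land here */
        struct list_head batch;         /* list B: waiting out a grace period */
        struct rcu_work rwork;          /* frees list B after a GP */
        struct delayed_work monitor;    /* drains A into B periodically */
};

/* Runs once a grace period has elapsed for everything on list B. */
static void kfree_batch_after_gp(struct work_struct *work)
{
        struct kfree_batch *kb = container_of(to_rcu_work(work),
                                              struct kfree_batch, rwork);
        struct kfree_obj *obj, *tmp;
        LIST_HEAD(tofree);

        spin_lock(&kb->lock);
        list_splice_init(&kb->batch, &tofree);  /* B becomes free again */
        spin_unlock(&kb->lock);

        list_for_each_entry_safe(obj, tmp, &tofree, node)
                kfree(obj);
}

/* Periodic monitor: if B is free, move everything from A onto B and ask
 * RCU to run the freeing worker after the next grace period. */
static void kfree_batch_monitor(struct work_struct *work)
{
        struct kfree_batch *kb = container_of(to_delayed_work(work),
                                              struct kfree_batch, monitor);
        bool more;

        spin_lock(&kb->lock);
        if (list_empty(&kb->batch) && !list_empty(&kb->pending)) {
                list_splice_init(&kb->pending, &kb->batch);
                queue_rcu_work(system_wq, &kb->rwork);
        }
        more = !list_empty(&kb->pending) || !list_empty(&kb->batch);
        spin_unlock(&kb->lock);

        if (more)
                schedule_delayed_work(&kb->monitor, KFREE_MAX_JIFFIES);
}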
kfree_rcu() re-design - first pass
Advantages:
● 5x reduction in the number of grace periods in rcuperf testing:
○ GPs went from ~9000 to ~1700 (per-CPU kfree_rcu() flood on a 16-CPU
system)
● All kfree() happens in workqueue context and the system schedules the work on
the different CPUs.
Drawbacks:
● Since list B cannot be queued into until the GP ends, list A keeps growing
● Causes high memory consumption (300-350 MB).
kfree_rcu() re-design - first pass
Drawbacks:
● Since list B cannot be queued into until the GP ends, list A keeps growing
● Misses opportunities and has to wait for future GPs on a busy system
● Causes higher memory consumption (300-350 MB).
Timeline: B is busy, so any attempt to queue A onto B has to be delayed.
We keep retrying every 1/50th of a second, and A keeps growing…
● GP X start: B busy, grow A (retry after 1/50th sec)
● GP X+1 start: B busy, grow A (retry after 1/50th sec)
● GP X+2 start: B free, queue A onto B & run after the GP
(opportunity missed to run at the start of GP X+2)
● GP X+3 start: B’s objects are reclaimed
kfree_rcu() re-design - second pass
● Introduce a new list C.
○ If B is busy, just queue onto C and schedule that to run after a GP.
Timeline: GP X start -- B busy, queue onto C; the opportunity to run at the start
of GP X+2 is now used, and B’s objects are reclaimed at the start of GP X+2.
Thus saving a grace period.
● Brings down memory consumption by 100-150 MB during a kfree_rcu() flood
● Maximize bang for GP-bucks!
dyntick RCU counters: intro (TODO: Condense these 9 slides into one slide)
● RCU needs to know what state CPUs are in
● If we have the scheduler tick, no problem! The tick path tells us.
But…
● NOHZ_FULL can turn off the tick in both user mode and idle mode.
● NOHZ_IDLE can turn off the tick in idle mode.
Implemented with a per-CPU atomic counter:
● rcu_data::dynticks (atomic_t): even means RCU-idle, odd means non-RCU-idle
dyntick RCU counters: intro (TODO: Add state diagrams)
● RCU transitions non-idle <-> idle:
○ kernel <-> user
○ kernel <-> idle
○ irq <-> user
○ irq <-> idle
Each of these transitions causes rcu_data::dynticks to be incremented.
● RCU remains non-idle:
○ irq <-> nmi
○ nmi <-> irq
○ kernel <-> irq
○ irq <-> kernel
● Each of these transitions causes rcu_data::dynticks to NOT be incremented.
(A sketch of how the counter is used follows.)
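A small sketch of how such an even/odd counter lets the FQS loop detect quiescent states remotely (illustrative code, not the kernel's exact implementation):

#include <linux/atomic.h>

struct toy_dynticks {
        atomic_t dynticks;      /* even: RCU-idle, odd: non-RCU-idle */
};

/* Called on each of the incrementing transitions listed above. */
static void toy_dynticks_transition(struct toy_dynticks *td)
{
        /* Fully ordered so the FQS loop can trust what it samples. */
        atomic_inc_return(&td->dynticks);
}

/* FQS loop: snapshot the counter now ... */
static int toy_dynticks_snap(struct toy_dynticks *td)
{
        return atomic_read(&td->dynticks);
}

/* ... and later decide whether the CPU passed through a QS: either it was
 * already RCU-idle (even) at the snapshot, or the counter has moved since. */
static bool toy_dynticks_saw_qs(struct toy_dynticks *td, int snap)
{
        return !(snap & 0x1) || atomic_read(&td->dynticks) != snap;
}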
nohz CPUs tick turn-off: intro
Turning off the scheduler tick is attempted:
● From the CPU idle loop: tick_nohz_idle_stop_tick();
● At the end of IRQs:
irq_exit()
  -> tick_nohz_irq_exit()
    -> tick_nohz_full_update_tick()
tick_nohz_full_update_tick() checks tick_dep_mask. If any bits
are set, the tick is not turned off.
nohz CPUs : reasons to keep tick ON
Introduced in commit
d027d45d8a17 ("nohz: New tick dependency mask")
● POSIX CPU timers
● Perf events
● Scheduler
○ (for nohz_full, tick may be turned off if RQ has only 1 task)
The problem: nohz_full and RCU (TODO: Diagram)
Consider the scenario:
● A nohz_full CPU with tick turned off is looping in kernel mode.
● The GP kthread on another CPU notices the CPU is taking too long.
● It sets the per-CPU hint rcu_urgent_qs to true and sends resched_cpu().
● The IPI runs on the nohz_full CPU.
● The CPU enters the scheduler, yet still nothing happens. Why??
● Grace period continues to stall… resulting in OOM.
Reason:
● Entering the scheduler only means the per-CPU quiescent state is
updated
● Does not mean the global view of the CPU’s quiescent state is
updated...
The problem: nohz_full and RCU
What does updating global view of a CPU QS mean?
Either of the following:
● A. Quiescent state propagates up the RCU TREE to the root
● B. The rcu_data::dynticks counter of the CPU is updated.
○ This causes the FQS loop to do A on dyntick transition of CPU.
Either of these operations will result in a globally updated view of CPU
QS
Both of these are expensive and need to be done carefully.
The problem: nohz_full and RCU
The global view of a CPU QS is updated only from a few places..
● From the RCU softirq (which is raised from scheduler tick)
○ This does A (tick).
● Kernel/IRQ -> Idle / User transition
○ This does B (dyntick transition).
● cond_resched() in PREEMPT=n kernels
○ This does B (dyntick transition).
● Other special cases where a GP must not be stalled and an RCU
reader is not possible: (rcutorture, multi_cpu_stop() etc.)
The problem: nohz_full and RCU
Possible Solution:
Use option B (forced dynticks transition) in the IPI handler.
● This would be rather hackish. Currently we do this only in very
special cases.
● This will only solve the issue for the current GP, future GPs may
still suffer.
The problem: nohz_full and RCU
Final solution: turn the tick back on.
● Add a new “RCU” tick dependency mask.
● The RCU FQS loop sets rcu_urgent_qs to true and does resched_cpu()
on the stalling CPU, as explained.
● The IPI handler arrives at the CPU.
● On IPI entry, we set the tick dependency mask.
● On IPI exit, the sched-tick code turns the tick back on.
This solution prevents the OOM splats.
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=5b451e66c8f78f301c0c6eaacb0246eb13e09974
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=fd6d2b116411753fcaf929e6523d2ada227aa553
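A simplified sketch of the idea in those commits (illustrative; the real code lives in the RCU FQS path and the IRQ/NMI entry code):

#include <linux/smp.h>
#include <linux/tick.h>

/* On IPI entry on the stalling nohz_full CPU: set an RCU tick dependency
 * so the tick is re-enabled when the IPI returns. */
static void rcu_nohz_full_ipi_entry_sketch(void)
{
        tick_dep_set_cpu(smp_processor_id(), TICK_DEP_BIT_RCU);
}

/* Later, once this CPU has reported its QS, the dependency is dropped and
 * the tick can be turned off again. */
static void rcu_nohz_full_qs_reported_sketch(void)
{
        tick_dep_clear_cpu(smp_processor_id(), TICK_DEP_BIT_RCU);
}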
dyntick counter cleanup attempts
Why?
In the previous solution, we found a bug where dyntick_nmi_nesting was being set to
(DYNTICK_IRQ_NONIDLE | 2)
However, the code was checking (dyntick_nmi_nesting == 2)
Link to the buggy commit:
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f
However, Paul wants the existing mechanism redesigned rather than cleaned up, since it's too
complex: http://lore.kernel.org/r/20190828202330.GS26530@linux.ibm.com
Thank you…
● For questions, please email the list: rcu@vger.kernel.org
● Follow us on Twitter:
○ paulmckrcu@ joel_linux@ boqun_feng@ srostedt@

Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Copilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform CopilotCopilot para Microsoft 365 y Power Platform Copilot
Copilot para Microsoft 365 y Power Platform Copilot
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 

Kernel Recipes 2019 - RCU in 2019 - Joel Fernandes

  • 12. Intro: What’s RCU good for -summary ● Fastest Read-mostly synchronization primitive: ○ Readers are almost free regardless of number of concurrent ones. [Caveats] ○ Writes are costly but per-update cost is amortized. ● 1000s or millions of updates can share a grace period, so cost of grace period cycle (updating shared state etc) is divided. ○ Read and Writer sections execute in parallel. [Caveats about ‘literally free”] o Slide 12: "Readers are almost free": Yes, but caveats do apply: (1) CONFIG_PREEMPT=n is free but =y has a small cost (especially unlock()). (2) Possibility of rcu_dereference() affecting compiler optimizations. (3) Given that rcu_read_lock() and rcu_read_unlock() now imply barrier(), they can also affect compiler optimizations.
  • 13. Intro: What is RCU and what is it good for? ● More use cases: ○ Wait-for-completion primitive (used in deadlock-free NMI handler registration, BPF maps) ○ Reference counter replacements ○ Per-CPU reference counting ○ Optimized reader-writer locking (percpu_rwsem) ■ Slow / fast path switch (rcu-sync) ○ See the RCUUsage paper for many more usage patterns.
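To make the wait-for-completion use case concrete, here is a minimal, hypothetical sketch (not taken from the slides or from any particular driver); the names my_handler, global_handler, call_handler() and unregister_handler() are invented for illustration:

#include <linux/rcupdate.h>
#include <linux/slab.h>

struct my_handler {
	void (*func)(void *data);
};

static struct my_handler __rcu *global_handler;

static void call_handler(void *data)
{
	struct my_handler *h;

	rcu_read_lock();
	h = rcu_dereference(global_handler);
	if (h)
		h->func(data);          /* Invocation runs inside the reader section. */
	rcu_read_unlock();
}

static void unregister_handler(struct my_handler *h)
{
	rcu_assign_pointer(global_handler, NULL);
	synchronize_rcu();              /* Wait for all in-flight invocations.  */
	kfree(h);                       /* Now nobody can still be using it.    */
}

Here synchronize_rcu() is used purely as a "wait for everyone who might still see the old pointer" primitive rather than as part of a copy-update pattern.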
  • 14. Toy #1 based on Classic RCU (Docs: WhatIsRCU.txt). Classic RCU (works only on PREEMPT=n kernels):

#define rcu_dereference(p)       READ_ONCE(p)
#define rcu_assign_pointer(p, v) smp_store_release(&(p), (v))

void rcu_read_lock(void) { }
void rcu_read_unlock(void) { }

void synchronize_rcu(void)
{
        int cpu;

        for_each_possible_cpu(cpu)
                run_on(cpu);
}
  • 15. Only talking about TREE_RCU today. TREE_RCU is the most complex and most widely used flavor of RCU. “If you are claiming that I am worrying unnecessarily, you are probably right. But if I didn't worry unnecessarily, RCU wouldn't work at all!” — Paul McKenney. There’s also TINY_RCU, but we’re not discussing it.
  • 17. TREE_RCU example: Initial State of the tree
        Root node:                  qsmask: 1 1
        Leaf node 0 (CPU 0, CPU 1): grpmask: 1 0, qsmask: 1 1
        Leaf node 1 (CPU 2, CPU 3): grpmask: 0 1, qsmask: 1 1
  • 18. TREE_RCU example: CPU 1 reports QS
        Root node:                  qsmask: 1 1
        Leaf node 0 (CPU 0, CPU 1): grpmask: 1 0, qsmask: 1 0
        Leaf node 1 (CPU 2, CPU 3): grpmask: 0 1, qsmask: 1 1
  • 19. TREE_RCU example: CPU 3 reports QS (Notice that the 2 QS updates have proceeded without any synchronization needed)
        Root node:                  qsmask: 1 1
        Leaf node 0 (CPU 0, CPU 1): grpmask: 1 0, qsmask: 1 0
        Leaf node 1 (CPU 2, CPU 3): grpmask: 0 1, qsmask: 1 0
  • 20. TREE_RCU example: CPU 0 reports QS (Now there has been an update at the root node)
        Root node:                  qsmask: 0 1
        Leaf node 0 (CPU 0, CPU 1): grpmask: 1 0, qsmask: 0 0
        Leaf node 1 (CPU 2, CPU 3): grpmask: 0 1, qsmask: 1 0
  • 21. TREE_RCU example: CPU 2 reports QS (Another update at the root node to mark that the QS has globally completed -- notice that only 2 global updates were needed instead of 4. On a system with 1000s of CPUs, this will be at most 64)
        Root node:                  qsmask: 0 0
        Leaf node 0 (CPU 0, CPU 1): grpmask: 1 0, qsmask: 0 0
        Leaf node 1 (CPU 2, CPU 3): grpmask: 0 1, qsmask: 0 0
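The bookkeeping in the example above can be summarized by a simplified sketch. This is not the kernel's actual code (the real logic lives in functions such as rcu_report_qs_rnp() and takes node locks, checks grace-period sequence numbers, and so on); the toy_* names are invented for illustration:

struct toy_rcu_node {
	struct toy_rcu_node *parent;
	unsigned long qsmask;   /* Which children still owe a QS this GP. */
	unsigned long grpmask;  /* This node's bit in its parent's qsmask. */
};

static void toy_report_qs(struct toy_rcu_node *rnp, unsigned long mask)
{
	for (; rnp; rnp = rnp->parent) {
		rnp->qsmask &= ~mask;   /* Clear the reporting child's bit.   */
		if (rnp->qsmask)
			return;         /* Siblings still pending: stop here. */
		mask = rnp->grpmask;    /* Whole node done: report upward.    */
	}
	/* Reached the root with qsmask == 0: the grace period can end. */
}

This is why most QS reports touch only a leaf node, and only the last CPU of each group propagates toward the root.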
  • 22. Intro: Components of TREE RCU (normal grace period). (Diagram:) one GP thread (rcu_preempt or rcu_sched), plus a softirq and a timer (scheduler tick) on each CPU (CPU 0 .. CPU 3).
  • 23. Intro: Life cycle of a grace period. (Flow diagram, roughly:)
        - A writer calls synchronize_rcu(), which queues a wake-up callback (rcu_segcblist_enqueue) and requests a new GP (rcu_start_this_gp).
        - The GP thread, which was waiting for a new GP request, propagates the start of the GP down the tree (rcu_gp_init).
        - The GP thread then runs the Force Quiescent State (FQS) loop (rcu_gp_fqs_loop), alternating sleep/continue and forcing QSes for idle CPUs.
        - Tick/softirq on each CPU: once the CPU notices a GP is in progress (rcu_pending() in the tick path, rcu_seq_current(&rnp->gp_seq) != rdp->gp_seq), it marks the CPU QS and propagates it up the tree; callback execution also happens from the softirq.
        - Are ALL QS marked (root node qsmask == 0)? If so, the GP thread marks and propagates the GP end down the tree (rcu_gp_cleanup sets gp_seq of rcu_state and all nodes), and the synchronize_rcu() caller is woken up.
        - The "length of a GP" from the caller's view spans from the GP request to this wake-up.
  • 24. Intro: Where/When is Quiescent state reported ● Transition to / from CPU idle and user mode (a.k.a dyntick-idle transitions) ● Context switch path (If read-lock nesting is 0) -- NOTE: reports a CPU QS but task still blocks GP. ● In the Scheduler tick: ○ Task is not running in IRQ/preempt disable (no more sched/bh readers) ○ Read-lock nesting is 0 (no more preemptible rcu readers) ● In the FQS loop ○ Forcing quiescent states for idle (nohz_idle), usermode (nohz_full), and offline CPUs. ○ Forcing quiescent states for CPUs that do dyntick-idle transitions. ○ Including dyntick idle transitions faked with rcu_momentary_dyntick_idle() such as by cond_resched() on PREEMPT=n kernels. ○ Give soft hints to CPU’s scheduler tick to resched task (rdp.rcu_urgent_qs)
  • 25. Intro: Which paths do QS reports happen from? ● CPU-local update is cheap: ○ Tick path. ○ Context switch. ○ Help provided from rcu_read_unlock() if needed. ○ Reporting of a deferred QS (when rcu_read_unlock() could not help immediately). ● Global update of CPU QS -- expensive, but REQUIRED for the GP to end: ○ Softirq (once it sees the CPU-local QS from the tick) ○ CPU going online (rcu_cpu_starting) ○ Grace period thread’s FQS loop: ■ dyntick-idle QS report (expensive, involves using atomic instructions) ● dyntick-idle transitions (kernel -> user; kernel -> idle) ● cond_resched() in PREEMPT=n kernels does a forced dyntick-idle transition (HACK).
  • 26. Intro: What happens in softirq ? CPU 0 softirq timer Per-CPU Work: ● QS reporting for CPU and propagate up tree. ● Invoke any callbacks whose GP has completed. Caveat about callbacks queued on offline CPUs: PaulMck says: > And yes, callbacks do migrate away from non-offloaded CPUs that go > offline. But that is not the common case outside of things like > rcutorture.
  • 27. Intro: A grace period has started, what's RCU up to? At around 100 ms (task is in kernel mode): the GP thread sets the per-CPU urgent_qs flag; the sched tick then sets the task's need_resched flag; the task enters the scheduler and the QS is reported. (Note: scheduler entry can happen either at the next tick or at the next preempt_enable().)
  • 28. At around 200 ms: the GP thread requests help from cond_resched() for PREEMPT=n kernels (by setting the per-CPU need_heavy_qs flag); the next cond_resched() between RCU read-side critical sections reports the QS.
  • 29. At around 300 ms, for NOHZ_FULL CPUs: the GP thread sends IPIs to the CPUs that are still holding up the grace period; the IPI handler turns the scheduler tick back on, and the sched tick then reports the QS.
  • 30. At around 1 second after the start of the GP: the sched tick sets the running task's need_qs flag, so the QS is reported when the RCU read-side critical section ends.
  • 31. Different RCU “flavors”. There are 3 main “flavors” of RCU in the kernel. Entry into an RCU read-side critical section: 1. “sched” flavor: a. rcu_read_lock_sched(); b. preempt_disable(); c. local_irq_disable(); d. IRQ entry. 2. “BH” flavor: a. rcu_read_lock_bh(); b. local_bh_disable(); c. SoftIRQ entry. 3. “Preemptible” flavor (only in CONFIG_PREEMPT kernels): a. rcu_read_lock();
  • 32. RCU Flavor Consolidation: Why? Reduce APIs. Problem: 1. Too many APIs for synchronization. Confusion over which one to use! a. For the preempt flavor: call_rcu() and synchronize_rcu(). b. For sched: call_rcu_sched() and synchronize_rcu_sched(). c. For the bh flavor: call_rcu_bh() and synchronize_rcu_bh(). 2. Duplication of the RCU state machine for each flavor … Now, after flavor consolidation: just call_rcu() and synchronize_rcu().
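A minimal sketch of what consolidation means in practice (illustrative only; shared_counter and the helper function names below are invented): a single synchronize_rcu() now waits for preemptible, bh-style and sched-style readers alike, so the updater no longer has to pick the matching flavor.

#include <linux/rcupdate.h>
#include <linux/preempt.h>
#include <linux/bottom_half.h>
#include <linux/slab.h>

static int __rcu *shared_counter;       /* hypothetical RCU-protected pointer */

static int read_preemptible(void)
{
	int *p, val = 0;

	rcu_read_lock();                        /* preemptible-flavor reader */
	p = rcu_dereference(shared_counter);
	if (p)
		val = *p;
	rcu_read_unlock();
	return val;
}

static int read_sched_style(void)
{
	int *p, val = 0;

	preempt_disable();                      /* sched-flavor reader */
	p = rcu_dereference_sched(shared_counter);
	if (p)
		val = *p;
	preempt_enable();
	return val;
}

static int read_bh_style(void)
{
	int *p, val = 0;

	local_bh_disable();                     /* bh-flavor reader */
	p = rcu_dereference_bh(shared_counter);
	if (p)
		val = *p;
	local_bh_enable();
	return val;
}

static void update_counter(int *newp, int *oldp)
{
	rcu_assign_pointer(shared_counter, newp);
	synchronize_rcu();      /* waits for ALL of the reader styles above */
	kfree(oldp);
}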
  • 33. RCU Flavor Consolidation: Why? Changes to rcu_state 3 rcu_state structures before (v4.19) ● rcu_preempt ● rcu_sched ● rcu_bh Now just 1 rcu_state structure to handle all flavors: ● rcu_preempt (Also handles sched and bh) Number of GP threads also reduced from 3 to 1 since state machines have been merged now. ● Less resources! ● Less code!
  • 34. RCU Flavor Consolidation: Performance Changes. rcuperf test suite: ● Starts N readers and N writers on N CPUs ● Each reader and writer is affined to a CPU ● Readers just do rcu_read_lock()/rcu_read_unlock() in a loop ● Writers start and end grace periods by calling synchronize_rcu() repeatedly ● Writers measure the time before/after calling synchronize_rcu(). The test was locally modified to continuously run, on a reserved CPU: preempt_disable + busy_wait + preempt_enable in a loop. Note: this is an additional reader-section variant (sched-RCU) running in parallel with the existing rcu_read_lock() readers (preemptible RCU). What could be the expected results?
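The locally added "disturber" might look roughly like the sketch below. This is an assumption about the shape of the modification, not the actual rcuperf patch; disturber_thread and busy_wait_us are invented names, and the 5 ms busy-wait is an arbitrary illustrative value.

#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/preempt.h>
#include <linux/sched.h>

static int busy_wait_us = 5000;         /* length of each sched-style "reader" */

static int disturber_thread(void *unused)
{
	while (!kthread_should_stop()) {
		preempt_disable();          /* sched-flavor read-side section ...  */
		udelay(busy_wait_us);       /* ... held for a long time            */
		preempt_enable();
		cond_resched();             /* let the GP machinery make progress  */
	}
	return 0;
}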
  • 35. RCU Flavor Consolidation: Performance Changes. This is still within the RCU specification! Also note that disabling preemption for so long would most likely not be acceptable to most people anyway.
  • 36. RCU Flavor Consolidation. Notice that the synchronize_rcu() time was 2x the preempt_disable() time. That's because (per the timeline on the slide, where each GP is stretched to the long preempt-disable duration): a synchronize_rcu() call typically arrives in the middle of an already-running grace period, so it must wait for the remainder of that GP plus one full subsequent GP -- roughly two GP lengths, i.e. about 2x the preempt_disable duration.
  • 37. Consolidated RCU - the different cases to handle. Say RCU requested special help from the unlock of a reader section that has been holding up a GP for too long...

preempt_disable();
rcu_read_lock();
do_some_long_activity();  // TICK sets the per-task ->need_qs bit
rcu_read_unlock();        // Now needs special processing from rcu_read_unlock_special()
preempt_enable();
  • 38. Consolidated RCU - the different cases to handle: RCU-preempt reader nested in RCU-sched.
Before:
    preempt_disable();
    rcu_read_lock();
    do_some_long_activity();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Report the QS
    preempt_enable();
Now:
    preempt_disable();
    rcu_read_lock();
    do_some_long_activity();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Defer the QS, set the rcu_read_unlock_special.deferred_qs bit & set TIF_NEED_RESCHED
    preempt_enable();                // Report the QS
  • 39. Consolidated RCU - the different cases to handle: RCU-preempt reader nested in RCU-bh.
Before:
    local_bh_disable();  /* or softirq entry */
    rcu_read_lock();
    do_some_long_activity();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Report the QS
    local_bh_enable();   /* softirq exit */
Now:
    local_bh_disable();  /* or softirq entry */
    rcu_read_lock();
    do_some_long_activity();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Defer the QS, set the rcu_read_unlock_special.deferred_qs bit & set TIF_NEED_RESCHED
    local_bh_enable();   /* softirq exit */   // Report the QS
  • 40. Consolidated RCU - the different cases to handle: RCU-preempt reader nested in local_irq_disable (IRQ off). (This is a special case where a previous reader requested deferred special processing by setting the ->deferred_qs bit.)
Before:
    local_irq_disable();
    rcu_read_lock();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Report the QS
    local_irq_enable();
Now:
    local_irq_disable();
    rcu_read_lock();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Defer the QS, set the rcu_read_unlock_special.deferred_qs bit & set TIF_NEED_RESCHED
    local_irq_enable();              // CANNOT report the QS, still deferred.
  • 41. Consolidated RCU - the different cases to handle: RCU-preempt reader nested in RCU-sched (IRQ off). (This is a special case where a previous reader requested deferred special processing by setting the ->deferred_qs bit.)
Before:
    /* hardirq entry */
    rcu_read_lock();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Report the QS
    /* hardirq exit */
Now:
    /* hardirq entry */
    rcu_read_lock();
    do_some_long_activity();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Defer the QS, set the rcu_read_unlock_special.deferred_qs bit & set TIF_NEED_RESCHED
    /* hardirq exit */               // Report the QS
  • 42. Consolidated RCU - the different cases to handle: RCU-bh reader nested in RCU-preempt reader.
Before:
    /* hardirq entry */
    rcu_read_lock();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Report the QS
    /* hardirq exit */
Now:
    /* hardirq entry */
    rcu_read_lock();
    do_some_long_activity();
    rcu_read_unlock();
      -> rcu_read_unlock_special();  // Defer the QS, set the rcu_read_unlock_special.deferred_qs bit & set TIF_NEED_RESCHED
    /* hardirq exit */               // Report the QS
  • 43. Consolidated RCU - the different cases to handle: RCU-bh reader nested in RCU-preempt or RCU-sched.
Before:
    preempt_disable();
    /* Interrupt arrives */
    /* Raises softirq */
    /* Interrupt exits */
    __do_softirq();
      -> rcu_bh_qs();  /* Reports a BH QS */
    preempt_enable();
Now:
    preempt_disable();
    /* Interrupt arrives */
    /* Raises softirq */
    /* Interrupt exits */
    __do_softirq();      /* Do nothing -- preemption still disabled */
    preempt_enable();
  • 44. Consolidated RCU - the different cases to handle. Solution: in case of a denial-of-service attack, ksoftirqd's loop will report the QS; no reader sections are expected there. See commit d28139c4e967 ("rcu: Apply RCU-bh QSes to RCU-sched and RCU-preempt when safe").
  • 45. Consolidated RCU - fixing scheduler deadlocks... The forbidden scheduler rule... This is NOT allowed (https://lwn.net/Articles/453002/): “Thou shalt not hold RQ/PI locks across an rcu_read_unlock() unless thou art holding them, or disabling IRQs, across both the rcu_read_lock() and the rcu_read_unlock().” (Diagram: an IRQ-disabled section, say due to an rq/pi lock, overlapping the end of an rcu_read_lock section (IRQs off initially) without covering both the rcu_read_lock() and the rcu_read_unlock().)
  • 46. Consolidated RCU - fixing scheduler deadlocks... The forbidden scheduler rule... This IS allowed (two diagrams): either the IRQ-disabled section (say due to an rq/pi lock) fully encloses the rcu_read_lock section, or the rcu_read_lock section fully encloses the IRQ-disabled section.
  • 47. Consolidated RCU - fixing scheduler deadlocks... But we have a new problem. Consider the case where a future rcu_read_unlock_special() might be called because a previous one deferred its QS report:

previous_reader()
{
	rcu_read_lock();
	do_something();
	/* Preemption happened here, so we need help from rcu_read_unlock_special(). */
	local_irq_disable();  /* Cannot be the scheduler, as we discussed! */
	do_something_else();
	rcu_read_unlock();    /* As IRQs are off, defer the QS report but set the deferred_qs bit in rcu_read_unlock_special. */
	do_some_other_thing();
	local_irq_enable();
}

current_reader()  /* QS from previous_reader() is still deferred. */
{
	local_irq_disable();  /* Might be the scheduler. */
	do_whatever();
	rcu_read_lock();
	do_whatever_else();
	rcu_read_unlock();    /* Must still defer reporting the QS once again -- but safely! */
	do_whatever_comes_to_mind();
	local_irq_enable();
}
  • 48. Consolidated RCU - fixing scheduler deadlocks... Fixed in commit 23634eb (“rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()”). Solution: introduce the rcu_read_unlock_special.b.deferred_qs bit (which is set in previous_reader() in the previous example). Raise the softirq from _special() only when either of the following is true: ● in_irq() (later changed to in_interrupt()) -- because a ksoftirqd wake-up is impossible there. ● deferred_qs is set, which happens in previous_reader() in the previous example. This way, raising the softirq does not wake ksoftirqd, thus avoiding a scheduler deadlock. Detailed notes on scheduler deadlocks: https://people.kernel.org/joelfernandes/making-sense-of-scheduler-deadlocks-in-rcu https://lwn.net/Articles/453002/
  • 49. CONFIG_PROVE_RCU does a “built-in” dereference check:
    rcu_read_lock();
    x = rcu_dereference(y);   // rcu_dereference() checks that rcu_read_lock() was called
    rcu_read_unlock();
  • 50. However, this can mislead it...
    spin_lock(&my_lock);
    x = rcu_dereference(y);   // Perfectly legal but still splats!
    spin_unlock(&my_lock);
  • 51. So there’s an API to support such a dereference:
    spin_lock(&my_lock);
    x = rcu_dereference_protected(y, lockdep_is_held(&my_lock));
    spin_unlock(&my_lock);
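Worth noting as a related point (not on the slide): when a pointer may be accessed under either reader protection or the update-side lock, rcu_dereference_check() expresses both conditions; my_lock and y follow the toy example above.

    /* Either of these contexts satisfies the lockdep check: */
    rcu_read_lock();
    x = rcu_dereference_check(y, lockdep_is_held(&my_lock));
    rcu_read_unlock();

    spin_lock(&my_lock);
    x = rcu_dereference_check(y, lockdep_is_held(&my_lock));
    spin_unlock(&my_lock);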
  • 52. List RCU usage - hardening ● List RCU APIs such as list_for_each_entry_rcu() don’t do any checking. ● This can hide subtle RCU-related bugs such as list corruption.
  • 53. List RCU usage - example of an issue, consider...

/*
 * NOTE: acpi_map_lookup() must be called with rcu_read_lock() or spinlock
 * protection.  But nothing checks for that, so list corruption will go
 * unnoticed...
 */
struct acpi_ioremap *
acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
	struct acpi_ioremap *map;

	list_for_each_entry_rcu(map, &acpi_ioremaps, list)
		if (map->phys <= phys && phys + size <= map->phys + map->size)
			return map;
	return NULL;
}
  • 54. List RCU usage - hardening ● Option 1: Introduce new APIs that do checking for each RCU “flavor”: ○ list_for_each_entry_rcu() ○ list_for_each_entry_rcu_bh() ○ list_for_each_entry_rcu_sched() ● And for when locking protection is used instead of reader protection: ○ list_for_each_entry_rcu_protected() ○ list_for_each_entry_rcu_bh_protected() ○ list_for_each_entry_rcu_sched_protected() ● Drawback: ○ Proliferation of APIs...
  • 55. List RCU usage -hardening ● Option 2: Can do better since RCU backend is now consolidated. ○ So checking whether ANY flavor of RCU reader is held, is sufficient in list_for_each_entry_rcu() ■ sched ■ bh ■ preemptible RCU ○ And add an optional argument to list_for_each_entry_rcu() to pass a lockdep expression; so no need for a list_for_each_entry_rcu_protected(). ● One API to rule it all !!!!
  • 56. List RCU usage - hardening. Example usage showing the optional lockdep expression when the lock is held, in acpi_map_lookup():

#define acpi_ioremap_lock_held() lockdep_is_held(&acpi_ioremap_lock)

struct acpi_ioremap *
acpi_map_lookup(acpi_physical_address phys, acpi_size size)
{
	struct acpi_ioremap *map;

	list_for_each_entry_rcu(map, &acpi_ioremaps, list, acpi_ioremap_lock_held())
		if (map->phys <= phys && phys + size <= map->phys + map->size)
			return map;
	return NULL;
}
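For completeness, a hypothetical sketch of the matching update side (not shown on the slides; acpi_map_insert() is an invented helper name, and this assumes acpi_ioremap_lock is a mutex): insertions happen under acpi_ioremap_lock, which is exactly the condition the lockdep expression above asserts.

static void acpi_map_insert(struct acpi_ioremap *map)
{
	mutex_lock(&acpi_ioremap_lock);
	list_add_tail_rcu(&map->list, &acpi_ioremaps);  /* publish to RCU readers */
	mutex_unlock(&acpi_ioremap_lock);
}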
  • 57. List RCU usage -hardening Patches: https://lore.kernel.org/lkml/20190716184656.GK14271@linux.ibm.com/T/#t https://lore.kernel.org/patchwork/project/lkml/list/?series=401865 Next steps: ● Make CONFIG_PROVE_RCU_LIST the default and get some more testing
  • 58. Future work ● More torture testing on arm64 hardware ● Re-design the dynticks counters to keep them simple ● List RCU checking updates ● RCU scheduler deadlock checking ● Reducing grace periods due to kfree_rcu() ● Make it possible to not embed an rcu_head in the object ● More RCU testing, experiments with modeling, etc. ● More systematic study of __rcu sparse checking.
  • 59. Thank you… ● For questions, please email the list: rcu@vger.kernel.org ● Follow us on Twitter: ○ paulmckrcu@ joel_linux@ boqun_feng@ srostedt@
  • 60. kfree_rcu() improvements Problem: Too many kfree_rcu() calls can overwhelm RCU ● Concerns around vhost driver calling kfree_rcu often: ○ https://lore.kernel.org/lkml/20190721131725.GR14271@linux.ibm.com/ ○ https://lore.kernel.org/lkml/20190723035725-mutt-send-email-mst@kernel.org/ ● Similar bug was fixed by avoiding kfree_rcu() altogether in access() ○ https://lore.kernel.org/lkml/20190729190816.523626281@linuxfoundation.org/ Can we improve RCU to do better and avoid future such issues?
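For context, a typical kfree_rcu() call site looks roughly like this (illustrative only; struct foo and remove_foo() are made-up names): each call queues an RCU callback, which is why a flood of them can generate a lot of grace-period work.

#include <linux/rcupdate.h>

struct foo {
	int value;
	struct rcu_head rcu;     /* needed so RCU can link the object */
};

static void remove_foo(struct foo *p)
{
	/* ... first unpublish p from the RCU-protected data structure ... */
	kfree_rcu(p, rcu);       /* freed only after a full grace period */
}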
  • 61. Current kfree_rcu() design (TODO: Add state diagram): kfree_rcu() -> queue callback -> wait for GP -> run kfree() from the RCU core. ● Flooding kfree_rcu() can cause a lot of RCU work and grace periods. ● All kfree() happens from softirq, as RCU callbacks can execute from softirq ○ Unless nocb or nohz_full is passed, of course; still, running more callbacks means other RCU work also has to wait.
  • 62. kfree_rcu() re-design - first pass. kfree_rcu() batching: ● 2 per-CPU lists (call these A and B) ● Each kfree_rcu() request is queued on a per-CPU list (no RCU) ● Within KFREE_MAX_JIFFIES (using schedule_delayed_work()): ○ Dequeue all requests on A and queue them on B ○ Call queue_rcu_work() to run a worker that frees all B-objects after a GP ○ Meanwhile, new requests can be queued on A
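A highly simplified sketch of this batching scheme (my own illustration, not the actual patch; struct kfree_rcu_batch and kfree_batch_monitor() are invented names, and the rcu_work handler that walks head_free, frees each object, and clears the list is omitted):

#include <linux/workqueue.h>
#include <linux/spinlock.h>
#include <linux/rcupdate.h>

struct kfree_rcu_batch {
	struct rcu_head *head;        /* list A: freshly queued kfree_rcu() requests */
	struct rcu_head *head_free;   /* list B: batch currently waiting for a GP    */
	struct rcu_work rwork;        /* frees everything on head_free after a GP    */
	struct delayed_work monitor;  /* periodically tries to move A into B         */
	spinlock_t lock;
};

static void kfree_batch_monitor(struct work_struct *work)
{
	struct kfree_rcu_batch *b =
		container_of(to_delayed_work(work), struct kfree_rcu_batch, monitor);

	spin_lock(&b->lock);
	if (!b->head_free && b->head) {                /* B idle and A non-empty?     */
		b->head_free = b->head;                /* move the whole batch A -> B */
		b->head = NULL;
		queue_rcu_work(system_wq, &b->rwork);  /* free B after a grace period */
	}
	spin_unlock(&b->lock);
	/* (If B was busy, simply retry on the next monitor run.) */
}

The point of the batching is that one grace period now covers an entire batch of kfree_rcu() requests instead of RCU tracking each request individually.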
  • 63. kfree_rcu() re-design - first pass. Advantages: ● ~5x reduction in the number of grace periods in rcuperf testing: ○ GPs go from ~9000 to ~1700 (per-CPU kfree_rcu() flood on a 16-CPU system) ● All kfree() happens in workqueue context, and the system schedules the work across the different CPUs. Drawbacks: ● Since list B cannot be queued into until the GP ends, list A keeps growing ● Causes high memory consumption (300-350 MB).
  • 64. kfree_rcu() re-design - first pass. Drawbacks: ● Since list B cannot be queued into until the GP ends, list A keeps growing ● Opportunities are missed, and we have to wait for future GPs on a busy system ● Causes higher memory consumption (300-350 MB). Timeline on the slide (roughly): at the start of GP X, B is busy, so any attempt to queue A onto B has to be delayed; we keep retrying every 1/50th of a second while A keeps growing. Only once B is free (around GP X+2) is A queued onto B to run after a GP, so B's objects are reclaimed only at the start of GP X+3. The opportunity to run at the start of GP X+2 is missed.
  • 65. kfree_rcu() re-design - second pass. ● Introduce a new list C. ○ If B is busy, just queue on C and schedule that to run after a GP. Timeline on the slide (roughly): at the start of GP X, B is busy, so the batch is queued on C instead; the opportunity to run at the start of GP X+2 is now used, and B's objects are reclaimed at the start of GP X+2, thus saving a grace period. ● Brings down memory consumption by 100-150 MB during a kfree_rcu() flood ● Maximize bang for GP-bucks!
  • 66. dyntick RCU counters: intro (TODO: Condense these 9 slides into one slide) ● RCU needs to know what state CPUs are in ● If we have the scheduler tick, no problem! The tick path tells us. But… ● NOHZ_FULL can turn off the tick in both user mode and idle mode. ● NOHZ_IDLE can turn off the tick in idle mode. Implemented with a per-CPU atomic counter: ● rcu_data::dynticks (atomic_t): even for RCU-idle, odd for non-RCU-idle.
  • 67. dyntick RCU counters: intro (TODO: Add state-diagrams) ● RCU transitions non-idle <-> idle: ○ kernel <-> user ○ kernel <-> idle ○ irq <-> user ○ irq <-> idle. Each of these transitions causes rcu_data::dynticks to be incremented. ● RCU remains non-idle: ○ irq <-> nmi ○ nmi <-> irq ○ kernel <-> irq ○ irq <-> kernel. Each of these transitions causes rcu_data::dynticks to NOT be incremented.
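A conceptual toy model of how such a counter lets the FQS loop prove a remote CPU's quiescent state without disturbing it (this is my own simplification, not the kernel's rcu_dynticks code; the toy_* names are invented, memory ordering is omitted, and the even==idle parity follows the previous slide):

#include <linux/atomic.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(atomic_t, toy_dynticks);  /* even: RCU-idle, odd: non-idle */

static void toy_eqs_enter(void)    /* kernel/irq -> idle or user */
{
	atomic_inc(this_cpu_ptr(&toy_dynticks));   /* parity flips to "idle"     */
}

static void toy_eqs_exit(void)     /* idle or user -> kernel/irq */
{
	atomic_inc(this_cpu_ptr(&toy_dynticks));   /* parity flips to "non-idle" */
}

/* FQS loop: did @cpu pass through an idle/user period since @snap was taken? */
static bool toy_cpu_was_idle_since(int cpu, int snap)
{
	int cur = atomic_read(per_cpu_ptr(&toy_dynticks, cpu));

	return !(cur & 1) ||   /* sampled while idle/user ...                  */
	       cur != snap;    /* ... or it transitioned since the snapshot    */
}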
  • 68. nohz CPUs tick turn-off: intro. Turning the scheduler tick off is attempted: ● From the CPU idle loop: tick_nohz_idle_stop_tick(); ● At the end of IRQs: irq_exit() -> tick_nohz_irq_exit() -> tick_nohz_full_update_tick(). tick_nohz_full_update_tick() checks tick_dep_mask; if any bits are set, the tick is not turned off.
  • 69. nohz CPUs : reasons to keep tick ON Introduced in commit d027d45d8a17 ("nohz: New tick dependency mask") ● POSIX CPU timers ● Perf events ● Scheduler ○ (for nohz_full, tick may be turned off if RQ has only 1 task)
  • 70. The problem: nohz_full and RCU (TODO: Diagram) Consider the scenario: ● A nohz_full CPU with tick turned off is looping in kernel mode. ● The GP kthread on another CPU notices CPU is taking too long. ● Sets per-cpu hints: rcu_urgent_qs to true and sends resched_cpu(). ● IPI runs on nohz_full CPU. ● It enters the scheduler, still nothing happens. Why?? ● Grace period continues to stall… resulting in OOM. Reason: ● Entering the scheduler only means the per-CPU quiescent state is updated ● Does not mean the global view of the CPU’s quiescent state is updated...
  • 71. The problem: nohz_full and RCU What does updating global view of a CPU QS mean? Either of the following: ● A. Quiescent state propagates up the RCU TREE to the root ● B. The rcu_data :: dynticks counter of the CPU is updated. ○ This causes the FQS loop to do A on dyntick transition of CPU. Either of these operations will result in a globally updated view of CPU QS Both of these are expensive and need to be done carefully.
  • 72. The problem: nohz_full and RCU The global view of a CPU QS is updated only from a few places.. ● From the RCU softirq (which is raised from scheduler tick) ○ This does A (tick). ● Kernel/IRQ -> Idle / User transition ○ This does B (dyntick transition). ● cond_resched() in PREEMPT=n kernels ○ This does B (dyntick transition). ● Other special cases where a GP must not be stalled and an RCU reader is not possible: (rcutorture, multi_cpu_stop() etc.)
  • 73. The problem: nohz_full and RCU Possible Solution: Use option B (forced dynticks transition) in the IPI handler. ● This would be rather hackish. Currently we do this only in very special cases. ● This will only solve the issue for the current GP, future GPs may still suffer.
  • 74. The problem: nohz_full and RCU. Final solution: turn the tick back on. ● Add a new “RCU” tick dependency mask. ● The RCU FQS loop sets rcu_urgent_qs to true and does resched_cpu() on the stalling CPU, as explained. ● The IPI handler arrives at the CPU. ● On IPI entry, we set the tick dependency mask. ● On IPI exit, the sched-tick code turns the tick back on. This solution prevents the OOM splats. https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=5b451e66c8f78f301c0c6eaacb0246eb13e09974 https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=fd6d2b116411753fcaf929e6523d2ada227aa553
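In terms of the existing tick-dependency API, the core of that idea looks roughly like the sketch below (a simplification of the referenced commits, not their exact code; rcu_force_tick_on() and rcu_allow_tick_off() are invented wrapper names):

#include <linux/tick.h>

/* Called when the FQS loop's IPI lands on a nohz_full CPU that is stalling
 * the grace period: the RCU dependency bit keeps the tick from being stopped. */
static void rcu_force_tick_on(int cpu)
{
	tick_dep_set_cpu(cpu, TICK_DEP_BIT_RCU);
}

/* Called once that CPU finally reports its quiescent state. */
static void rcu_allow_tick_off(int cpu)
{
	tick_dep_clear_cpu(cpu, TICK_DEP_BIT_RCU);
}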
  • 75. dyntick counter cleanup attempts. Why? While working on the previous solution, we found a bug where dynticks_nmi_nesting was being set to (DYNTICK_IRQ_NONIDLE | 2), yet the code was checking for (dynticks_nmi_nesting == 2). Link to the buggy commit: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=13c4b07593977d9288e5d0c21c89d9ba27e2ea1f However, Paul wants the existing mechanism redesigned rather than cleaned up, since it's too complex: http://lore.kernel.org/r/20190828202330.GS26530@linux.ibm.com
  • 76. Thank you… ● For questions, please email the list: rcu@vger.kernel.org ● Follow us on Twitter: ○ paulmckrcu@ joel_linux@ boqun_feng@ srostedt@