Load-Balancing for Improving user Responsiveness
on Multicore Embedded Systems
Jul-11, 2012
Geunsik Lim
Samsung Electronics Co., Ltd.
Sungkyunkwan University
2/24
Who am I ?
• Full name: Geunsik Lim
• E-Mail : geunsik.lim@samsung.com, leemgs@gmail.com
• Current : Senior software engineer at Samsung Electronics (http://www.samsung.com)
• Android localization: Korea Android community (http://www.kandroid.org)
• Past: S/W membership manager at Samsung Electronics
Senior engineer at ROBOST company
Systems administrator at Daegu Bank, Ltd.
3/24
1. Introduction
2. Existing methods
3. Operation zone based load-balancer
4. Evaluation
5. Further work
6. Conclusions
TOC
4/24
SMP Scheduler(Load-balancing) : scheduler( ), load_balance( ), migration_thread( )
Synchronization : Semaphore, Spin-Lock, FUTEX, Atomic op., Per-CPU variable, RCU, Work-Queue
Interrupt Load-balancing ( or user-space level irqbalance daemon)
Affinity (interface that lets a system administrator prevent tasks from being moved to another CPU)
• CPU Affinity (Shielded CPU)
• I/O Affinity
• IRQ Affinity
CPUSET(with Process Container; cgroups): Assign CPU and Memory on NUMA
CPU Isolation: Isolate a specific CPU (if you don't need load-balancing)
Introduction – Linux Features for Multicore
5/24
• 2.6.00 SMP scalability (Per-CPU data structures)
• 2.6.16 SMP IRQ affinity
• 2.6.24 CPU isolation
• 2.6.28 Block: add support for IO CPU affinity
• 2.6.32 Enable rq CPU completion affinity by default (significantly speeds up databases)
• 2.6.33 Includes full support for ARM9 MPCore
• 2.6.37 Big Kernel Lock (BKL) phased out as outdated technology
• 2.6.38 Improve cpu-cgroup performance for smp systems significantly by rewriting tg_shares_up
• 2.6.39 Ext4 SMP scalability - SMP speed-ups
• 3.1.00 Block: Dynamic writeback throttling - SMP scaling problem fixed , Strict CPU affinity,
• 3.4.00 Memory resource controller (with cgroups)
Recent Linux kernels have mature SMP features
• 2.6.15: SMP support for ARM11 MPCore
• 2.6.18: SMPnice
• 2.6.36: Support for S5PV310 (ARM Cortex-A9 Multi-Core)
The major SMP features for ARM are merged into the mainline kernel.
Change-logs of the Linux kernel for SMP and ARM.
Introduction – SMP Linux
6/24
Considerable problems and their solutions:
• Corruption of shared resources by concurrent workers (e.g. writers) → Use a locking mechanism (e.g. kernel lock facilities, application-level thread libraries).
• Synchronization overhead → Increase or decrease the level of parallelism suitably.
• Task migration → Adjust affinity manually (an ideal OS would schedule tasks automatically).
• Resource contention → Run well-programmed software on a well-designed OS scheduler (e.g. cgroups); utilize sched_yield( ).
• False sharing → Align allocated data to the cache-line size (via the compiler where possible).
• Routines used by many agents → Implement thread-safe, re-entrant software.
• Task migration that bounces cache lines (e.g. the ping-pong effect) → Affinitize tasks to a specific CPU.
• Unfair cache requests → Affinitize tasks to a specific CPU.
Introduction – Considerable Factors for SMP Environment
7/24
Related work - CPU Affinity Policy
This technique pins specific tasks to selected CPUs to avoid load-balancing operations.
• Apparatus and method for improved CPU affinity in a multiprocessor system, R.A. Alfieri, US Patent 5,745,778, 1998, Citations 167
• Affinity scheduling of processes on symmetric multiprocessing systems, K.D. Abramson, H.B. Butts Jr…, US Patent 5,506,987, 1996
• Migration policies for multi-core fair-share scheduling, D. Choffnes, M. Astley, ACM SIGOPS Operating Systems, 2008
8/24
Related work - Classification of RT & NRT tasks
This technique physically isolates time-critical tasks on a dedicated CPU.
• Shielded CPUs: real-time performance in standard Linux, ecee.colorado.edu, S Brosky, Linux Journal, 2004, Citations 11
• Shielded processors: Guaranteeing sub-millisecond response in standard Linux, S Brosky, Parallel and Distributed Processing, 2003
• A real-time Linux, V Yodaiken, Proceedings of the Linux Applications, 1997, Citations 167
9/24
Related work - A Partitioning method for Multi-processor
• Container-based operating system virtualization: a scalable,
high-performance alternative to hypervisors, S Soltesz, H Pötzl,
ME Fiuczynsk, ACM SIGOPS, 2007 , Citations 169
•Task partitioning: An innovation process variable, Eric von Hippel,
MIT Sloan School of Management, Cambridge, MA 02139, U.S.A.,
1 April 2002.
•Process Partitioning for Distributed Embedded Systems, CODES
'96 Proceedings of the 4th International Workshop on
Hardware/Software Co-Design, 1996
These techniques schedule tasks by grouping/partitioning them according to their goals in kernel space.
10/24
Related work: Load-balancing on Linux for multicore system
• The load-balancing operation runs periodically whenever load is imbalanced, for optimal CPU utilization.
• The problem with this mechanism is that it migrates tasks unnecessarily even when the CPUs are not fully (100%) utilized.
• Real-time performance and middleware on multi-core linux platforms, Yuanfang Zhang, Washington University, 2008
• Load balancing control method for a loosely coupled multi-processor system and a device for realizing same, Toshio
Hirosawa, Hitachi, Japan, Patent No. 4748558, May-1-1986
• Improve load balancing when tasks have large weight differential, Nikhil Rao, Google, http://lwn.net/Articles/409860
11/24
Problems of the existing load-balancer
In general, more CPU load leads to more frequent task migration and thus incurs higher cost. The cost can be broken down into direct, indirect, and latency costs as follows:
1. Direct cost
• The load-balancing cost of checking the load imbalance of the CPUs for utilization and scalability in the multicore system
2. Indirect cost
• Cache invalidation
• Power consumption
3. Latency cost
• Scheduling latency
• Longer non-preemptible period
12/24
Operation zone based load-balancer: Task migration time
The figure shows the points in time at which the load-balancer has to inspect whether task migration is needed to keep the CPU load fair (equations (1)-(3) in the paper).
13/24
Operation zone based load-balancer : Load-balancing operation zone
• The load-balancing operation zone consists of three scheduling-aware control areas.
• The "Cold zone" policy may execute the load-balancing operation loosely, for systems with low CPU utilization.
• The "Hot zone" policy must execute the load-balancing operation aggressively, like the existing mechanism.
• The "Warm zone" policy sits at the middle level between the "Cold zone" and the "Hot zone".
[Figure: CPU usage (%) from 0 to 100 split into a Cold zone (no load-balancing), a Warm zone with low/mid/high spots around the fluctuation spot, and a Hot zone (always load-balancing).]
14/24
Operation zone based load-balancer : Calculating CPU utilization
• The Warm zone consists of three spots, managed by a score-based system.
• Controlling tasks is not simple because CPU utilization under the "Warm zone" policy fluctuates; therefore, weight-based score management is supported, computed either from the local CPU (default policy) or from the average across CPUs. (Please see the paper for the details.)
15/24
Latency Factors in Linux Kernel
• The major factors that cause latency in kernel-space, which together make up the scheduling latency:
• Hardware latency
• Interrupt latency (per CPU)
• Preemption latency
• Switching latency
• Wakeup latency
• Misc. latency
16/24
[Figure: an RT task repeatedly sleeping for 1,000 usec among NRT/lower-priority tasks observes about 5,000 usec of latency before running again; the delay decomposes into interrupt latency, wakeup latency, preemption latency, and switching latency.]
Evaluation environment
17/24
Evaluation scenario for worst-case

[FOREGROUND] Evaluate scheduling latency of an urgent task:
# Evaluate latency of 1 user-space thread with static priority 99
# ps -eo comm,pid,tid,class,rtprio,wchan:35 | grep 99 | awk '{print $2}'
time ./cyclictest ( -a 0 ) -t1 -p 99 -i 1000 -n -l 1000000

[BACKGROUND] Stress conditions (from http://rt.wiki.kernel.org):
# Create 50 threads as background tasks.
time ./cyclictest -t50 -p 80 -i 10000 -n -l 100000

# To maximize I/O load ASAP
cd /opt
tar cvzf test1.tgz ./linux-2.6.X &
tar cvzf test2.tgz ./linux-2.6.X &
tar cvzf test3.tgz ./linux-2.6.X &
tar cvzf test4.tgz ./linux-2.6.X &

# To maximize CPU load
/bin/ping -l 100000 -q -s 10 -f localhost &
/bin/ping -l 100000 -q -s 10 -f localhost &
/bin/ping -l 100000 -q -s 10 -f localhost &
/bin/ping -l 100000 -q -s 10 -f localhost &
/bin/ping -l 100000 -q -s 10 -f localhost &

# Calculate the usage of disk for CPU & I/O load
/bin/du / &

# To get the highest CPU stress with Ingo Molnar's dohell.
#!/bin/sh
while true; do /bin/dd if=/dev/zero of=bigfile bs=1024000 count=1024; done &
while true; do /usr/bin/killall hackbench; sleep 5; done &
while true; do /sbin/hackbench 20; done &
( cd ./ltp-full-20120401; while true; do ./runalltests.sh -x 40; done & )
18/24
Evaluation on CPU affinity based system 1/2
• Test Scenario: Foreground task is pinned to CPU0; background stress is pinned to CPU1-CPU3.
• Test Environment : Intel Q9400 , Linux 2.6.32
• Test Utilities : LTP-FULL-20120401 , Cyclictest of rt-test package
• Load-balancer setting: With Warm Zone (High spot) Policy
The scheduling latency of our test thread is reduced by more than a factor of three: from 53 microseconds to 16 microseconds on average.
19/24
Evaluation on CPU non-affinity based system 2/2
• Test Scenario: Foreground task is pinned to CPU0; background stress has no affinity.
• Test Environment : Intel Q9400 , Linux 2.6.32
• Test Utilities : LTP-FULL-20120401 , cyclictest of rt-test package
• Load-balancer setting: With Warm Zone (High spot) Policy
The scheduling latency of our test thread is reduced by more than a factor of two: from 72 microseconds to 31 microseconds on average.
20/24
Performance counter stats for 'sync':
3.837029 task-clock # 0.012 CPUs utilized
13 context-switches # 0.003 M/sec
0 CPU-migrations # 0.000 M/sec
140 page-faults # 0.036 M/sec
9,594,609 cycles # 2.501 GHz
<not counted> stalled-cycles-frontend
<not counted> stalled-cycles-backend
2,221,867 instructions # 0.23 insns per cycle
404,846 branches # 105.510 M/sec
14,400 branch-misses # 3.56% of all branches
0.321459666 seconds time elapsed
sync-2389 [001] 325.763989: wakeup: 2389:120:0 ==+ 620:120:0 [000]
sync-2389 [001] 325.764012: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764076: wakeup: 2389:120:0 ==+ 394:120:0 [002]
sync-2389 [001] 325.764082: wakeup: 2389:120:0 ==+ 620:120:0 [000]
sync-2389 [001] 325.764089: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764108: wakeup: 2389:120:0 ==+ 2342:120:0 [000]
sync-2389 [001] 325.764116: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764134: wakeup: 2389:120:0 ==+ 2343:120:0 [000]
sync-2389 [001] 325.764136: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764157: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799064: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799200: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799329: context_switch: 2389:120:2 ==> 0:120:0 [002]
sync-2389 [001] 325.799456: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799580: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799661: wakeup: 2389:120:0 ==+ 620:120:0 [000]
sync-2389 [001] 325.799663: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.917879: wakeup: 2389:120:0 ==+ 394:120:0 [003]
. . . . . Below Omission . . . . . . .
Evaluation - Task migration of the sync command
• Test Environment : Android device, Linux 2.6.32
• Test Scenario : sync (synchronize files of a storage device such as a micro SD card)
• Load-balancer policy: With Warm Zone (Mid spot) Policy
Performance counter stats for 'sync':
3.837029 task-clock # 0.012 CPUs utilized
13 context-switches # 0.003 M/sec
3 CPU-migrations # 0.005 M/sec
140 page-faults # 0.036 M/sec
9,594,609 cycles # 2.501 GHz
<not counted> stalled-cycles-frontend
<not counted> stalled-cycles-backend
2,221,867 instructions # 0.23 insns per cycle
404,846 branches # 105.510 M/sec
14,400 branch-misses # 3.56% of all branches
0.321459666 seconds time elapsed
sync-2389 [001] 325.763989: wakeup: 2389:120:0 ==+ 620:120:0 [000]
sync-2389 [001] 325.764012: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764076: wakeup: 2389:120:0 ==+ 394:120:0 [002]
sync-2389 [001] 325.764082: wakeup: 2389:120:0 ==+ 620:120:0 [000]
sync-2389 [001] 325.764089: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764108: wakeup: 2389:120:0 ==+ 2342:120:0 [000]
sync-2389 [001] 325.764116: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764134: wakeup: 2389:120:0 ==+ 2343:120:0 [000]
sync-2389 [001] 325.764136: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.764157: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799064: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799200: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [002] 325.799329: context_switch: 2389:120:2 ==> 0:120:0 [002]
sync-2389 [001] 325.799456: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799580: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.799661: wakeup: 2389:120:0 ==+ 620:120:0 [000]
sync-2389 [001] 325.799663: context_switch: 2389:120:2 ==> 0:120:0 [001]
sync-2389 [001] 325.917879: wakeup: 2389:120:0 ==+ 394:120:0 [003]
. . . . . Below Omission . . . . . . .
Tracing with Ftrace (Before vs. After): the activity of unnecessary task migration is skipped, preserving real-time characteristics.
21/24
Evaluation – Migration Handling of one threaded application
• Test Environment : Android device, Linux 2.6.32
• Test Scenario : CPU-intensive process scheduling with a one-threaded application
• Test Example : tar xvf *** ./
• System Interface: /proc/sys/kernel/balance_one_threaded_app (ON=1, OFF=0)
[Figure: Before - the single CPU-intensive task migrates among CPU0-CPU3 over time (84-97% usage on whichever CPU it currently occupies, the other cores idle). After - the task stays on CPU0 at about 92% usage while CPU1-CPU3 stay idle from start to end.]
22/24
Further work
• If guaranteeing deadlines under worst-case conditions is critical for a real-time system, this approach has a technical limitation: it cannot bound the maximum latency of running tasks at all times.
• We need to figure out the best method, such as a hybrid design mixing our technique with the physical CPU-shielding technique.
• To reduce the power consumption of mobile devices, we need further experimental research to design an ideal algorithm for vital task migration across CPU on-line and off-line (hotplug) events.
• We have to evaluate various scenarios covering direct cost, indirect cost, and latency cost to improve our load-balancer into a next-generation SMP scheduler.
23/24
Conclusion
• No modification of user space is needed, because this approach lives entirely inside the operating system.
• Our design reduces the non-preemptible intervals that the double-locking cost of task migration among CPUs always generates.
• Our approach suppresses the task-migration kernel thread, which executes inefficient CPU instructions to move a task to another CPU.
• Our idea aggressively reduces the cost of CPU cache invalidation and of the synchronization caused by local-cache updates.
24/24
Thank you for your attention!
Any questions?

Contenu connexe

Tendances

A fun cup of joe with open liberty
A fun cup of joe with open libertyA fun cup of joe with open liberty
A fun cup of joe with open libertyAndy Mauer
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Antonio Cesarano
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Masao Fujii
 
4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architecturesIslam Samir
 
google file system
google file systemgoogle file system
google file systemdiptipan
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331Fengchang Xie
 
Performance Benchmarking of Clouds Evaluating OpenStack
Performance Benchmarking of Clouds                Evaluating OpenStackPerformance Benchmarking of Clouds                Evaluating OpenStack
Performance Benchmarking of Clouds Evaluating OpenStackPradeep Kumar
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCJohan Tibell
 
Performance Tuning - Understanding Garbage Collection
Performance Tuning - Understanding Garbage CollectionPerformance Tuning - Understanding Garbage Collection
Performance Tuning - Understanding Garbage CollectionHaribabu Nandyal Padmanaban
 
Speed up your asset imports for big projects - Unite Copenhagen 2019
Speed up your asset imports for big projects - Unite Copenhagen 2019Speed up your asset imports for big projects - Unite Copenhagen 2019
Speed up your asset imports for big projects - Unite Copenhagen 2019Unity Technologies
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryoguest40fc7cd
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixZongYing Lyu
 
Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Timothy St. Clair
 
Tips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache KafkaTips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache KafkaAll Things Open
 
Virtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A SurveyVirtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A Surveyijsrd.com
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterSrihari Sriraman
 

Tendances (20)

Google File System
Google File SystemGoogle File System
Google File System
 
A fun cup of joe with open liberty
A fun cup of joe with open libertyA fun cup of joe with open liberty
A fun cup of joe with open liberty
 
Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...Cluster based storage - Nasd and Google file system - advanced operating syst...
Cluster based storage - Nasd and Google file system - advanced operating syst...
 
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
Streaming Replication (Keynote @ PostgreSQL Conference 2009 Japan)
 
4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures4838281 operating-system-scheduling-on-multicore-architectures
4838281 operating-system-scheduling-on-multicore-architectures
 
gfs-sosp2003
gfs-sosp2003gfs-sosp2003
gfs-sosp2003
 
google file system
google file systemgoogle file system
google file system
 
Gfs google-file-system-13331
Gfs google-file-system-13331Gfs google-file-system-13331
Gfs google-file-system-13331
 
Performance Benchmarking of Clouds Evaluating OpenStack
Performance Benchmarking of Clouds                Evaluating OpenStackPerformance Benchmarking of Clouds                Evaluating OpenStack
Performance Benchmarking of Clouds Evaluating OpenStack
 
ClickHouse Keeper
ClickHouse KeeperClickHouse Keeper
ClickHouse Keeper
 
A Scalable I/O Manager for GHC
A Scalable I/O Manager for GHCA Scalable I/O Manager for GHC
A Scalable I/O Manager for GHC
 
ResumeJagannath
ResumeJagannathResumeJagannath
ResumeJagannath
 
Performance Tuning - Understanding Garbage Collection
Performance Tuning - Understanding Garbage CollectionPerformance Tuning - Understanding Garbage Collection
Performance Tuning - Understanding Garbage Collection
 
Speed up your asset imports for big projects - Unite Copenhagen 2019
Speed up your asset imports for big projects - Unite Copenhagen 2019Speed up your asset imports for big projects - Unite Copenhagen 2019
Speed up your asset imports for big projects - Unite Copenhagen 2019
 
Threading Successes 03 Gamebryo
Threading Successes 03   GamebryoThreading Successes 03   Gamebryo
Threading Successes 03 Gamebryo
 
Libckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unixLibckpt transparent checkpointing under unix
Libckpt transparent checkpointing under unix
 
Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.Musings on Mesos: Docker, Kubernetes, and Beyond.
Musings on Mesos: Docker, Kubernetes, and Beyond.
 
Tips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache KafkaTips and Tricks for Operating Apache Kafka
Tips and Tricks for Operating Apache Kafka
 
Virtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A SurveyVirtual Machine Migration Techniques in Cloud Environment: A Survey
Virtual Machine Migration Techniques in Cloud Environment: A Survey
 
On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
 

Similaire à load-balancing-method-for-embedded-rt-system-20120711-0940

참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의DzH QWuynh
 
CNR @ VMUG.IT 20150304
CNR @ VMUG.IT 20150304CNR @ VMUG.IT 20150304
CNR @ VMUG.IT 20150304VMUG IT
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCoburn Watson
 
BKK16-208 EAS
BKK16-208 EASBKK16-208 EAS
BKK16-208 EASLinaro
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Amazon Web Services
 
Von Neumann Architecture microcontroller.pptx
Von Neumann Architecture microcontroller.pptxVon Neumann Architecture microcontroller.pptx
Von Neumann Architecture microcontroller.pptxSUNILNYATI2
 
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLinaro
 
Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed SystemsRicha Singh
 
Insider operating system
Insider   operating systemInsider   operating system
Insider operating systemAditi Saxena
 
Ios103 ios102 iv-operating-system-memory-management_wk4
Ios103 ios102 iv-operating-system-memory-management_wk4Ios103 ios102 iv-operating-system-memory-management_wk4
Ios103 ios102 iv-operating-system-memory-management_wk4Anwal Mirza
 
Operating system Q/A
Operating system Q/AOperating system Q/A
Operating system Q/AAbdul Munam
 
Aman 16 os sheduling algorithm methods.pptx
Aman 16 os sheduling algorithm methods.pptxAman 16 os sheduling algorithm methods.pptx
Aman 16 os sheduling algorithm methods.pptxvikramkagitapu
 
Linux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emptionLinux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emptionHemanth Venkatesh
 
Engg-0505-IT-Operating-Systems-2nd-year.pdf
Engg-0505-IT-Operating-Systems-2nd-year.pdfEngg-0505-IT-Operating-Systems-2nd-year.pdf
Engg-0505-IT-Operating-Systems-2nd-year.pdfnikhil287188
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresCloudLightning
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesMurtadha Alsabbagh
 
Let’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllLet’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllScyllaDB
 

Similaire à load-balancing-method-for-embedded-rt-system-20120711-0940 (20)

참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의참여기관_발표자료-국민대학교 201301 정기회의
참여기관_발표자료-국민대학교 201301 정기회의
 
CNR @ VMUG.IT 20150304
CNR @ VMUG.IT 20150304CNR @ VMUG.IT 20150304
CNR @ VMUG.IT 20150304
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
BKK16-208 EAS
BKK16-208 EASBKK16-208 EAS
BKK16-208 EAS
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
Von Neumann Architecture microcontroller.pptx
Von Neumann Architecture microcontroller.pptxVon Neumann Architecture microcontroller.pptx
Von Neumann Architecture microcontroller.pptx
 
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summitLCU13: Power-efficient scheduling, and the latest news from the kernel summit
LCU13: Power-efficient scheduling, and the latest news from the kernel summit
 
Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed Systems
 
Insider operating system
Insider   operating systemInsider   operating system
Insider operating system
 
Ios103 ios102 iv-operating-system-memory-management_wk4
Ios103 ios102 iv-operating-system-memory-management_wk4Ios103 ios102 iv-operating-system-memory-management_wk4
Ios103 ios102 iv-operating-system-memory-management_wk4
 
Operating system Q/A
Operating system Q/AOperating system Q/A
Operating system Q/A
 
Operating System
Operating SystemOperating System
Operating System
 
Aman 16 os sheduling algorithm methods.pptx
Aman 16 os sheduling algorithm methods.pptxAman 16 os sheduling algorithm methods.pptx
Aman 16 os sheduling algorithm methods.pptx
 
01 oracle architecture
01 oracle architecture01 oracle architecture
01 oracle architecture
 
Real time operating systems
Real time operating systemsReal time operating systems
Real time operating systems
 
Linux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emptionLinux Device Driver parallelism using SMP and Kernel Pre-emption
Linux Device Driver parallelism using SMP and Kernel Pre-emption
 
Engg-0505-IT-Operating-Systems-2nd-year.pdf
Engg-0505-IT-Operating-Systems-2nd-year.pdfEngg-0505-IT-Operating-Systems-2nd-year.pdf
Engg-0505-IT-Operating-Systems-2nd-year.pdf
 
Simulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud InfrastructuresSimulation of Heterogeneous Cloud Infrastructures
Simulation of Heterogeneous Cloud Infrastructures
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
Let’s Fix Logging Once and for All
Let’s Fix Logging Once and for AllLet’s Fix Logging Once and for All
Let’s Fix Logging Once and for All
 

Plus de Samsung Electronics

Samsung ARM Chromebook1/2 (for Hackers & System Developers)
Samsung ARM Chromebook1/2 (for Hackers & System Developers)Samsung ARM Chromebook1/2 (for Hackers & System Developers)
Samsung ARM Chromebook1/2 (for Hackers & System Developers)Samsung Electronics
 
kics2013-winter-biomp-slide-20130127-1340
kics2013-winter-biomp-slide-20130127-1340kics2013-winter-biomp-slide-20130127-1340
kics2013-winter-biomp-slide-20130127-1340Samsung Electronics
 
Remote-debugging-based-on-notrace32-20130619-1900
Remote-debugging-based-on-notrace32-20130619-1900Remote-debugging-based-on-notrace32-20130619-1900
Remote-debugging-based-on-notrace32-20130619-1900Samsung Electronics
 
booting-booster-final-20160420-0700
booting-booster-final-20160420-0700booting-booster-final-20160420-0700
booting-booster-final-20160420-0700Samsung Electronics
 

Plus de Samsung Electronics (7)

Samsung ARM Chromebook1/2 (for Hackers & System Developers)
Samsung ARM Chromebook1/2 (for Hackers & System Developers)Samsung ARM Chromebook1/2 (for Hackers & System Developers)
Samsung ARM Chromebook1/2 (for Hackers & System Developers)
 
kics2013-winter-biomp-slide-20130127-1340
kics2013-winter-biomp-slide-20130127-1340kics2013-winter-biomp-slide-20130127-1340
kics2013-winter-biomp-slide-20130127-1340
 
gcce-uapm-slide-20131001-1900
gcce-uapm-slide-20131001-1900gcce-uapm-slide-20131001-1900
gcce-uapm-slide-20131001-1900
 
distcom-short-20140112-1600
distcom-short-20140112-1600distcom-short-20140112-1600
distcom-short-20140112-1600
 
UNAS-20140123-1800
UNAS-20140123-1800UNAS-20140123-1800
UNAS-20140123-1800
 
Remote-debugging-based-on-notrace32-20130619-1900
Remote-debugging-based-on-notrace32-20130619-1900Remote-debugging-based-on-notrace32-20130619-1900
Remote-debugging-based-on-notrace32-20130619-1900
 
booting-booster-final-20160420-0700
booting-booster-final-20160420-0700booting-booster-final-20160420-0700
booting-booster-final-20160420-0700
 

load-balancing-method-for-embedded-rt-system-20120711-0940

  • 1. Load-Balancing for Improving user Responsiveness on Multicore Embedded Systems Jul-11, 2012 Geunsik Lim Samsung Electronics Co., Ltd. Sungkyungkwan University
  • 2. 2/24 Who am I ? • Full name: Geunsik Lim • E-Mail : geunsik.lim@samsung.com, leemgs@gmail.com • Current : Senior software engineer at Samsung Electronics (http://www.samsung.com) • Android localization: Korea Android community (http://www.kandorid.org) • Past: S/W membership manager at Samsung Electronics Senior engineer at ROBOST company Systems administrator at Daegu Bank, Ltd. South Korea Ottawa
  • 3. 3/24 1. Introduction 2. Existing methods 3. Operation zone based load-balancer 4. Evaluation 5. Further work 6. Conclusions TOC
  • 4. 4/24 SMP Scheduler(Load-balancing) : scheduler( ), load_balance( ), migration_thread( ) Synchronization : Semaphore, Spin-Lock, FUTEX, Atomic op., Per-CPU variable, RCU, Work-Queue Interrupt Load-balancing ( or user-space level irqbalance daemon) Affinity (Interface to protect the movement of tasks into another CPU for system administrator) • CPU Affinity (Shielded CPU) • I/O Affinity • IRQ Affinity CPUSET(with Process Container; cgroups): Assign CPU and Memory on NUMA CPU Isolation: Isolate a specific CPU (If you don‟t need Load-balancing) Tasks Tasks Multi-core Parallelism Load-balancing Introduction – Linux Features for Multicore
  • 5. 5/24 Introduction – SMP Linux Up-to-date
     Change-logs of the Linux kernel for SMP and ARM; recent Linux kernels have mature SMP features:
     • 2.6.00 SMP scalability (per-CPU data structures)
     • 2.6.16 SMP IRQ affinity
     • 2.6.24 CPU isolation
     • 2.6.28 Block layer: support for I/O CPU affinity
     • 2.6.32 rq CPU completion affinity enabled by default (significantly speeds up databases)
     • 2.6.33 Full support for ARM9 MPCore
     • 2.6.37 Big Kernel Lock (BKL) retired
     • 2.6.38 cpu-cgroup performance on SMP improved significantly by rewriting tg_shares_up
     • 2.6.39 Ext4 SMP scalability speed-ups
     • 3.1.00 Block layer: dynamic writeback throttling (SMP scaling problem fixed), strict CPU affinity
     • 3.4.00 Memory resource controller (with cgroups)
     The major SMP features for ARM are merged into the mainline kernel:
     • 2.6.15 SMP support for ARM11 MPCore
     • 2.6.18 SMPnice
     • 2.6.36 Support for S5PV310 (ARM Cortex-A9 multicore)
  • 6. 6/24 Introduction – Considerable Factors for an SMP Environment
     Problem → Solution:
     • Corruption of shared resources by concurrent workers (e.g. writers) → use a locking mechanism (kernel lock facilities, application-level thread libraries)
     • Synchronization overhead → increase or decrease the level of parallelism as appropriate
     • Task migration → adjust affinity manually (an ideal OS would schedule tasks automatically)
     • Resource contention → run well-programmed software on a well-designed OS scheduler (e.g. cgroups); utilize sched_yield( )
     • False sharing → align allocated data to the cache-line size (via the compiler where possible)
     • Routines used by many agents → implement thread-safe, re-entrant software
     • Cache-line-dependent task migration (e.g. the ping-pong effect) → affinitize tasks to a specific CPU
     • Unfair cache requests → affinitize tasks to a specific CPU
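The first two rows of the table (shared-resource corruption by concurrent writers, and the overhead that the fix introduces) can be illustrated with a minimal locking sketch; all names here are illustrative:

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    """Increment the shared counter n times under a lock, so concurrent
    writers cannot interleave the read-modify-write and lose updates."""
    global counter
    for _ in range(n):
        with lock:          # serialize the critical section
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 40000     # no lost updates from the 4 concurrent writers
```

The lock guarantees correctness, but every acquisition is synchronization overhead, which is why the table also recommends tuning the degree of parallelism rather than locking blindly.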
  • 7. 7/24 Related work – CPU Affinity Policy
     This technique affinitizes specific tasks to certain CPUs to avoid load-balancing operations.
     • "Apparatus and method for improved CPU affinity in a multiprocessor system", R. A. Alfieri, US Patent 5,745,778, 1998 (167 citations)
     • "Affinity scheduling of processes on symmetric multiprocessing systems", K. D. Abramson, H. B. Butts Jr. et al., US Patent 5,506,987, 1996
     • "Migration policies for multi-core fair-share scheduling", D. Choffnes, M. Astley, ACM SIGOPS Operating Systems Review, 2008
  • 8. 8/24 Related work – Classification of RT & NRT Tasks
     This technique physically isolates time-critical tasks on a specific CPU.
     • "Shielded CPUs: real-time performance in standard Linux", S. Brosky, Linux Journal, 2004 (11 citations)
     • "Shielded processors: guaranteeing sub-millisecond response in standard Linux", S. Brosky, Parallel and Distributed Processing, 2003
     • "A real-time Linux", V. Yodaiken, Proceedings of the Linux Applications conference, 1997 (167 citations)
  • 9. 9/24 Related work – A Partitioning Method for Multiprocessors
     These techniques schedule tasks by grouping/partitioning them according to their goals in kernel space.
     • "Container-based operating system virtualization: a scalable, high-performance alternative to hypervisors", S. Soltesz, H. Pötzl, M. E. Fiuczynski et al., ACM SIGOPS, 2007 (169 citations)
     • "Task partitioning: an innovation process variable", Eric von Hippel, MIT Sloan School of Management, Cambridge, MA, 2002
     • "Process partitioning for distributed embedded systems", CODES '96 (4th International Workshop on Hardware/Software Co-Design), 1996
  • 10. 10/24 Related work – Load-balancing on Linux for Multicore Systems
     • The load balancer runs periodically whenever the load becomes imbalanced, to keep CPU utilization optimal.
     • The problem with this mechanism is that it performs task migration unnecessarily even when the CPUs are not fully (100%) utilized.
     • "Real-time performance and middleware on multi-core Linux platforms", Yuanfang Zhang, Washington University, 2008
     • "Load balancing control method for a loosely coupled multi-processor system and a device for realizing same", Toshio Hirosawa, Hitachi, Japan, Patent No. 4748558, May-1-1986
     • "Improve load balancing when tasks have large weight differential", Nikhil Rao, Google, http://lwn.net/Articles/409860
  • 11. 11/24 Problems of the Existing Load-balancer
     In general, more CPU load leads to more frequent task migration and thus incurs higher cost. The cost can be broken down into direct, indirect, and latency costs:
     1. Direct cost: the load-balancing cost of checking the load imbalance of the CPUs, paid for utilization and scalability on a multicore system
     2. Indirect cost: cache invalidation; power consumption
     3. Latency cost: scheduling latency; longer non-preemptible periods
  • 12. 12/24 Operation Zone Based Load-balancer: Task Migration Time
     The figure shows the points in time, (1)–(3), at which the scheduler must inspect whether task migration is needed to keep the CPU load balanced.
  • 13. 13/24 Operation Zone Based Load-balancer: Load-balancing Operation Zone
     • The load-balancing operation zone consists of three scheduling-aware control areas over the 0–100% CPU-usage range.
     • The "Cold zone" policy may execute load-balancing loosely for systems with low CPU utilization (no load-balancing).
     • The "Hot zone" policy must execute load-balancing eagerly, like the existing mechanism (always load-balancing).
     • The "Warm zone" policy sits between the Cold and Hot zones; its fluctuation region is subdivided into high, mid, and low spots.
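The three zones reduce to a small classifier over CPU utilization. In this sketch the 30%/80% boundaries are illustrative values, not numbers taken from the paper:

```python
COLD_MAX = 30   # assumed boundary: below this, skip load balancing
HOT_MIN = 80    # assumed boundary: at or above this, always load-balance

def balancing_policy(cpu_util):
    """Map CPU utilization (%) to a load-balancing zone decision."""
    if cpu_util < COLD_MAX:
        return "cold"   # loose: no periodic load-balancing
    if cpu_util >= HOT_MIN:
        return "hot"    # eager: balance like the stock scheduler
    return "warm"       # middle ground: spot/score-based decision

assert balancing_policy(10) == "cold"
assert balancing_policy(50) == "warm"
assert balancing_policy(95) == "hot"
```

The point of the design is that only the hot zone pays the full balancing cost of the existing mechanism; the other two zones trade perfect balance for fewer migrations.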
  • 14. 14/24 Operation Zone Based Load-balancer: Calculating CPU Utilization
     • The Warm zone consists of three spots governed by a score management system.
     • Controlling tasks is not simple because CPU utilization under the Warm zone policy fluctuates; therefore the load-balancer supports weight-based score management (please see the paper for details).
     • Utilization can be calculated based on the local CPU (the default policy) or based on the average across CPUs.
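Either utilization basis — the local CPU (default) or the average across CPUs — first needs a per-CPU utilization sample. A Linux-only sketch from /proc/stat; note that a single snapshot only approximates what a real interval-based implementation would measure by diffing two snapshots:

```python
def read_cpu_utilization():
    """Return {cpu_name: busy_percent} from one /proc/stat snapshot (Linux).

    Each 'cpuN' line lists jiffies per state; everything except idle and
    iowait counts as busy time since boot.
    """
    usage = {}
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu") and line[3].isdigit():  # skip aggregate 'cpu ' line
                fields = [int(x) for x in line.split()[1:]]
                idle = fields[3] + fields[4]      # idle + iowait jiffies
                total = sum(fields)
                usage[line.split()[0]] = 100.0 * (total - idle) / total
    return usage

per_cpu = read_cpu_utilization()
average = sum(per_cpu.values()) / len(per_cpu)   # "based on average CPUs"
assert all(0.0 <= u <= 100.0 for u in per_cpu.values())
```

A local-CPU policy would consult only `per_cpu` for the CPU running the balancer, while the averaged policy smooths out per-core fluctuation at the cost of reacting more slowly.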
  • 15. 15/24 Latency Factors in the Linux Kernel
     The major factors that cause latency damage in kernel space, all of which feed into scheduling latency:
     • Interrupt latency (including per-CPU interrupt latency)
     • Preemption latency
     • Switching latency
     • Wakeup latency
     • Hardware latency
     • Miscellaneous latency
  • 16. 16/24 Evaluation Environment
     An RT task repeatedly goes to sleep for 1,000 usec while NRT/lower-priority tasks run for 5,000 usec. The measured latency of the RT task is the sum of the interrupt, wakeup, preemption, and switching latencies incurred before it runs again.
  • 17. 17/24 Evaluation Scenario for the Worst Case
     Foreground — evaluate the scheduling latency of one urgent task:
       # Evaluate the latency of 1 user-space thread with static priority 99
       # ps -eo comm,pid,tid,class,rtprio,wchan:35 | grep 99 | awk '{print $2}'
       time ./cyclictest ( -a 0 ) -t1 -p 99 -i 1000 -n -l 1000000
     Background — stress conditions (see http://rt.wiki.kernel.org):
       # Create 50 threads as background tasks
       time ./cyclictest -t50 -p 80 -i 10000 -n -l 100000
       # Maximize I/O load
       cd /opt
       tar cvzf test1.tgz ./linux-2.6.X &
       tar cvzf test2.tgz ./linux-2.6.X &
       tar cvzf test3.tgz ./linux-2.6.X &
       tar cvzf test4.tgz ./linux-2.6.X &
       # Maximize CPU load (run five instances)
       /bin/ping -l 100000 -q -s 10 -f localhost &
       # Calculate disk usage for CPU & I/O load
       /bin/du / &
       # Get the highest CPU stress with Ingo Molnar's dohell:
       #!/bin/sh
       while true; do /bin/dd if=/dev/zero of=bigfile bs=1024000 count=1024; done &
       while true; do /usr/bin/killall hackbench; sleep 5; done &
       while true; do /sbin/hackbench 20; done &
       ( cd ./ltp-full-20120401; while true; do ./runalltests.sh -x 40; done & )
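The cyclictest measurement used above can be approximated in plain Python. This toy loop — run without real-time priority, so its numbers are far noisier than the real tool's — sleeps for a fixed interval and records how late each wakeup is:

```python
import time

def measure_wakeup_latency(interval_us=1000, loops=200):
    """cyclictest-style loop: request a sleep of interval_us microseconds
    and record how much later than the target each wakeup actually lands
    (timer + scheduling latency), in microseconds."""
    latencies = []
    for _ in range(loops):
        target = time.monotonic_ns() + interval_us * 1000
        time.sleep(interval_us / 1e6)
        latencies.append(max(0, (time.monotonic_ns() - target) // 1000))
    return latencies

lat = measure_wakeup_latency()
print(f"min/avg/max wakeup latency: {min(lat)}/{sum(lat) // len(lat)}/{max(lat)} us")
```

Running this under the background stress above (tar, ping -f, hackbench) is exactly what inflates the max value that cyclictest reports for the worst case.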
  • 18. 18/24 Evaluation on a CPU-affinity-based System (1/2)
     • Test scenario: the foreground task is affinitized to CPU0; the background stress is affinitized to CPU1–CPU3.
     • Test environment: Intel Q9400, Linux 2.6.32
     • Test utilities: LTP-FULL-20120401, cyclictest from the rt-tests package
     • Load-balancer setting: Warm zone (high spot) policy
     • Result: the scheduling latency of our test thread is reduced by more than a factor of three, from 53 microseconds to 16 microseconds on average.
  • 19. 19/24 Evaluation on a CPU-non-affinity-based System (2/2)
     • Test scenario: the foreground task is affinitized to CPU0; the background stress is not affinitized.
     • Test environment: Intel Q9400, Linux 2.6.32
     • Test utilities: LTP-FULL-20120401, cyclictest from the rt-tests package
     • Load-balancer setting: Warm zone (high spot) policy
     • Result: the scheduling latency of our test thread is reduced by more than a factor of two, from 72 microseconds to 31 microseconds on average.
  • 20. 20/24 Evaluation – Task Migration of the sync Command
     • Test environment: Android device, Linux 2.6.32
     • Test scenario: sync (synchronize the files of a storage device such as a micro-SD card)
     • Load-balancer policy: Warm zone (mid spot)
     Performance counter stats for 'sync' (before, with the existing load-balancer):
            3.837029 task-clock         #   0.012 CPUs utilized
                  13 context-switches   #   0.003 M/sec
                   3 CPU-migrations     #   0.005 M/sec
                 140 page-faults        #   0.036 M/sec
           9,594,609 cycles             #   2.501 GHz
           2,221,867 instructions       #   0.23  insns per cycle
             404,846 branches           # 105.510 M/sec
              14,400 branch-misses      #   3.56% of all branches
         0.321459666 seconds time elapsed
     After, with the operation zone based load-balancer, the same workload shows 0 CPU-migrations.
     Tracing with Ftrace confirms this: before, sync-2389 is context-switched onto CPU [002] in the middle of its run; after, it stays on CPU [001] throughout. Skipping this unnecessary task migration preserves the real-time characteristics.
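The migration count that perf reports can also be recovered from the ftrace output directly, by watching the CPU column of a task's trace lines. A hypothetical parser (the line format below is simplified from the traces above):

```python
def count_cpu_migrations(trace_lines, task="sync"):
    """Count how often a task's reported CPU column changes in ftrace-style
    lines such as 'sync-2389 [001] 325.764012: context_switch: ...'."""
    last_cpu, migrations = None, 0
    for line in trace_lines:
        if not line.startswith(task + "-"):
            continue
        cpu = int(line.split("[")[1].split("]")[0])  # the [NNN] CPU field
        if last_cpu is not None and cpu != last_cpu:
            migrations += 1
        last_cpu = cpu
    return migrations

trace = [
    "sync-2389 [001] 325.764012: context_switch: ...",
    "sync-2389 [001] 325.799200: context_switch: ...",
    "sync-2389 [002] 325.799329: context_switch: ...",  # moved to CPU 2
    "sync-2389 [001] 325.799456: context_switch: ...",  # moved back to CPU 1
]
assert count_cpu_migrations(trace) == 2
```

Applied to the before/after traces, this is the same signal as perf's CPU-migrations counter: a changing CPU column before, a constant one after.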
  • 21. 21/24 Evaluation – Migration Handling of a One-threaded Application
     • Test environment: Android device, Linux 2.6.32
     • Test scenario: scheduling of a CPU-intensive single-threaded process
     • Test example: tar xvf *** ./
     • System interface: /proc/sys/kernel/balance_one_threaded_app (ON=1, OFF=0)
     • Before: over the run, the process bounces across CPU0–CPU3 (84–97% usage on whichever CPU it currently occupies, the other three idle).
     • After: the process stays on one CPU at about 92% usage from start to end, while the other CPUs remain idle.
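The kernel makes the "is this a one-threaded app?" decision internally; from user space the same property can be checked per task through /proc. A Linux-only sketch (the helper `thread_count` is illustrative, not part of the proposed balance_one_threaded_app interface):

```python
import os

def thread_count(pid):
    """Read the 'Threads:' field from /proc/<pid>/status (Linux) to tell
    whether a task is single-threaded — the property the
    balance_one_threaded_app-style policy keys on."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return None

n = thread_count(os.getpid())
assert n is not None and n >= 1   # this interpreter has at least one thread
```

A single-threaded CPU hog gains nothing from being migrated between cores, which is why pinning it (the "after" picture above) removes migrations without losing any parallelism.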
  • 22. 22/24 Further Work
     • If a deadline guarantee under worst-case conditions is critical for a real-time system, this approach alone cannot bound the maximum latency of running tasks; we need to find the best hybrid design that mixes our technique with the physical CPU shielding technique.
     • For low power consumption on mobile devices, further experimental research is needed to design an ideal algorithm for vital-task migration that follows CPU online and CPU offline events.
     • We have to evaluate various scenarios covering direct cost, indirect cost, and latency cost to improve our load-balancer into a next-generation SMP scheduler.
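A hotplug-aware migration policy, as proposed above, would first need to know which CPUs are currently online; on Linux this is exposed in sysfs as a range string. A minimal parsing sketch:

```python
def online_cpus():
    """Parse /sys/devices/system/cpu/online — range strings like '0-3' or
    '0,2-3' — into a sorted list of online CPU ids (Linux)."""
    with open("/sys/devices/system/cpu/online") as f:
        spec = f.read().strip()
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

assert 0 in online_cpus()   # CPU 0 is online on any running Linux system
```

A power-aware balancer would migrate vital tasks off a core before it goes offline and re-spread load when cores come back online, using exactly this view of the system.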
  • 23. 23/24 Conclusion
     • No modification of user space is needed, because this approach lives entirely inside the operating system.
     • Our design reduces the non-preemptible intervals that always incur double-locking cost for task migration among the CPUs.
     • Our approach suppresses the task-migration kernel thread, which executes inefficient CPU instructions to move a task to another CPU.
     • Our idea aggressively reduces the costs of CPU cache invalidation and of the synchronization caused by updates to the local cache.
  • 24. 24/24 Thank you for your attention! Any questions?