1
Block I/O Layer Tracing:
blktrace
Gelato – Cupertino, CA
April 2006
Alan D. Brunelle
Hewlett-Packard Company
Open Source and Linux Organization
Scalability & Performance Group
Alan.Brunelle@hp.com
2
Introduction
● Blktrace – overview of a new Linux capability
– Ability to see what's going on inside the block I/O 
layer
● “You can't count what you can't measure”
– Kernel implementation
– Description of user applications 
● Sample Output & Analysis
3
Problem Statement
● Need to know the specific operations performed 
upon each I/O submitted into the block I/O layer
● Who?
– Kernel developers in the I/O path:
● Block I/O layer, I/O scheduler, software RAID, file system, ...
– Performance analysis engineers – HP OSLO S&P...
4
Block I/O Layer (simplified)
Applications
File Systems...
Page Cache
Block I/O Layer: Request Queues
Pseudo devices (MD/DM, optional)
Physical devices
5
iostat
● The iostat utility does provide information 
pertaining to request queues associated with 
specific devices
– Average I/O time on queue, number of merges, number of 
blocks read/written, ...
● However, it does not provide detailed information 
on a per-I/O basis
6
Blktrace – to the rescue!
● Developed and maintained by Jens Axboe (block I/O layer 
maintainer)
– My contributions included threads & utility splitting, DM remap 
events, the blkrawverify utility, the binary dump feature, testing, 
kernel/utility patches, and documentation.
● Low-overhead, configurable kernel component that emits events 
for specific operations performed on each I/O entering the block 
I/O layer
● Set of tools which extract and format these events
However, blktrace is not an analysis tool!
7
Feature List
● Provides detailed block layer information concerning individual I/Os
● Low-overhead kernel tracing mechanism
– Observed less than a 2% hit to application performance in relatively stressful I/O situations
● Configurable:
– Specify 1 or more physical or logical devices 
(including MD and DM (LVM2))
– User-selectable events – can specify filter at event acquisition and/or 
when formatting output
● Supports both “live” and “playback” tracing
8
Events Captured
● Request queue entry allocated
● Sleep during request queue 
allocation
● Request queue insertion
● Front/back merge of I/O on 
request queue
● Re-queue of a request
● Request issued to underlying 
block dev
● Request queue plug/unplug op
● I/O split/bounce operation
● I/O remap
– MD or DM
● Request completed
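The events listed above show up in blkparse output as single-letter action codes. A minimal lookup sketch follows; the letters are the standard blkparse action identifiers as I understand them, and the names `ACTIONS` and `describe` are hypothetical, introduced only for illustration.

```python
# Single-letter blkparse action codes mapped to the events on this slide.
# The descriptions paraphrase the slide; the dict and function names are
# illustrative and not part of blktrace itself.
ACTIONS = {
    "Q": "I/O queued (entered the block layer)",
    "G": "request queue entry allocated",
    "S": "sleep during request queue allocation",
    "I": "request queue insertion",
    "F": "front merge of I/O on request queue",
    "M": "back merge of I/O on request queue",
    "R": "re-queue of a request",
    "D": "request issued to underlying block device",
    "P": "request queue plug",
    "U": "request queue unplug",
    "X": "I/O split",
    "B": "I/O bounce",
    "A": "I/O remap (MD or DM)",
    "C": "request completed",
}


def describe(action: str) -> str:
    """Return a readable description for an action code, or a fallback."""
    return ACTIONS.get(action, f"unknown action {action!r}")


if __name__ == "__main__":
    for code in "GQMDC":
        print(code, "=", describe(code))
```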
9
blktrace: General Architecture
[Diagram: in kernel space, I/Os passing through the block I/O layer (request queue) emit events into relay channels (per device, per CPU); in user space, blktrace reads the relay channels and feeds blkparse.]
10
blktrace Utilities
● blktrace: Device configuration, and event 
extraction utility
– Store events in (long) term storage
– Or, pipe to blkparse utility for live tracing
● Also: networking feature to send events to another machine 
for parsing
● blkparse: Event formatting utility 
– Supports textual or binary dump output
11
blktrace: Event Output
% blktrace -d /dev/sda -o - | blkparse -i -
8,0 3 1 0.000000000 697 G W 223490 + 8 [kjournald]
8,0 3 2 0.000001829 697 P R [kjournald]
8,0 3 3 0.000002197 697 Q W 223490 + 8 [kjournald]
8,0 3 4 0.000005533 697 M W 223498 + 8 [kjournald]
8,0 3 5 0.000008607 697 M W 223506 + 8 [kjournald]
...
8,0 3 10 0.000024062 697 D W 223490 + 56 [kjournald]
8,0 1 11 0.009507758 0 C W 223490 + 56 [0]
Columns: Dev <mjr, mnr>, CPU, Sequence Number, Time Stamp, PID, Event, Start block + number of blocks, Process
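As a starting point for post-processing, one line of the default blkparse output shown above can be split into named fields. This is a sketch that assumes the default format only; the `BlkEvent` and `parse_event` names are hypothetical. The letter after the event code (W or R in the sample) is treated as the read/write indicator, and lines such as plug events carry no block range.

```python
from typing import NamedTuple, Optional


class BlkEvent(NamedTuple):
    major: int
    minor: int
    cpu: int
    seq: int
    time: float             # time stamp as printed by blkparse
    pid: int
    action: str             # event code, e.g. Q, G, M, D, C
    rwbs: str               # read/write indicator, e.g. R or W
    sector: Optional[int]   # start block, if the line has one
    nblocks: Optional[int]  # number of blocks, if the line has one
    process: str            # text inside the trailing [...]


def parse_event(line: str) -> BlkEvent:
    """Parse one default-format blkparse event line."""
    fields = line.split()
    major, minor = (int(x) for x in fields[0].split(","))
    cpu, seq = int(fields[1]), int(fields[2])
    time, pid = float(fields[3]), int(fields[4])
    action, rwbs = fields[5], fields[6]
    rest = fields[7:]
    sector = nblocks = None
    # "223490 + 56" is present on most events but not on plug events.
    if len(rest) >= 3 and rest[1] == "+":
        sector, nblocks = int(rest[0]), int(rest[2])
        rest = rest[3:]
    process = rest[0].strip("[]") if rest else ""
    return BlkEvent(major, minor, cpu, seq, time, pid,
                    action, rwbs, sector, nblocks, process)


if __name__ == "__main__":
    print(parse_event("8,0 3 10 0.000024062 697 D W 223490 + 56 [kjournald]"))
    print(parse_event("8,0 3 2 0.000001829 697 P R [kjournald]"))
```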
12
blktrace: Summary Output
CPU0 (sdao):
Reads Queued: 0, 0KiB Writes Queued: 77,382, 5,865MiB
Read Dispatches: 0, 0KiB Write Dispatches: 7,329, 3,020MiB
Reads Requeued: 0 Writes Requeued: 6
Reads Completed: 0, 0KiB Writes Completed: 0, 0KiB
Read Merges: 0 Write Merges: 68,844
Read depth: 2 Write depth: 65
IO unplugs: 414 Timer unplugs: 414
...
CPU3 (sdao):
Reads Queued: 105, 18KiB Writes Queued: 14,541, 2,578MiB
Read Dispatches: 22, 60KiB Write Dispatches: 6,207, 1,964MiB
Reads Requeued: 0 Writes Requeued: 1,408
Reads Completed: 22, 60KiB Writes Completed: 12,300, 5,059MiB
Read Merges: 83 Write Merges: 10,968
Read depth: 2 Write depth: 65
IO unplugs: 287 Timer unplugs: 287
Total (sdao):
Reads Queued: 105, 18KiB Writes Queued: 92,546, 8,579MiB
Read Dispatches: 22, 60KiB Write Dispatches: 13,714, 5,059MiB
Reads Requeued: 0 Writes Requeued: 1,414
Reads Completed: 22, 60KiB Writes Completed: 12,300, 5,059MiB
Read Merges: 83 Write Merges: 80,246
IO unplugs: 718 Timer unplugs: 718
Throughput (R/W): 0KiB/s / 39,806KiB/s
Events (sdao): 324,011 entries
Skips: 0 forward (0 - 0.0%)
[Slide callouts: per-CPU details; per-device details; average throughput; which CPUs writes were submitted on and completed on]
13
blktrace: Event Storage Choices
● Physical disk backed file system
– Pros: large/permanent amount of storage available; supported by all kernels
– Cons: potentially higher system impact; may negatively impact devices being watched (if 
storing on the same bus that other devices are being watched on...)
● RAM disk backed file system
– Pros: predictable system impact (RAM allocated at boot); less impact to I/O subsystem
– Cons: limited/temporary storage size; removes RAM from system (even when not tracing); 
may require reboot/kernel build
● TMPFS
– Pros: less impact to I/O subsystem; included in most kernels; only utilizes system RAM while 
events are stored
– Cons: limited/temporary storage; impacts system predictability – RAM “removed” as events 
are stored – could affect application being “watched” 
14
blktrace: Analysis Aid
● As noted previously, blktrace does not analyze 
the data; it is responsible for storing and 
formatting events
● Need to develop post-processing analysis tools 
– Can work on formatted output or binary data stored 
by blktrace itself
– Example: btt – block trace timeline
15
Practical blktrace
● Here at HP OSLO S&P, we are investigating I/O 
scalability at various levels
– Including the efficiency of various hardware configurations 
and the effects on I/O performance caused by software RAID 
(MD and DM)
● blktrace enables us to determine scalability issues within 
the block I/O layer and the overhead costs induced when 
utilizing software RAID
16
Life of an I/O (simplified)
● I/O enters block layer – it can be:
– Remapped onto another device (MD, DM)
– Split into 2 separate I/Os (alignment, size, ...)
– Added to the request queue
– Merged with a previous entry on the queue
All I/Os end up on a request queue at some point
● At some later time, the I/O is issued to a device driver, 
and submitted to a device
● Later, the I/O is completed by the device, and its driver
17
btt: Life of an I/O
● Q2I – time it takes to process an I/O prior to it being 
inserted or merged onto a request queue
– Includes split, and remap time
● I2D – time the I/O is “idle” on the request queue
● D2C – time the I/O is “active” in the driver and on the 
device
● Q2I + I2D + D2C = Q2C 
– Q2C: Total processing time of the I/O
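The relationship above can be sketched for a single I/O, given the time stamps of its Q (queued), I (inserted or merged), D (issued) and C (completed) events. The function name and the time stamps in the usage example are made up for illustration; btt itself aggregates over many I/Os.

```python
import math


def io_latencies(q: float, i: float, d: float, c: float) -> dict:
    """Split one I/O's lifetime into the three btt components.

    q, i, d, c are the time stamps (in seconds) of the queue, insert/merge,
    issue and complete events for the same I/O.
    """
    if not q <= i <= d <= c:
        raise ValueError("expected time stamps in Q <= I <= D <= C order")
    return {
        "Q2I": i - q,  # processing before insertion/merge (split, remap, ...)
        "I2D": d - i,  # "idle" time on the request queue
        "D2C": c - d,  # "active" time in the driver and on the device
        "Q2C": c - q,  # total processing time
    }


if __name__ == "__main__":
    # Hypothetical time stamps, not taken from a real trace.
    lat = io_latencies(q=0.000, i=0.002, d=0.010, c=0.015)
    for name, value in lat.items():
        print(f"{name}: {value:.6f} s")
    assert math.isclose(lat["Q2I"] + lat["I2D"] + lat["D2C"], lat["Q2C"])
```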
18
btt: Partial Output
DEV #Q #D Ratio BLKmin BLKavg BLKmax Total
------- --- ----- ----- ------- ------ ------ --------
[ 8, 0] 92827 12401 7.5 1 109 1024 10120441
[ 8, 1] 93390 13676 6.8 1 108 1024 10150343
[ 8, 2] 92366 13052 7.1 1 109 1024 10119302
[ 8, 3] 92278 13616 6.8 1 109 1024 10119043
[ 8, 4] 92651 13736 6.7 1 109 1024 10119903
DEV Q2I I2D D2C Q2C
------- ----------- ----------- ----------- -----------
[ 8, 0] 0.049697430 0.302734498 0.074038617 0.400079555
[ 8, 1] 0.031665593 0.050032148 0.058669682 0.125934697
[ 8, 2] 0.035651772 0.031035436 0.047311659 0.096735504
[ 8, 3] 0.021047776 0.011161007 0.038519804 0.060975408
[ 8, 4] 0.028985217 0.008397228 0.034344640 0.058160497
DEV Q2I I2D D2C
------- ------ ------ ------
[ 8, 0] 11.7% 71.0% 17.4%
[ 8, 1] 22.6% 35.6% 41.8%
[ 8, 2] 31.3% 27.2% 41.5%
[ 8, 3] 29.8% 15.8% 54.5%
[ 8, 4] 40.4% 11.7% 47.9%
[Slide callouts: Merge ratio: #Q / #D; SCSI bus, target: low to high priority; “Software” time; Driver/device time; Avg I/O time; Excessive “idle” time on request queue]
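The derived columns in the output above can be reproduced from the raw ones. In this sketch the merge ratio is #Q / #D, as the slide states. The percentages appear to be each component's share of Q2I + I2D + D2C rather than of the Q2C column; that is an inference from the arithmetic, not something the slide says. Function names are hypothetical.

```python
def merge_ratio(queued: int, dispatched: int) -> float:
    """Merge ratio as labeled on the slide: #Q / #D, to one decimal place."""
    return round(queued / dispatched, 1)


def time_shares(q2i: float, i2d: float, d2c: float) -> tuple:
    """Percentage of Q2I + I2D + D2C spent in each phase (assumed definition)."""
    total = q2i + i2d + d2c
    return tuple(round(100.0 * part / total, 1) for part in (q2i, i2d, d2c))


if __name__ == "__main__":
    # Rows [8, 0] and [8, 4] copied from the btt output above.
    print(merge_ratio(92827, 12401))                          # ratio for [8, 0]
    print(time_shares(0.049697430, 0.302734498, 0.074038617))  # shares for [8, 0]
    print(merge_ratio(92651, 13736))                          # ratio for [8, 4]
    print(time_shares(0.028985217, 0.008397228, 0.034344640))  # shares for [8, 4]
```

For device [8, 0] this gives a ratio of 7.5 and an I2D share of 71.0%, matching the row flagged for excessive idle time.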
19
btt: Q&C Activity
● btt also generates “activity” data – indicating 
ranges where processes and devices are actively 
handling various events (block I/O entered, I/O 
inserted/merged, I/O issued, I/O complete, ...)
● This data can be plotted (e.g. xmgrace) to see 
patterns and extract information concerning 
anomalous behavior 
20
btt: I/O Scheduler Example
[Plot: “Q” activity and “C” activity over time. Callouts: mkfs & pdflush “fight” for device; I/O delayed by mkfs activity]
21
btt: I/O Scheduler - Explained
● Characterizing I/O stack
● Noticed very long I2D times for certain processes
● Graph shows continuous stream of I/Os...
– ...at the device level
– ...for the mkfs.ext3 process
● Graph shows significant lag for pdflush daemon
– Last I/O enters block I/O layer around 19 seconds
– But: the last batch of I/Os isn't completed until 14 seconds later!
● Cause? Anticipatory scheduler – allows mkfs.ext3 to proceed, 
holding off pdflush I/Os
22
Resources
● Kernel sources:
– Patch for Linux 2.6.14-rc3 (or later, up to 2.6.17)
– Linux 2.6.17 (or later) – built in
● Utilities & documentation (& kernel patches)
– rsync://rsync.kernel.org/pub/scm/linux/kernel/git/axboe/blktrace.git
– See documentation in doc directory
● Mailing list: linux-btrace@vger.kernel.org
