SlideShare une entreprise Scribd logo
1  sur  33
Télécharger pour lire hors ligne
Virtual File System (VFS) in Linux Kernel
Adrian Huang | Nov, 2021
* Based on kernel 5.11 (x86_64) – QEMU
* SMP (4 CPUs) and 8GB memory
* Kernel parameter: nokaslr norandmaps
* Userspace: ASLR is disabled
* Legacy BIOS
Agenda
• Kernel components affected by an IO request
• Object relationship in VFS
• Ext4 file system format
• dentry (Directory Entry) and inode (Index Node)
• Example: ext4
• Anatomy of `mount` system call
• Anatomy of `ls` system call
Kernel components affected by an IO request
VFS
Page Cache/Buffer Cache
(fs/buffer.c)
Disk
filesystem
Disk
filesystem
Block Device
File
Mapping Layer
Generic Block Layer
I/O Scheduler Layer
Block Device Driver
Disk Disk Disk
submit_bio()
open(“/mnt/a.txt”, …) open(“/mnt/b.txt”, …)
Process A
File Object File Object File Object
open file table
dentry dentry
dentry cache
inode inode
inode cache
inode table
Kernel - VFS
dentry in data blocks
Disk
Object relationship in VFS
dentry cache list
• Linked via hlist_bl_{head,node} (hash
table with bit lock)
inode cache list
• Linked via hlist_{head,node} (hash table)
Note
Super
Block
Group
Descriptors
Reserved GDT
blocks
Block
bitmap
inode
bitmap
inode table:
inode descriptor
Data
Block
Data
Block
. . .
0 1
Total inode count
Total block count
Free blocks
Free inodes
…
Block Group 0 . . . Block Group N
Boot Sector
Partition
mode
Owner UID, Owner GID
i_atime (last access time)
i_ctime (last inode change time)
i_mtime (last data modification time)
size
ext4_extent_idx
ei_block
ei_leaf_lo
ei_leaf_hi
ei_unused
ext4_extent_header
eh_magic = 0xF30A
eh_entries
eh_max
eh_depth
eh_generation
ext4_extent_idx
ei_block
ei_leaf_lo
ei_leaf_hi
ei_unused
ext4_extent
ee_block
ee_len
ei_start_hi
ei_start_lo
ext4_extent
ee_block
ee_len
ei_start_hi
ei_start_lo
. . .
. . .
root
index node leaf node Disk blocks
Extent Tree
Ext4 file system format
Storage size addressed by a block group:
4096 (bytes) * 8 * 4096 (block size) = 128MB
Super
Block
Group
Descriptors
Reserved GDT
blocks
Block
bitmap
inode
bitmap
inode table:
inode descriptor
Data
Block
Data
Block
. . .
0 1
Total inode count
Total block count
Free blocks
Free inodes
…
Block Group 0 . . . Block Group N
Boot Sector
Partition
mode
Owner UID, Owner GID
i_atime (last access time)
i_ctime (last inode change time)
i_mtime (last data modification time)
size
ext4_extent_idx
ei_block
ei_leaf_lo
ei_leaf_hi
ei_unused
ext4_extent_header
eh_magic = 0xF30A
eh_entries
eh_max
eh_depth
eh_generation
ext4_extent_idx
ei_block
ei_leaf_lo
ei_leaf_hi
ei_unused
ext4_extent
ee_block
ee_len
ei_start_hi
ei_start_lo
ext4_extent
ee_block
ee_len
ei_start_hi
ei_start_lo
. . .
. . .
root
index node leaf node Disk blocks
Extent Tree
Ext4 file system format
inode# Purpose
0 Doesn't exist; there is no inode 0
1 List of defective blocks
2 Root directory
3 User quota
4 Group quota
5 Boot loader
6 Undelete directory
7 Reserved group descriptors inode. ("resize inode")
8 Journal inode
9 The "exclude" inode
10 Replica inode
11 First non-reserved inode. Usually this is the lost+found
directory. See s_first_ino in the superblock.
Relationship between inode and directory entry (dentry)
inode number link count
inode bitmap: 19 * 4096 = 77824 (0x13000)
inode 1-16 inode 17
`dumpe2fs 16MiB.img`
Relationship between inode and directory entry (dentry)
inode entry: 35 * 4096 + 128 * (2 – 1) [inode #2] = 0x23080
`dumpe2fs 16MiB.img`
inode structure data of inode #2
Relationship between inode and directory entry (dentry)
mode
Owner UID, Owner GID
i_atime
i_ctime
i_mtime
size
ext4_extent_header
eh_magic = 0xF30A
eh_entries = 1
eh_max = 4
eh_depth = 0
eh_generation = 0
root
Disk blocks
ext4_extent
ee_block = 0
ee_len = 0x0001
ei_start_hi = 0x0000
ei_start_lo = 0x00000004
Relationship between inode and dentry – root (inode #2)
ext4_inode (inode #2)
leaf node
. 2
name inode
Directory Entry (dentry)
Block #4: A list of struct ext4_dir_entry_2
.. 2 -> 2646185
lost+found 11
a123 12
a789 17
Block #0-3
Block #5
inode structure data of inode #2 Root Directory
dentry in data block of inode #2
type = directory
i_links_count = 5
Relationship between inode and dentry – ‘b456’ (inode #13)
i-node table
. 2
name inode
Directory Entry (dentry) - /
.. 2 -> 2646185
lost+found 11
a123 12
a789 17
2
12
. 12
name inode
.. 2
b456 13
hello.c 15
hello-hard-link.c 15
Directory Entry (dentry) - /a123
13
. 13
name inode
.. 12
test.c 16
test-hard-link.c 16
test-soft-link.c 14
Directory Entry (dentry) - /a123/b456
14
16
Data block pointers
type = directory
i_links_count = 3
Data block pointers
type = directory
i_links_count = 2
Data block pointers
type = symbolic link
i_links_count = 1
Data block pointers
= “test.c”
type = file
i_links_count = 2
Data block pointers
File Data
/a123/b456/test.c
size = 6
size = 100
inode #
type = directory
i_links_count = 5
Relationship between inode and dentry – ‘test.c’ (inode #16)
i-node table
. 2
name inode
Directory Entry (dentry) - /
.. 2 -> 2646185
lost+found 11
a123 12
a789 17
2
12
. 12
name inode
.. 2
b456 13
hello.c 15
hello-hard-link.c 15
Directory Entry (dentry) - /a123
13
. 13
name inode
.. 12
test.c 16
test-hard-link.c 16
test-soft-link.c 14
Directory Entry (dentry) - /a123/b456
14
16
Data block pointers
type = directory
i_links_count = 3
Data block pointers
type = directory
i_links_count = 2
Data block pointers
type = symbolic link
i_links_count = 1
Data block pointers
= “test.c”
type = file
i_links_count = 2
Data block pointers
File Data
/a123/b456/test.c
size = 6
size = 100
inode structure data of inode #16
inode entry: 35 * 4096 + 128 * (16 – 1) [inode #16] = 0x23780
address inode structure of inode #16
Data block pointer represented by
struct ext4_extent
i_links_count
inode #
type = directory
i_links_count = 5
Relationship between inode and dentry – ‘test-soft-link.c’ (inode #14)
i-node table
. 2
name inode
Directory Entry (dentry) - /
.. 2 -> 2646185
lost+found 11
a123 12
a789 17
2
12
. 12
name inode
.. 2
b456 13
hello.c 15
hello-hard-link.c 15
Directory Entry (dentry) - /a123
13
. 13
name inode
.. 12
test.c 16
test-hard-link.c 16
test-soft-link.c 14
Directory Entry (dentry) - /a123/b456
14
16
Data block pointers
type = directory
i_links_count = 3
Data block pointers
type = directory
i_links_count = 2
Data block pointers
type = symbolic link
i_links_count = 1
Data block pointers
= “test.c”
type = file
i_links_count = 2
Data block pointers
File Data
/a123/b456/test.c
size = 6
size = 100
inode structure data of inode #14
inode entry: 35 * 4096 + 128 * (14 – 1) [inode #14] = 0x23680
address inode structure of inode #14
i_links_count
Data block pointer
Hard Link
1. Refer to an inode number
2. Must reside on the same file system
3. Cannot be made to a directory
Soft Link
1. Refer to a filename instead of an inode number
2. Can be linked to a file in a different file system
3. Can be made to a directory
inode #
mount – mount a new ext4 file system
• stat() – get file status from inode
o Name resolution → get dentry → get inode
• openat()
o Name resolution
o Update open file table
• mount()
o Name resolution
o Add a mount entry to ‘mount’ hash table
o Parse file system
stat("/adrian/img/src.img“, …)
user_path_at(): fill a path struct (dentry and mount info) after resolving a pathname
vfsmount
mnt_root
mnt_sb
dentry
d_name
path
mnt
dentry
d_inode
path struct
• stat() – get file status from inode
o Name resolution → get dentry → get inode
stat("/adrian/img/src.img“, …)
1. namei: convert a “name” to an “inode”
2. nameidata: store the current status when walking a path
nameidata
last
name
root
inode
path
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
local variable
stat("/adrian/img/src.img“, …)
path_lookupat():
1. Configure members of nameidata struct based on absolute pathname or relative pathname
2. Name resolution
vfsmount
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len
u64 hash_len
union
nameidata
last (S*)
name
root (S*)
inode
path (S*)
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
local variable
path
mnt
dentry
current->fs->root
d_inode
* Data storage (not a data pointer)
Note
link_path_walk(): name resolution – round 1
vfsmount
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len
u64 hash_len
union
nameidata
last (S*)
name
root (S*)
inode
path (S*)
* Data storage (not a data pointer)
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
path
mnt
dentry
current->fs->root
d_inode
qstr
name =
“adrian/img/src.img”
u32 hash
u32 len = 6
u64 hash_len
union
variable ‘name’ = “adrian/img/src.img”
dentry
d_name
d_inode
path
mnt
dentry
variable ‘name’ = “img/src.img”
hash
1
2
3
4
5
6
Note
local variable
link_path_walk()
• stop until the last dentry
is resolved
• for-loop in this function
qstr
name = “adrian”
u32 hash
u32 len = 6
u64 hash_len
union
link_path_walk(): name resolution – round 1
nameidata
last (S*)
name
root (S*)
inode
path (S*)
* Data storage (not a data pointer)
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
path
mnt
dentry
current->fs->root
qstr
name =
“adrian/img/src.img”
u32 hash
u32 len = 6
u64 hash_len
union
variable ‘name’ = “adrian/img/src.img”
dentry
d_name
d_inode
path
mnt
dentry
variable ‘name’ = “img/src.img”
hash
1
2
3
4
5
6
Note
local variable
lookup_slow()
vfsmount
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len
u64 hash_len
union
nameidata
last (S*)
name
root (S*)
inode
path (S*)
* Data storage (not a data pointer)
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
local variable
path
mnt
dentry
current->fs->root
d_inode
qstr
name =
“img/src.img”
u32 hash
u32 len = 3
u64 hash_len
union
variable ‘name’ = “img/src.img”
dentry
d_name
d_inode
path
mnt
dentry
variable ‘name’ = “src.img”
hash
1
2
3
4
5
6
Note
link_path_walk(): name resolution – round 2
qstr
name = “img”
u32 hash
u32 len = 3
u64 hash_len
union
link_path_walk()
• stop until the last dentry
is resolved
• for-loop in this function
vfsmount
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len
u64 hash_len
union
nameidata
last (S*)
name
root (S*)
inode
path (S*)
* Data storage (not a data pointer)
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
local variable
path
mnt
dentry
current->fs->root
d_inode
qstr
name = “src.img”
u32 hash
u32 len = 3
u64 hash_len
union
variable ‘name’ = “src.img”
dentry
d_name
d_inode
path
mnt
dentry
variable ‘name’ = “”
2
Note
Function return
link_path_walk(): name resolution – round 3
1
hash
qstr
name = “img”
u32 hash
u32 len = 3
u64 hash_len
union
link_path_walk()
• stop until the last dentry
is resolved
• for-loop in this function
vfsmount
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len
u64 hash_len
union
nameidata
last (S*)
name
root (S*)
inode
path (S*)
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
local variable
path
mnt
dentry
current->fs->root
d_inode
qstr
name = “src.img”
u32 hash
u32 len = 6
u64 hash_len
union
variable ‘name’ = “src.img”
dentry
d_name
qstr
name = “src.img”
u32 hash
u32 len = 6
u64 hash_len
union
d_inode
path
mnt
dentry
variable ‘name’ = “”
3
4
5
6
Function return
lookup_last(): name resolution
hash
1
2
* Data storage (not a data pointer)
Note
lookup_last()
• get the corresponding dentry
based on the filename
• “while loop”: Follow the
symlink in the final
component
vfsmount
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len
u64 hash_len
union
nameidata
last (S*)
name
root (S*)
inode
path (S*)
task_struct
mm
nameidata
filename
name
uptr (userland pointer)
refcnt
aname
char iname[] =
“/adrian/img/src.img"
char __user *name =
“/adrian/img/src.img”
local variable
path
mnt
dentry
current->fs->root
d_inode
qstr
name = “src.img”
u32 hash
u32 len = 6
u64 hash_len
union
variable ‘name’ = “src.img”
dentry
d_name
qstr
name = “src.img”
u32 hash
u32 len = 6
u64 hash_len
union
d_inode
path
mnt
dentry
variable ‘name’ = “”
3
4
5
6
Function return
lookup_last(): name resolution
hash
1
2
* Data storage (not a data pointer)
Note
mount – mount a new ext4 file system
hlist_head.first
hlist_head.first
.
.
mount_hashtable mount
mnt_parent
struct dentry
*mnt_mountpoint
struct vfsmount mnt
**pprev *next
const char *mnt_devname =
“none”
mount
mnt_parent
struct dentry
*mnt_mountpoint
**pprev
struct hlist_node mnt_hash
*next
const char *mnt_devname =
“none”
struct hlist_node mnt_hash
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
super_block
s_blocksize
s_maxbytes
s_op = ramfs_sops
s_root
s_bdev
s_id[32] = “rootfs”
s_fs_info
s_type = rootfs_fs_type
d_parent
mnt_root
mnt_sb
struct vfsmount mnt
mount – before mounting a new ext4 file system
d_sb
hlist_head.first
hlist_head.first
.
.
mount_hashtable mount
mnt_parent
struct dentry
*mnt_mountpoint
struct vfsmount mnt
**pprev *next
const char *mnt_devname =
“none”
mount
mnt_parent
struct dentry
*mnt_mountpoint
**pprev
struct hlist_node mnt_hash
*next
const char *mnt_devname =
“none”
struct hlist_node mnt_hash
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
super_block
s_blocksize
s_maxbytes
s_op = ramfs_sops
s_root
s_bdev
s_id[32] = “rootfs”
s_fs_info
s_type = rootfs_fs_type
d_parent
mnt_root
mnt_sb
struct vfsmount mnt
mount – after mounting a new ext4 file system
mount
mnt_parent
struct dentry
*mnt_mountpoint
**pprev *next
const char *mnt_devname =
“/dev/loop0”
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
struct vfsmount mnt
super_block
s_op = ext4_sops
s_id[32] = “loop0”
s_type = ext4_fs_type
d_parent
struct hlist_node mnt_hash
dentry
d_name
d_parent
qstr
name = “mnt”
u32 hash
u32 len = 3
u64 hash_len
union
/adrian/mnt
d_sb
d_sb
hlist_head.first
hlist_head.first
.
.
mount_hashtable mount
mnt_parent
struct dentry
*mnt_mountpoint
struct vfsmount mnt
**pprev *next
const char *mnt_devname =
“none”
mount
mnt_parent
struct dentry
*mnt_mountpoint
**pprev
struct hlist_node mnt_hash
*next
const char *mnt_devname =
“none”
struct hlist_node mnt_hash
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
super_block
s_blocksize
s_maxbytes
s_op = ramfs_sops
s_root
s_bdev
s_id[32] = “rootfs”
s_fs_info
s_type = rootfs_fs_type
d_parent
mnt_root
mnt_sb
struct vfsmount mnt
mount – after mounting a new ext4 file system
mount
mnt_parent
struct dentry
*mnt_mountpoint
**pprev *next
const char *mnt_devname =
“/dev/loop0”
mnt_root
mnt_sb
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
struct vfsmount mnt
super_block
s_op = ext4_sops
s_id[32] = “loop0”
s_type = ext4_fs_type
d_parent
struct hlist_node mnt_hash
dentry
d_name
d_parent
qstr
name = “mnt”
u32 hash
u32 len = 3
u64 hash_len
union
/adrian/mnt
d_sb
d_sb
`ls –color mnt`
`ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
`ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
`ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
path
mnt
dentry
vfsmount
mnt_root
mnt_sb
qstr
name = “mnt”
u32 hash
u32 len = 3
u64 hash_len
union
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
super_block
s_blocksize
s_maxbytes
s_op = ramfs_ops
s_root
s_bdev
s_id[32] = “rootfs”
s_fs_info
d_inode
dentry
d_name
d_inode
d_parent
s_type = rootfs_fs_type
path
mnt
dentry
vfsmount
mnt_root
mnt_sb
qstr
name = “mnt /”
u32 hash
u32 len = 3 1
u64 hash_len
union
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
super_block
s_blocksize
s_maxbytes
s_op = ext4_sops
s_root
s_bdev
s_id[32] = “loop0”
s_fs_info
d_inode
dentry
d_name
d_inode
d_parent
s_type = ext4_fs_type
/adrian/mnt data structures in the original file system /adrian/mnt data structures in newly mounted file system
traverse_mounts()
`ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
The original mount point is changed the newly one when running name resolution
path
mnt
dentry
vfsmount
mnt_root
mnt_sb
qstr
name = “mnt /”
u32 hash
u32 len = 3 1
u64 hash_len
union
dentry
d_name
qstr
name = “/”
u32 hash
u32 len = 1
u64 hash_len
union
super_block
s_blocksize
s_maxbytes
s_op = ext4_sops
s_root
s_bdev
s_id[32] = “loop0”
s_fs_info
d_inode
dentry
d_name
d_inode
d_parent
s_type = ext4_fs_type
/adrian/mnt data structures in newly mounted file system
flags: path->dentry->d_flags
`ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
REF-walk
[Two concurrent mechanisms for name resolution]
1. REF-walk: concurrent management with refcounts and spinlock
2. RCU-walk: faster name resolution lookup

Contenu connexe

Tendances

Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelAdrian Huang
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdfAdrian Huang
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Adrian Huang
 
Uboot startup sequence
Uboot startup sequenceUboot startup sequence
Uboot startup sequenceHoucheng Lin
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Adrian Huang
 
Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdfAdrian Huang
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory ManagementNi Zo-Ma
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBshimosawa
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernelVadim Nikitin
 
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted FirmwareHKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted FirmwareLinaro
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Adrian Huang
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionGene Chang
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelDivye Kapoor
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machineAlexei Starovoitov
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBshimosawa
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceBrendan Gregg
 
Linux Serial Driver
Linux Serial DriverLinux Serial Driver
Linux Serial Driver艾鍗科技
 

Tendances (20)

Linux Network Stack
Linux Network StackLinux Network Stack
Linux Network Stack
 
Memory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux KernelMemory Mapping Implementation (mmap) in Linux Kernel
Memory Mapping Implementation (mmap) in Linux Kernel
 
Physical Memory Management.pdf
Physical Memory Management.pdfPhysical Memory Management.pdf
Physical Memory Management.pdf
 
Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...Decompressed vmlinux: linux kernel initialization from page table configurati...
Decompressed vmlinux: linux kernel initialization from page table configurati...
 
Uboot startup sequence
Uboot startup sequenceUboot startup sequence
Uboot startup sequence
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
Physical Memory Models.pdf
Physical Memory Models.pdfPhysical Memory Models.pdf
Physical Memory Models.pdf
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Memory management in Linux kernel
Memory management in Linux kernelMemory management in Linux kernel
Memory management in Linux kernel
 
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted FirmwareHKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
 
Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)Anatomy of the loadable kernel module (lkm)
Anatomy of the loadable kernel module (lkm)
 
Linux MMAP & Ioremap introduction
Linux MMAP & Ioremap introductionLinux MMAP & Ioremap introduction
Linux MMAP & Ioremap introduction
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Linux Serial Driver
Linux Serial DriverLinux Serial Driver
Linux Serial Driver
 
Hands-on ethernet driver
Hands-on ethernet driverHands-on ethernet driver
Hands-on ethernet driver
 
Network Drivers
Network DriversNetwork Drivers
Network Drivers
 

Similaire à Linux Kernel - Virtual File System

Internal representation of files ppt
Internal representation of files pptInternal representation of files ppt
Internal representation of files pptAbhaysinh Surve
 
file handling c++
file handling c++file handling c++
file handling c++Guddu Spy
 
Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems senthilamul
 
File and directories in python
File and directories in pythonFile and directories in python
File and directories in pythonLifna C.S
 
Advfs 3 in-memory structures
Advfs 3 in-memory structuresAdvfs 3 in-memory structures
Advfs 3 in-memory structuresJustin Goldberg
 
Working with the IFS on System i
Working with the IFS on System iWorking with the IFS on System i
Working with the IFS on System iChuck Walker
 
MacOS forensics and anti-forensics (DC Lviv 2019) presentation
MacOS forensics and anti-forensics (DC Lviv 2019) presentationMacOS forensics and anti-forensics (DC Lviv 2019) presentation
MacOS forensics and anti-forensics (DC Lviv 2019) presentationOlehLevytskyi1
 
Java File I/O Performance Analysis - Part I - JCConf 2018
Java File I/O Performance Analysis - Part I - JCConf 2018Java File I/O Performance Analysis - Part I - JCConf 2018
Java File I/O Performance Analysis - Part I - JCConf 2018Michael Fong
 
55 new things in Java 7 - Devoxx France
55 new things in Java 7 - Devoxx France55 new things in Java 7 - Devoxx France
55 new things in Java 7 - Devoxx FranceDavid Delabassee
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfycelgemici1
 
Oracle forensics 101
Oracle forensics 101Oracle forensics 101
Oracle forensics 101fangjiafu
 
Confraria SECURITY & IT - Lisbon Set 29, 2011
Confraria SECURITY & IT - Lisbon Set 29, 2011Confraria SECURITY & IT - Lisbon Set 29, 2011
Confraria SECURITY & IT - Lisbon Set 29, 2011ricardomcm
 
Char Drivers And Debugging Techniques
Char Drivers And Debugging TechniquesChar Drivers And Debugging Techniques
Char Drivers And Debugging TechniquesYourHelper1
 

Similaire à Linux Kernel - Virtual File System (20)

Internal representation of files ppt
Internal representation of files pptInternal representation of files ppt
Internal representation of files ppt
 
Vfs
VfsVfs
Vfs
 
file handling c++
file handling c++file handling c++
file handling c++
 
Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems Unix file systems 2 in unix internal systems
Unix file systems 2 in unix internal systems
 
File and directories in python
File and directories in pythonFile and directories in python
File and directories in python
 
Linux
LinuxLinux
Linux
 
File management
File managementFile management
File management
 
DFSNov1.pptx
DFSNov1.pptxDFSNov1.pptx
DFSNov1.pptx
 
Advfs 3 in-memory structures
Advfs 3 in-memory structuresAdvfs 3 in-memory structures
Advfs 3 in-memory structures
 
Working with the IFS on System i
Working with the IFS on System iWorking with the IFS on System i
Working with the IFS on System i
 
MacOS forensics and anti-forensics (DC Lviv 2019) presentation
MacOS forensics and anti-forensics (DC Lviv 2019) presentationMacOS forensics and anti-forensics (DC Lviv 2019) presentation
MacOS forensics and anti-forensics (DC Lviv 2019) presentation
 
Java File I/O Performance Analysis - Part I - JCConf 2018
Java File I/O Performance Analysis - Part I - JCConf 2018Java File I/O Performance Analysis - Part I - JCConf 2018
Java File I/O Performance Analysis - Part I - JCConf 2018
 
55 new things in Java 7 - Devoxx France
55 new things in Java 7 - Devoxx France55 new things in Java 7 - Devoxx France
55 new things in Java 7 - Devoxx France
 
17 files and streams
17 files and streams17 files and streams
17 files and streams
 
Linux.pdf
Linux.pdfLinux.pdf
Linux.pdf
 
Page Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdfPage Cache in Linux 2.6.pdf
Page Cache in Linux 2.6.pdf
 
Oracle forensics 101
Oracle forensics 101Oracle forensics 101
Oracle forensics 101
 
Confraria SECURITY & IT - Lisbon Set 29, 2011
Confraria SECURITY & IT - Lisbon Set 29, 2011Confraria SECURITY & IT - Lisbon Set 29, 2011
Confraria SECURITY & IT - Lisbon Set 29, 2011
 
C++ files and streams
C++ files and streamsC++ files and streams
C++ files and streams
 
Char Drivers And Debugging Techniques
Char Drivers And Debugging TechniquesChar Drivers And Debugging Techniques
Char Drivers And Debugging Techniques
 

Dernier

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile EnvironmentVictorSzoltysek
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 

Dernier (20)

Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 

Linux Kernel - Virtual File System

  • 1. Virtual File System (VFS) in Linux Kernel Adrian Huang | Nov, 2021 * Based on kernel 5.11 (x86_64) – QEMU * SMP (4 CPUs) and 8GB memory * Kernel parameter: nokaslr norandmaps * Userspace: ASLR is disabled * Legacy BIOS
  • 2. Agenda • Kernel components affected by an IO request • Object relationship in VFS • Ext4 file system format • dentry (Directory Entry) and inode (Index Node) • Example: ext4 • Anatomy of `mount` system call • Anatomy of `ls` system call
  • 3. Kernel components affected by an IO request VFS Page Cache/Buffer Cache (fs/buffer.c) Disk filesystem Disk filesystem Block Device File Mapping Layer Generic Block Layer I/O Scheduler Layer Block Device Driver Disk Disk Disk submit_bio()
  • 4. open(“/mnt/a.txt”, …) open(“/mnt/b.txt”, …) Process A File Object File Object File Object open file table dentry dentry dentry cache inode inode inode cache inode table Kernel - VFS dentry in data blocks Disk Object relationship in VFS dentry cache list • Linked via hlist_bl_{head,node} (hash table with bit lock) inode cache list • Linked via hlist_{head,node} (hash table) Note
  • 5. Super Block Group Descriptors Reserved GDT blocks Block bitmap inode bitmap inode table: inode descriptor Data Block Data Block . . . 0 1 Total inode count Total block count Free blocks Free inodes … Block Group 0 . . . Block Group N Boot Sector Partition mode Owner UID, Owner GID i_atime (last access time) i_ctime (last inode change time) i_mtime (last data modification time) size ext4_extent_idx ei_block ei_leaf_lo ei_leaf_hi ei_unused ext4_extent_header eh_magic = 0xF30A eh_entries eh_max eh_depth eh_generation ext4_extent_idx ei_block ei_leaf_lo ei_leaf_hi ei_unused ext4_extent ee_block ee_len ei_start_hi ei_start_lo ext4_extent ee_block ee_len ei_start_hi ei_start_lo . . . . . . root index node leaf node Disk blocks Extent Tree Ext4 file system format Storage size addressed by a block group: 4096 (bytes) * 8 * 4096 (block size) = 128MB
  • 6. Super Block Group Descriptors Reserved GDT blocks Block bitmap inode bitmap inode table: inode descriptor Data Block Data Block . . . 0 1 Total inode count Total block count Free blocks Free inodes … Block Group 0 . . . Block Group N Boot Sector Partition mode Owner UID, Owner GID i_atime (last access time) i_ctime (last inode change time) i_mtime (last data modification time) size ext4_extent_idx ei_block ei_leaf_lo ei_leaf_hi ei_unused ext4_extent_header eh_magic = 0xF30A eh_entries eh_max eh_depth eh_generation ext4_extent_idx ei_block ei_leaf_lo ei_leaf_hi ei_unused ext4_extent ee_block ee_len ei_start_hi ei_start_lo ext4_extent ee_block ee_len ei_start_hi ei_start_lo . . . . . . root index node leaf node Disk blocks Extent Tree Ext4 file system format
  • 7. inode# Purpose 0 Doesn't exist; there is no inode 0 1 List of defective blocks 2 Root directory 3 User quota 4 Group quota 5 Boot loader 6 Undelete directory 7 Reserved group descriptors inode. ("resize inode") 8 Journal inode 9 The "exclude" inode 10 Replica inode 11 First non-reserved inode. Usually this is the lost+found directory. See s_first_ino in the superblock. Relationship between inode and directory entry (dentry) inode number link count
  • 8. inode bitmap: 19 * 4096 = 77824 (0x13000) inode 1-16 inode 17 `dumpe2fs 16MiB.img` Relationship between inode and directory entry (dentry)
  • 9. inode entry: 35 * 4096 + 128 * (2 – 1) [inode #2] = 0x23080 `dumpe2fs 16MiB.img` inode structure data of inode #2 Relationship between inode and directory entry (dentry)
  • 10. mode Owner UID, Owner GID i_atime i_ctime i_mtime size ext4_extent_header eh_magic = 0xF30A eh_entries = 1 eh_max = 4 eh_depth = 0 eh_generation = 0 root Disk blocks ext4_extent ee_block = 0 ee_len = 0x0001 ei_start_hi = 0x0000 ei_start_lo = 0x00000004 Relationship between inode and dentry – root (inode #2) ext4_inode (inode #2) leaf node . 2 name inode Directory Entry (dentry) Block #4: A list of struct ext4_dir_entry_2 .. 2 -> 2646185 lost+found 11 a123 12 a789 17 Block #0-3 Block #5 inode structure data of inode #2 Root Directory dentry in data block of inode #2
  • 11. type = directory i_links_count = 5 Relationship between inode and dentry – ‘b456’ (inode #13) i-node table . 2 name inode Directory Entry (dentry) - / .. 2 -> 2646185 lost+found 11 a123 12 a789 17 2 12 . 12 name inode .. 2 b456 13 hello.c 15 hello-hard-link.c 15 Directory Entry (dentry) - /a123 13 . 13 name inode .. 12 test.c 16 test-hard-link.c 16 test-soft-link.c 14 Directory Entry (dentry) - /a123/b456 14 16 Data block pointers type = directory i_links_count = 3 Data block pointers type = directory i_links_count = 2 Data block pointers type = symbolic link i_links_count = 1 Data block pointers = “test.c” type = file i_links_count = 2 Data block pointers File Data /a123/b456/test.c size = 6 size = 100 inode #
  • 12. type = directory i_links_count = 5 Relationship between inode and dentry – ‘test.c’ (inode #16) i-node table . 2 name inode Directory Entry (dentry) - / .. 2 -> 2646185 lost+found 11 a123 12 a789 17 2 12 . 12 name inode .. 2 b456 13 hello.c 15 hello-hard-link.c 15 Directory Entry (dentry) - /a123 13 . 13 name inode .. 12 test.c 16 test-hard-link.c 16 test-soft-link.c 14 Directory Entry (dentry) - /a123/b456 14 16 Data block pointers type = directory i_links_count = 3 Data block pointers type = directory i_links_count = 2 Data block pointers type = symbolic link i_links_count = 1 Data block pointers = “test.c” type = file i_links_count = 2 Data block pointers File Data /a123/b456/test.c size = 6 size = 100 inode structure data of inode #16 inode entry: 35 * 4096 + 128 * (16 – 1) [inode #16] = 0x23780 address inode structure of inode #16 Data block pointer represented by struct ext4_extent i_links_count inode #
  • 13. type = directory i_links_count = 5 Relationship between inode and dentry – ‘test-soft-link.c’ (inode #14) i-node table . 2 name inode Directory Entry (dentry) - / .. 2 -> 2646185 lost+found 11 a123 12 a789 17 2 12 . 12 name inode .. 2 b456 13 hello.c 15 hello-hard-link.c 15 Directory Entry (dentry) - /a123 13 . 13 name inode .. 12 test.c 16 test-hard-link.c 16 test-soft-link.c 14 Directory Entry (dentry) - /a123/b456 14 16 Data block pointers type = directory i_links_count = 3 Data block pointers type = directory i_links_count = 2 Data block pointers type = symbolic link i_links_count = 1 Data block pointers = “test.c” type = file i_links_count = 2 Data block pointers File Data /a123/b456/test.c size = 6 size = 100 inode structure data of inode #14 inode entry: 35 * 4096 + 128 * (14 – 1) [inode #14] = 0x23680 address inode structure of inode #14 i_links_count Data block pointer Hard Link 1. Refer to an inode number 2. Must reside on the same file system 3. Cannot be made to a directory Soft Link 1. Refer to a filename instead of an inode number 2. Can be linked to a file in a different file system 3. Can be made to a directory inode #
  • 14. mount – mount a new ext4 file system • stat() – get file status from inode o Name resolution → get dentry → get inode • openat() o Name resolution o Update open file table • mount() o Name resolution o Add a mount entry to ‘mount’ hash table o Parse file system
  • 15. stat("/adrian/img/src.img“, …) user_path_at(): fill a path struct (dentry and mount info) after resolving a pathname vfsmount mnt_root mnt_sb dentry d_name path mnt dentry d_inode path struct • stat() – get file status from inode o Name resolution → get dentry → get inode
  • 16. stat("/adrian/img/src.img“, …) 1. namei: convert a “name” to an “inode” 2. nameidata: store the current status when walking a path nameidata last name root inode path task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” local variable
  • 17. stat("/adrian/img/src.img“, …) path_lookupat(): 1. Configure members of nameidata struct based on absolute pathname or relative pathname 2. Name resolution vfsmount mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len u64 hash_len union nameidata last (S*) name root (S*) inode path (S*) task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” local variable path mnt dentry current->fs->root d_inode * Data storage (not a data pointer) Note
  • 18. link_path_walk(): name resolution – round 1 vfsmount mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len u64 hash_len union nameidata last (S*) name root (S*) inode path (S*) * Data storage (not a data pointer) task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” path mnt dentry current->fs->root d_inode qstr name = “adrian/img/src.img” u32 hash u32 len = 6 u64 hash_len union variable ‘name’ = “adrian/img/src.img” dentry d_name d_inode path mnt dentry variable ‘name’ = “img/src.img” hash 1 2 3 4 5 6 Note local variable link_path_walk() • stop until the last dentry is resolved • for-loop in this function qstr name = “adrian” u32 hash u32 len = 6 u64 hash_len union
  • 19. link_path_walk(): name resolution – round 1 nameidata last (S*) name root (S*) inode path (S*) * Data storage (not a data pointer) task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” path mnt dentry current->fs->root qstr name = “adrian/img/src.img” u32 hash u32 len = 6 u64 hash_len union variable ‘name’ = “adrian/img/src.img” dentry d_name d_inode path mnt dentry variable ‘name’ = “img/src.img” hash 1 2 3 4 5 6 Note local variable lookup_slow()
  • 20. vfsmount mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len u64 hash_len union nameidata last (S*) name root (S*) inode path (S*) * Data storage (not a data pointer) task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” local variable path mnt dentry current->fs->root d_inode qstr name = “img/src.img” u32 hash u32 len = 3 u64 hash_len union variable ‘name’ = “img/src.img” dentry d_name d_inode path mnt dentry variable ‘name’ = “src.img” hash 1 2 3 4 5 6 Note link_path_walk(): name resolution – round 2 qstr name = “img” u32 hash u32 len = 3 u64 hash_len union link_path_walk() • stop until the last dentry is resolved • for-loop in this function
  • 21. vfsmount mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len u64 hash_len union nameidata last (S*) name root (S*) inode path (S*) * Data storage (not a data pointer) task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” local variable path mnt dentry current->fs->root d_inode qstr name = “src.img” u32 hash u32 len = 3 u64 hash_len union variable ‘name’ = “src.img” dentry d_name d_inode path mnt dentry variable ‘name’ = “” 2 Note Function return link_path_walk(): name resolution – round 3 1 hash qstr name = “img” u32 hash u32 len = 3 u64 hash_len union link_path_walk() • stop until the last dentry is resolved • for-loop in this function
  • 22. vfsmount mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len u64 hash_len union nameidata last (S*) name root (S*) inode path (S*) task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” local variable path mnt dentry current->fs->root d_inode qstr name = “src.img” u32 hash u32 len = 6 u64 hash_len union variable ‘name’ = “src.img” dentry d_name qstr name = “src.img” u32 hash u32 len = 6 u64 hash_len union d_inode path mnt dentry variable ‘name’ = “” 3 4 5 6 Function return lookup_last(): name resolution hash 1 2 * Data storage (not a data pointer) Note lookup_last() • get the corresponding dentry based on the filename • “while loop”: Follow the symlink in the final component
  • 23. vfsmount mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len u64 hash_len union nameidata last (S*) name root (S*) inode path (S*) task_struct mm nameidata filename name uptr (userland pointer) refcnt aname char iname[] = “/adrian/img/src.img" char __user *name = “/adrian/img/src.img” local variable path mnt dentry current->fs->root d_inode qstr name = “src.img” u32 hash u32 len = 6 u64 hash_len union variable ‘name’ = “src.img” dentry d_name qstr name = “src.img” u32 hash u32 len = 6 u64 hash_len union d_inode path mnt dentry variable ‘name’ = “” 3 4 5 6 Function return lookup_last(): name resolution hash 1 2 * Data storage (not a data pointer) Note
  • 24. mount – mount a new ext4 file system
  • 25. hlist_head.first hlist_head.first . . mount_hashtable mount mnt_parent struct dentry *mnt_mountpoint struct vfsmount mnt **pprev *next const char *mnt_devname = “none” mount mnt_parent struct dentry *mnt_mountpoint **pprev struct hlist_node mnt_hash *next const char *mnt_devname = “none” struct hlist_node mnt_hash dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union super_block s_blocksize s_maxbytes s_op = ramfs_sops s_root s_bdev s_id[32] = “rootfs” s_fs_info s_type = rootfs_fs_type d_parent mnt_root mnt_sb struct vfsmount mnt mount – before mounting a new ext4 file system d_sb
  • 26. hlist_head.first hlist_head.first . . mount_hashtable mount mnt_parent struct dentry *mnt_mountpoint struct vfsmount mnt **pprev *next const char *mnt_devname = “none” mount mnt_parent struct dentry *mnt_mountpoint **pprev struct hlist_node mnt_hash *next const char *mnt_devname = “none” struct hlist_node mnt_hash dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union super_block s_blocksize s_maxbytes s_op = ramfs_sops s_root s_bdev s_id[32] = “rootfs” s_fs_info s_type = rootfs_fs_type d_parent mnt_root mnt_sb struct vfsmount mnt mount – after mounting a new ext4 file system mount mnt_parent struct dentry *mnt_mountpoint **pprev *next const char *mnt_devname = “/dev/loop0” mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union struct vfsmount mnt super_block s_op = ext4_sops s_id[32] = “loop0” s_type = ext4_fs_type d_parent struct hlist_node mnt_hash dentry d_name d_parent qstr name = “mnt” u32 hash u32 len = 3 u64 hash_len union /adrian/mnt d_sb d_sb
  • 27. hlist_head.first hlist_head.first . . mount_hashtable mount mnt_parent struct dentry *mnt_mountpoint struct vfsmount mnt **pprev *next const char *mnt_devname = “none” mount mnt_parent struct dentry *mnt_mountpoint **pprev struct hlist_node mnt_hash *next const char *mnt_devname = “none” struct hlist_node mnt_hash dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union super_block s_blocksize s_maxbytes s_op = ramfs_sops s_root s_bdev s_id[32] = “rootfs” s_fs_info s_type = rootfs_fs_type d_parent mnt_root mnt_sb struct vfsmount mnt mount – after mounting a new ext4 file system mount mnt_parent struct dentry *mnt_mountpoint **pprev *next const char *mnt_devname = “/dev/loop0” mnt_root mnt_sb dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union struct vfsmount mnt super_block s_op = ext4_sops s_id[32] = “loop0” s_type = ext4_fs_type d_parent struct hlist_node mnt_hash dentry d_name d_parent qstr name = “mnt” u32 hash u32 len = 3 u64 hash_len union /adrian/mnt d_sb d_sb
  • 29. `ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
  • 30. `ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
  • 31. `ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …)
  • 32. path mnt dentry vfsmount mnt_root mnt_sb qstr name = “mnt” u32 hash u32 len = 3 u64 hash_len union dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union super_block s_blocksize s_maxbytes s_op = ramfs_ops s_root s_bdev s_id[32] = “rootfs” s_fs_info d_inode dentry d_name d_inode d_parent s_type = rootfs_fs_type path mnt dentry vfsmount mnt_root mnt_sb qstr name = “mnt /” u32 hash u32 len = 3 1 u64 hash_len union dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union super_block s_blocksize s_maxbytes s_op = ext4_sops s_root s_bdev s_id[32] = “loop0” s_fs_info d_inode dentry d_name d_inode d_parent s_type = ext4_fs_type /adrian/mnt data structures in the original file system /adrian/mnt data structures in newly mounted file system traverse_mounts() `ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …) The original mount point is changed the newly one when running name resolution
  • 33. path mnt dentry vfsmount mnt_root mnt_sb qstr name = “mnt /” u32 hash u32 len = 3 1 u64 hash_len union dentry d_name qstr name = “/” u32 hash u32 len = 1 u64 hash_len union super_block s_blocksize s_maxbytes s_op = ext4_sops s_root s_bdev s_id[32] = “loop0” s_fs_info d_inode dentry d_name d_inode d_parent s_type = ext4_fs_type /adrian/mnt data structures in newly mounted file system flags: path->dentry->d_flags `ls –color mnt`: openat(AT_FDCWD, "/adrian/mnt“, …) REF-walk [Two concurrent mechanisms for name resolution] 1. REF-walk: concurrent management with refcounts and spinlock 2. RCU-walk: faster name resolution lookup