SlideShare une entreprise Scribd logo
1  sur  91
Télécharger pour lire hors ligne
自己動手,豐衣足食
淺談探索 Linux 系統
   設計之道
   Jim Huang( 黃敬群 ) “jserv”
   blog: http://blog.linux.org.tw/jserv/
   From 0xlab - http://0xlab.org/

   Study-Area / June 5, 2010
授權聲明

Attribution – ShareAlike 2.0
You are free
   to copy, distribute, display, and perform the            © Copyright 2010
   work                                            Jim Huang <jserv.tw@gmail.com>
   to make derivative works                          Partial © Copyright 2004-2005
   to make commercial use of the work                     Michael Opdenacker
Under the following conditions                       <michael@free-electrons.com>
      Attribution. You must give the original
      author credit.
      Share Alike. If you alter, transform, or
      build upon this work, you may distribute
      the resulting work only under a license
      identical to this one.
   For any reuse or distribution, you must make
   clear to others the license terms of this work.
   Any of these conditions can be waived if you
   get permission from the copyright holder.
Your fair use and other rights are in no way
   affected by the above.
License text:
   http://creativecommons.org/licenses/by-sa/2.0/legalcode
參考組態

Ubuntu Linux 10.04 / 10.10
 Kernel: 2.6.32-15-generic
 gcc: 4.4.4
 glibc: 2.11.1
Lenovo ThinkPad X200
 Intel Core2 Duo CPU 2.4 GHz
Agenda

Linux 核心設計概念
Rosetta Stone : Linux 如何建立軟硬體關聯
Linux 驅動程式架構與發展
尋幽訪勝自己來
Linux 的
核心設計概念
[ 概念 I]
以 C 語言建構
C 語言精髓: Pointer




Pointer 也就是記憶體 (Memory) 存取的替身
重心: Linux → Pointer → Memory
Linux 跟其他 UNIX 一樣,採用 C 作為主要開發語言,硬體相關部
份則透過組合語言
不採用 C++ 的緣故可見 http://www.tux.org/lkml/#s15-3
主要考量點:效率
Linux kernel size
                                     Size of Linux 2.6.32 source directories (KB)


       drivers
         arch
            fs                Linux 2.6.32 report by SLOCCount (http://dwheeler.com/sloccount/)
        sound
          net
                              Totals grouped by language
       include
Documentation
     firmware
                              ansic:      7680247 (96.91%)
       kernel                 asm:         225175 (2.84%)
          mm                  perl:          9015 (0.11%)
       scripts
                              sh:            3404 (0.04%)
       crypto
      security
                              yacc:          2964 (0.04%)
           lib                lex:           1824 (0.02%)
         tools                cpp:           1795 (0.02%)
         block
                              python:         493 (0.01%)
           ipc
          virt
                              awk:            109 (0.00%)
          init                sed:             30 (0.00%)
     samples
           usr                Total lines of code: 7925056
                 0   20000   40000    60000      80000      100000      120000      140000   160000   180000   200000
為了效率,充斥了奇計淫           ^H^H^H   技巧

  Linux 核心依賴 GNU gcc 特有擴充
  對編譯器作大量提示,如 likely/unlikely 擴充
if (unlikely(err)) {
       ...
}
    平台特有優化
[ 概念 II]
多層、多人、
    多工
核心架構
   App1               App2                 ...
                                                                                   User
                                                                                   space
                              C library


                       System call interface
  Process      Memory         Filesystem            Device        Networking
management    management       support              control                        Kernel
                                                                                   space
                              Filesystem
                                 types

CPU support   CPU / MMU        Storage             Character        Network
   code        support code    drivers           device drivers   device drivers

                                                                                   Hardware

   CPU                  RAM          Storage
“From a technical standpoint,
I believe the kernel will be
"more of the same" and
that all the _really_
interesting staff will be going
out in user space.”
 -- Linus Torvalds
 October 2001
[ 概念 III]
Linux 核心充斥大量的物件
       導向設計
   kobject: ADT (Abstract Data Typing)
 device driver: Inheritance/polymorphism
   hotplug(), probe(): dynamic binding
Buses : 處理器與一個或多個裝置間的通道
Devices 與對應的 device drivers
                                 struct
  device                         device
               struct
            ldd_device
                                             Driver kobject
                         bus
                                              core   core
                                 struct
   driver                        device
              struct
            ldd_driver           driver

                                                 struct
             Functional view inside kernel      kobject
Kobject 的繼承關係
parent pointer 與 ksets
 “parent” points to another kobject, representing
 the next level up
 “kset” is a collection of kobjects
 kset are always represented in sysfs
 Every kobject that is a member of a kset is
 represented in sysfs
[ 概念 IV]
Everything is
   file.
核心與週邊
       App1             App2                 ...
                                                                              User
                                                                              space
                                C library

                          System call interface
      Process   Memory          Filesystem         Device    Networking
    management management        support           control                    Kernel
                                Filesystem                                    space
                                   types
    CPU support CPU / MMU        Storage        Character      Network
       code      support code    drivers      device drivers device drivers

                                                                              Hardware
        CPU               RAM          Storage

• UNIX 法則 : “Everything is file”
  – 實體記憶體 (/dev/mem) 、網路 (socket) 、實體 / 虛擬裝置、系
     統資訊、驅動程式的控制、週邊組態、 ...
• Linux Device Driver 的角色即是將 file operation 映射到 Device
  – 有明確階層概念
  – 不限於 kernel-space driver
  – 經典的 user-space driver 如 X11 video driver
VFS( 虛擬檔案系統 ) 扮演的角色
掛載 VFS

     Mounting /proc:
     sudo mount -t proc none /proc
     Mounting /sys:
     sudo mount -t sysfs none /sys

                          Filesystem type        Raw device         Mount point
                                            or filesystem image
                                           In the case of virtual
                                      filesystems, any string is fine


/proc/cpuinfo: processor information
/proc/meminfo: memory status                                  # man proc
/proc/version: kernel version and build information
/proc/cmdline: kernel command line
程序與 VFS 之間的互動
debugfs

用以揭露核心資訊的虛擬檔案系統
   核心組態: DEBUG_FS
   Kernel hacking -> Debug Filesystem
   掛載方式:
   sudo mount -t debugfs none /debug
   http://lwn.net/Articles/334068/
   搭配 Ftrace (CONFIG_FUNCTION_TRACER)
# cat /debug/tracing/available_tracers
blk function_graph function sched_switch nop
# echo function > /debug/tracing/current_tracer
# echo 1 >/debug/tracing/tracing_on
( 進行某些動作 )
# echo 0 >/debug/tracing/tracing_on
Ftrace
# head /debug/tracing/trace
# tracer: function
#
#   TASK-PID    CPU#    TIMESTAMP   FUNCTION
#      | |       |          |          |
    Xorg-1006 [000] 3055.314290:    _cond_resched <-copy_from_user
    Xorg-1006 [000] 3055.314291:    i915_gem_sw_finish_ioctl <-drm_ioctl
    Xorg-1006 [000] 3055.314291:    mutex_lock <-i915_gem_sw_finish_ioctl
    Xorg-1006 [000] 3055.314291:    _cond_resched <-mutex_lock
    Xorg-1006 [000] 3055.314291:    drm_gem_object_lookup <-i915_gem_sw_finish_ioctl
    Xorg-1006 [000] 3055.314291:    _spin_lock <-drm_gem_object_lookup
Memory is file
  as well.
     /dev/mem
  virtual memory
極為真實的
 Virtual
 Memory
先回顧個人電腦的 I/O 架構




x86 提供一組指令存取 IO port
PMIOand outs instructions vs MMIO (Memory-Mapped I/O)
in, out, ins, (Port-Mapped I/O)
                       ISA vs. PCI
更全面的觀點
I/O 的型態

Programmed I/O (PIO)
 port-mapped I/O
 memory-mapped I/O
Interrupt-driven I/O ( 相對於 polling)
Direct Memory Access (DMA)
Channel I/O (I/O processor)
User
                                             process
                                                                   SIGSEGV, kill
                                      嘗試存取


                                                                     Linux
         實體 / 硬體位址 (Addressing)                                      Kernel
0xFFFFFFFFF                                            Exception
                                   非法 / 不允許的            (MMU)
              I/O memory 3
                                    記憶體區域
              I/O memory 2     Memory
                             Management
              I/O memory 1      Unit                         Proc1       Proc2


                 Flash         MMU        CPU            Process   Memory
                                                       management management
                                                       CPU support CPU / MMU
                RAM 1                                     code      support code

                RAM 0
                                                           CPU                RAM

0x00000000
Physical / Virtual memory
         Physical address space                               Virtual address spaces
0xFFFFFFFFF                                     0xFFFFFFFFF           0xFFFFFFFFF
                                                                                       Kernel
              I/O memory 3                                             0xC0000000

                                                                        Process1
              I/O memory 2         Memory
                                  Management
                                     Unit                              0x00000000
              I/O memory 1                       0x00000000




                 Flash            MMU          CPU

                                                                      0xFFFFFFFFF
                                                                                       Kernel
                RAM 1                                                  0xC0000000


                RAM 0                                                   Process2


                                                                       0x00000000

0x00000000
                             所有的 process 都有獨自
                             的 virtual address space ,
                             認為可存取整個空間
Everything is
    file.
* 需將裝置對應到
Virtual Memory 才
      能使用
mmap
                           mmap
                           系統呼叫
                                         Device driver
        Process                       ( 註冊的 ) 檔案操作 mmap 被呼叫
                                         初始化硬體 - 記憶體的對應


                 存取
                 virtual
                 address                                    access
                                MMU                         physical
                                                            address




Process virtual address space                        Physical address space
以 X server 為例
# pmap `pidof X` | grep dev | head
00590000     40K r­x­­  /lib/libudev.so.0.6.1
0059a000      4K r­­­­  /lib/libudev.so.0.6.1
0059b000      4K rw­­­  /lib/libudev.so.0.6.1
00b02000     36K r­x­­  /usr/lib/xorg/modules/input/evdev_drv.so
00b0b000      4K r­­­­  /usr/lib/xorg/modules/input/evdev_drv.so
00b0c000      4K rw­­­  /usr/lib/xorg/modules/input/evdev_drv.so
a75e8000    128K rw­s­  /dev/dri/card0
a7608000    128K rw­s­  /dev/dri/card0
a7628000    128K rw­s­  /dev/dri/card0
a7648000    128K rw­s­  /dev/dri/card0



  start address   size      Mapped file name




                            可對照 /proc/`pidof X`/maps 的內容
mmap :: user-space
• 取得裝置的 fd (file descriptor) 後 ...
  mmap 系統呼叫
  void * mmap(
     void *start,     /*   Often 0, preferred starting address */
     size_t length,   /*   Length of the mapped area */
     int prot ,       /*   Permissions: read, write, execute */
     int flags,       /*   Options: shared mapping, private copy */
     int fd,          /*   Open file descriptor */
     off_t offset     /*   Offset in the file */
  );




                mmap :: kernel-space
       int (*mmap) (
          struct file *,             /* Open file structure */
          struct vm_area_struct * /* Kernel VMA structure */
       );
實例: devmem
  直接 peek (read) 或 poke (write) 某一段已對應的實
  體位址: (b: byte, h: half, w: word)
  devmem 0x000c0004 h (reading)
  devmem 0x000c0008 w 0xffffffff (writing)
if ((fd = open("/dev/mem", O_RDWR | O_SYNC)) == -1)
    FATAL;           # devmem 0x000c0004 h
                     Memory mapped at address 0xb7fb5000.
                     Value at address 0xC0004 (0xb7fb5004): 0xE3A9

/* Map one page */
map_base = mmap(0, MAP_SIZE, PROT_READ |
PROT_WRITE, MAP_SHARED, fd, target & ~MAP_MASK);
if (map_base == (void *) -1)
    FATAL;
virt_addr = map_base + (target & MAP_MASK);
Linux 如何
建立軟硬體關聯
/proc/iomem
... 是 Rosetta
     Stone
• 1799 年拿破崙遠征埃及時期,法軍上尉 Pierre-François Xavier
  Bouchard 在尼羅河口港灣城市羅塞塔 (Rosetta ,今日稱為 el-
  Rashid) 發現此碑石,自此揭開古埃及象形文字之謎
• 羅塞塔碑石製作於公元前 196 年,原是一塊刻有埃及國王托勒
  密五世召書的碑石,但由於這塊碑石同時刻有同一段文字的三種
  不同語言版本,使得近代的考古學家得以在對照各語言版本的內
  容後,解讀出已經失傳千餘年的埃及象形文之意義與結構,而成
  為今日研究古埃及歷史的重要里程碑
• 由於破解埃及象形文這種如謎題
  般事物之起點, Rosetta Stone
  被比喻為解決難題或謎題的關鍵
  線索或工具
  # cat /proc/iomem
# cat /proc/iomem
00000000-0009efff : System RAM
0009f000-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000cbfff : pnp 00:00
000cf000-000cffff : Adapter ROM
000d0000-000d0fff : Adapter ROM
000d2000-000d3fff : reserved
000dc000-000dffff : pnp 00:00
000e0000-000e3fff : pnp 00:00
000e4000-000e7fff : pnp 00:00
000e8000-000ebfff : pnp 00:00                            實體 / 硬體位址 (Addressing)
000f0000-000fffff : System ROM
00100000-7f6cffff : System RAM                  0xFFFFFFFFF
  00100000-0031b5a3 : Kernel code
  0031b5a4-00414dc3 : Kernel data                             I/O memory 3
  00476000-004eba7f : Kernel bss
7f6d0000-7f6defff : ACPI Tables
7f6df000-7f6fffff : ACPI Non-volatile Storage                 I/O memory 2    Memory
7f700000-7fffffff : reserved                                                 Management
88000000-8bffffff : PCI CardBus #16
d0000000-dfffffff : 0000:00:02.0                              I/O memory 1      Unit
...
e4300000-e7ffffff : PCI Bus #15
  e4300000-e4300fff : 0000:15:00.0
    e4300000-e4300fff : yenta_socket                             Flash         MMU
  e4301000-e43017ff : 0000:15:00.1
    e4301000-e43017ff : ohci1394
  e4301800-e43018ff : 0000:15:00.2
    e4301800-e43018ff : sdhci:slot0                             RAM 1
e8000000-e9ffffff : PCI Bus #04
ea000000-ebffffff : PCI Bus #0c
ec000000-edffffff : PCI Bus #03                                 RAM 0
  edf00000-edf00fff : 0000:03:00.0
    edf00000-edf00fff : iwl3945
...

                                                0x00000000
典型 PCI
低階「通訊」概況
實際的 I/O 操作途徑
PMIO (Port-Mapped I/O) vs MMIO (Memory-Mapped I/O)
$ cat /proc/ioports        $ cat /proc/iomem

0000-001f : dma1           00000000-0009fbff : System RAM
0020-0021 : pic1            00000000-00000000 : Crash kernel
0040-0043 : timer0
                           0009fc00-0009ffff : reserved
0050-0053 : timer1
                           000a0000-000bffff : Video RAM area
0060-006f : keyboard
0070-0077 : rtc            000c0000-000c7fff : Video ROM
0080-008f : dma page reg   000cc000-000d97ff : Adapter ROM
00a0-00a1 : pic2
                           000f0000-000fffff : System ROM
00c0-00df : dma2
                           00100000-0ffeffff : System RAM
00f0-00ff : fpu
0170-0177 : ide1            00100000-00281514 : Kernel code
01f0-01f7 : ide0            00281515-003117b3 : Kernel data
02f8-02ff : serial         20000000-200fffff : PCI Bus #01
0376-0376 : ide1
                           [...]
[...]
Port-Mapped I/O 分佈
Requesting I/O ports
/proc/ioports                    struct resource *request_region(
0000-001f   :   dma1                unsigned long start,
0020-0021   :   pic1                unsigned long len,
0040-0043   :   timer0
0050-0053   :   timer1
                                    char *name);
0060-006f   :   keyboard
0070-0077   :   rtc
0080-008f
00a0-00a1
            :
            :
                dma page reg
                pic2
                                 嘗試保留給定區域
00c0-00df   :   dma2
00f0-00ff   :   fpu
0100-013f   :   pcmcia_socket0   request_region(0x0170, 8, "ide1");
0170-0177   :   ide1
01f0-01f7   :   ide0             void release_region(
0376-0376   :   ide1
0378-037a   :   parport0            unsigned long start,
03c0-03df   :   vga+                unsigned long len);
03f6-03f6   :   ide0

                                 參考 include/linux/ioport.h 與
03f8-03ff   :   serial
0800-087f   :   0000:00:1f.0
0800-0803   :   PM1a_EVT_BLK
0804-0805   :   PM1a_CNT_BLK     kernel/resource.c
0808-080b   :   PM_TMR
0820-0820   :   PM2_CNT_BLK
0828-082f   :   GPE0_BLK
...
Requesting I/O memory
/proc/iomem
                                         介面一致
00000000-0009efff :   System RAM
0009f000-0009ffff :   reserved           struct resource * request_mem_region(
000a0000-000bffff :   Video RAM area        unsigned long start,
000c0000-000cffff :   Video ROM             unsigned long len,
000f0000-000fffff :   System ROM
00100000-3ffadfff :   System RAM
                                            char *name);
  00100000-0030afff   : Kernel code
  0030b000-003b4bff   : Kernel data      void release_mem_region(
3ffae000-3fffffff :   reserved              unsigned long start,
40000000-400003ff :   0000:00:1f.1          unsigned long len);
40001000-40001fff :   0000:02:01.0
  40001000-40001fff   : yenta_socket
40002000-40002fff :   0000:02:01.1
  40002000-40002fff   : yenta_socket
40400000-407fffff :   PCI CardBus #03
40800000-40bfffff :   PCI CardBus #03
40c00000-40ffffff :   PCI CardBus #07
41000000-413fffff :   PCI CardBus #07
a0000000-a0000fff :   pcmcia_socket0
a0001000-a0001fff :   pcmcia_socket1
e0000000-e7ffffff :   0000:00:00.0
e8000000-efffffff :   PCI Bus #01
  e8000000-efffffff   : 0000:01:00.0    大部分驅動程式的工作大概就在讀寫
...                                     I/O port 或 I/O memory( 統稱 I/O region)
io_mem.c(1)
#include <linux/init.h>
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/ioport.h>


static int Major, result;
static struct file_operations fops;
MODULE_LICENSE("GPL");
unsigned long start = 1, length = 1;
module_param(start, long, 1);
module_param(length, long, 1);
...
module_init (memIO_init);
module_exit (memIO_cleanup);
io_mem.c(2)
int memIO_init (void)
{
     Major = register_chrdev (0, "memIO_device", &fops);
     if (Major < 0) {
         printk (" Major number allocation is failed n");
         return (Major);
     }
     printk (" The Major number of the device is %d n", Major);
     result = check_mem_region (start, length);
     if (result < 0) {
         printk ("Allocation for I/O memory range is failed: “
                 ”Try other rangen");
         return (result);
     }
     request_mem_region (start, length, "memIO_device");
    return 0;
}
io_mem.c(3)
void memIO_cleanup (void)
{
    release_mem_region (start, length);
    printk (" The I/O memory region is released “
            ”successfully n");
    unregister_chrdev (Major, "memIO_device");
    printk (" The Major number is released successfully n");
}
在發送 Request 之前
# cat /proc/devices       # cat /proc/iomem
Character devices:        00000000-0009efff :   System RAM
  1 mem                   0009f000-0009ffff :   reserved
  2 pty                   000a0000-000bffff :   Video RAM area
  3 ttyp                  000c0000-000c7fff :   Video ROM
  4 /dev/vc/0             000c8000-000cbfff :   pnp 00:00
  4 tty                   000cf000-000cffff :   Adapter ROM
  4 ttyS                  000d0000-000d0fff :   Adapter ROM
  5 /dev/tty              000d2000-000d3fff :   reserved
  5 /dev/console          ...
  5 /dev/ptmx             ee444000-ee4443ff :   0000:00:1d.7
  7 vcs                     ee444000-ee4443ff   : ehci_hcd
 10 misc                  ee444400-ee4447ff :   0000:00:1f.2
 13 input                   ee444400-ee4447ff   : ahci
 14 sound                 ...
 21 sg
 29 fb
108 ppp
116 alsa
128 ptm               # insmod io_mem.ko start=0xeeee0000
136 pts
171 ieee1394                             length=1
180 usb
189 usb_device
226 drm
...
發送 Request 之後
# cat /proc/devices          # cat /proc/iomem
Character devices:           00000000-0009efff : System RAM
  1 mem                      0009f000-0009ffff : reserved
  2 pty                      000a0000-000bffff : Video RAM area
  3 ttyp                     000c0000-000c7fff : Video ROM
  4 /dev/vc/0                000c8000-000cbfff : pnp 00:00
  4 tty                      000cf000-000cffff : Adapter ROM
  4 ttyS                     000d0000-000d0fff : Adapter ROM
  5 /dev/tty                 000d2000-000d3fff : reserved
  5 /dev/console             ...
  5 /dev/ptmx                ee444000-ee4443ff : 0000:00:1d.7
  7 vcs                        ee444000-ee4443ff : ehci_hcd
 10 misc                     ee444400-ee4447ff : 0000:00:1f.2
 13 input                      ee444400-ee4447ff : ahci
 14 sound                    eeee0000-eeee0000 : memIO_device
 21 sg                       ...
 29 fb
108 ppp
116 alsa
128 ptm          # dmesg
136 pts          ...
171 ieee1394     [11682.040057] The Major number of the device is 251
180 usb
189 usb_device
226 drm
252 memIO_device
...
Polling




copy_from_user(buffer, p, count);          /* p is the kernel buffer */
for (i = 0; i < count; i++) {              /* loop on every character */
    while (*printer_status_reg != READY)   /* loop until ready */
       ;
    printer_data_register = p[i];          /* output one character */
}
return_to_user();
Interrupt
IRQ lines
$ cat /proc/interrupts
            CPU0
 0:    111151372   IO-APIC-edge    timer
 1:           8    IO-APIC-edge    i8042
 6:           2    IO-APIC-edge    floppy
 7:           2    IO-APIC-edge    parport0
 8:           4    IO-APIC-edge    rtc
 9:           1    IO-APIC-level   acpi
 12:         114    IO-APIC-edge   i8042
 14:     3277682    IO-APIC-edge   ide0
 15:          65    IO-APIC-edge   ide1
169:    14445090   IO-APIC-level   eth0
NMI:          0
LOC:   111150116
ERR:          0
MIS:          0
Linux 驅動程式
  架構與發展
Linux 驅動程式架構
User-Space                      Application


                           System Call Interface


                         Virture File System(VFS)
                                                                       BSD socket

                                                   Network           inet(AF_INET)
Kernel-Space                   Buffer Cache       Subsystem
                                                                   Transport(TCP,UDP)
                                                                      Network(IP)
                Character           Block            Network
               Device Driver     Device Driver     Device Driver


                               Device Interface



Hardware                  Physical Device (Hardware)
Everything is file.
::Device File::
Device File 類型
              Character           Block          Network              USB, Firewire,
             Device Driver     Device Driver   Device Driver            SCSI,...




crw--w--w-
crw--w--w-   0
             0   root
                 root        root
                             root         5,
                                          5,    1
                                                1   Oct 1
                                                    Oct 1      1998
                                                               1998   console
                                                                      console
crw-rw-rw-
crw-rw-rw-   1
             1   root
                 root        root
                             root         1,
                                          1,    3
                                                3   May 6
                                                    May 6      1998
                                                               1998   null
                                                                      null
crw-------
crw-------   1
             1   root
                 root        root
                             root         4,
                                          4,    0
                                                0   May 6
                                                    May 6      1998
                                                               1998   tty
                                                                      tty
crw-rw----
crw-rw----   1
             1   root
                 root        disk
                             disk        96,
                                         96,    0
                                                0   Dec 10
                                                    Dec 10     1998
                                                               1998   pt0
                                                                      pt0
crw-------
crw-------   1
             1   root
                 root        root
                             root         5,
                                          5,   64
                                               64   May 6
                                                    May 6      1998
                                                               1998   cua0
                                                                      cua0


brw-------
brw-------    1
              1   root
                  root         floppy
                               floppy          2,
                                               2,   0
                                                    0   May
                                                        May    6
                                                               6   1998
                                                                   1998   fd0
                                                                          fd0
brw-rw----
brw-rw----    1
              1   root
                  root         disk
                               disk            3,
                                               3,   0
                                                    0   May
                                                        May    6
                                                               6   1998
                                                                   1998   hda
                                                                          hda
brw-rw----
brw-rw----    1
              1   root
                  root         disk
                               disk            3,
                                               3,   1
                                                    1   May
                                                        May    6
                                                               6   1998
                                                                   1998   hda1
                                                                          hda1
brw-rw----
brw-rw----    1
              1   root
                  root         disk
                               disk            8,
                                               8,   0
                                                    0   May
                                                        May    6
                                                               6   1998
                                                                   1998   sda
                                                                          sda
brw-rw----
brw-rw----    1
              1   root
                  root         disk
                               disk            8,
                                               8,   1
                                                    1   May
                                                        May    6
                                                               6   1998
                                                                   1998   sda1
                                                                          sda1
Major Number 與 Minor Number
• /dev 目錄下的檔案稱為 device
  files ,用以辨識 / 對應於裝置
                                               User­space
• Kernel 透過 major number 來
  指定正確的驅動程式給 device                          Read
                                             buffer
                                                          Write
                                                          string
  files
                                            read               write
• Minor number 只有驅動程式本
  身會使用到                                            /dev/foo




                                                                       Copy from user
                             Copy to user
                                               major / minor



                                             Read          Write
                                            handler       handler
                                               Device driver
                                               Kernel space
註冊的 device driver
顯示於 /proc/devices:
   Character devices:     Block devices:
     1 mem                1 ramdisk
     4 /dev/vc/0          3 ide0
     4 tty                8 sd
     4 ttyS               9 md
     5 /dev/tty          22 ide1
     5 /dev/console      65 sd
     5 /dev/ptmx         66 sd
     6 lp                67 sd
    10 misc              68 sd
    13 input
    14 sound
    ...
                     Major Registered
                    number   name
Linux Device Driver 資料結構
struct   file_operations   {       /* character device drivers */
         struct module *owner;
         loff_t (*llseek) (struct file *, loff_t, int);
         ssize_t (*read) (struct file *, char *, size_t, loff_t *);
         ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
         int (*readdir) (struct file *, void *, filldir_t);
         unsigned int (*poll) (struct file *, struct poll_table_struct *);
         int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
         int (*mmap) (struct file *, struct vm_area_struct *);
         int (*open) (struct inode *, struct file *);
         int (*flush) (struct file *);
         int (*release) (struct inode *, struct file *);
         int (*fsync) (struct file *, struct dentry *, int datasync);
         int (*fasync) (int, struct file *, int);
         int (*lock) (struct file *, int, struct file_lock *);
         ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
         ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
         ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
         unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long,
                           unsigned long, unsigned long);
};
檔案操作的內部
                                        get PAL video                       get NTSC video


                        Process 1                         Process 2

                                                                 ioctl: change read
                                 read
                                                                 read fop
                        Open file 1                       Open file 2
           open                                    open
實做 file_operations structure
                                                                 兩個不同 process 可自同個
#include <linux/fs.h>                     /dev/video             device file 讀取不同的資料
static struct file_operations
my_fops =
{
    .owner = THIS_MODULE,
    .read = moko_read,
    .write = moko_write,
};

將上述 my_read/my_write 指向為具體實做
否則採用預設 function( 如 open, release...)
read() 操作內部
 The read methods copies data to application code.

ssize_t read(struct file *filp, char *buff, size_t count, loff_t *offp);
Device number 配置方式
#include <linux/fs.h>                               #include <linux/fs.h>
int register_chrdev_region(                         int alloc_chrdev_region(
  dev_t from,     /* Starting device number */         dev_t *dev, /* Output: device number */
  unsigned count,/* Number of device numbers */        unsigned baseminor,
  const char *name);/* Registered name */                /* Starting minor number, usually 0 */
Returns 0 if the allocation was successful.            unsigned count,

Example                                                  /* Number of device numbers */

if (register_chrdev_region(                            const char *name /* Registered name */

         MKDEV(202, 128),                           );

       moko_count, “moko”)) {                       Returns 0 if the allocation was
   printk(KERN_ERR “Failed to allocate “               successful.

                    ”device numbern”);             Example
   ...
                                                    if (alloc_chrdev_region(&moko_dev, 0,
                                                       moko_count, “moko”)) {
Allocating fixed                                       printk(KERN_ERR “Failed to allocate
                                                       device numbern”);
device numbers                                         ...

                                                 Dynamic allocation of
                                                 device numbers
                                                 Safer: have the kernel allocate free numbers for you!
要點
• 當 LKM 即將從系統釋放, major number 必須交
  還系統
• 檔案操作與結構
    Device ID By file structure
    核心使用 file ops 以存取 driver 提供的函式
 open / close


    初始化裝置並調整 usage count
 Memory


    Device-memory Link List
 Read / Write


    Transfer data from Kernel to User
進階 I/O 控制
 Blocking / Non-blocking I/O
• Polling: poll and select function
• Asynchronous notification
• Access control of device file
考量點
• Synchronization / Avoid race condition
   – semaphore / mutex / kernel lock
• Reentrance
• Preemption in kernel code
• Interrupt timer / peripheral
• Take care of bandwidth issues
• DMA vs. PIO (Programming I/O) model
• SMP
HW Concurrency (1)
                          INT
           IRQ
 I/O               I/O             CPU
device            APIC

                         INT ACK
I/O APIC 對裝置不斷作 poll 並發出 interrupt
在 CPU 回應 (ACK/acknowledge) 前,沒有新的
interrupt 可被發出
通常核心得在多個 interrupt 產生的情況下執行
HW Concurrency (2)

Symmetrical MultiProcessor (SMP) 顧名思
義,擁有兩個以上的 CPU
SMP kernel 必須在有效的 CPU 上同步執行
情境:在其中一顆 CPU 上執行了網路相關
的 service routine ,而另一個 CPU 則執行
檔案系統相關的動作
# cat /proc/interrupts
           CPU0          CPU1
  0:    4988074    5094948      IO-APIC-edge     timer
...
 29:          7      11515      PCI-MSI-edge      eth0
 30:     290411     379267      PCI-MSI-edge      i915
 31:     651550     496152      PCI-MSI-edge      iwlagn
NMI:          0            0    Non-maskable interrupts
...
TRM:          0            2    Thermal event interrupts
THR:          0            0    Threshold APIC interrupts
MCE:          0            0    Machine check exceptions
MCP:         77           77    Machine check polls
ERR:          1
MIS:          0
interrupt 總量
# cat /proc/stat | grep intr
intr 8190767 6092967 10377 0 1102775 5 2 0 196 ...
     Total number    IRQ1     IRQ2 IRQ3
     of interrupts    total    total ...
Linux 2.6+ 新的
  Driver Model
Device Model 特徵   (1)


從簡化能源管理處理出發,現在已大幅改進
充分表現硬體系統架構
提供 user-space 層級的表示途徑: sysfs
比 /proc 更全面
更一致、容易維護的 device interface:
include/linux/device.h
Device model 特徵      (2)


允許從多角度檢視系統
 從系統上已存在的裝置來看: power state 、連
 結上的 bus ,與對應的 driver
 從 system bus 結構來看: bus 與 bus 之間的連
 結方式 ( 如 PCI bus 上的 USB bus controller) 、
 對應的 driver
 從裝置本身來看: input, net, sound...
 得以簡便地找到裝置,而不需顧慮實體連結方式
“From a technical standpoint,
I believe the kernel will be
"more of the same" and
that all the _really_
interesting staff will be going
out in user space.”
 -- Linux Torvalds
 October 2001
sysfs

Device Model 的 user-space 表現方式
Kernel 配置編譯選項
CONFIG_SYSFS=y (Filesystems -> Pseudo
filesystems)
掛載方式:
sudo mount -t sysfs none /sys
探索 /sys 是理解硬體設計便利的途徑
sysfs tools
http://linux-diag.sourceforge.net/Sysfsutils.html
   libsysfs - 提供一致且穩定的介面,得以查閱
   由 sysfs 公開的系統硬體資訊
   systool – 使用 libsysfs 的應用程式,可列
   印裝置的 bus, class, topology
Device Model 參考資訊

最方便且清晰的文件就內建於核心程式碼中
Documentation/driver-model/
Documentation/filesystems/sysfs.txt
udev / hotplug
原本 /dev 暴露的問題

Red Hat 9 Linux 上,為了支援可能的應用,
必須在 /dev 擺放 18000 個 device files
需要對 major number 作授權與協調分配
http://lanana.org/: Linux Assigned Names
and Numbers Authority
對於 user-space 應用程式來說,既不知曉在
系統上的裝置,也不知曉對應 /dev 上的
device files
udev 解決方案

善用 sysfs 機制
 全部在 user-space
 依據實際上硬體新增與移除的裝況,自動建立與
 刪除 /dev/ 底下的 device file
 Major and minor device transmitted by the
 kernel.
 Requires no change to driver code.
 Fast: written in C
 Small size: udevd version 117: 67 KB in
 Ubuntu 8.04
udev 典型操作
Kernel driver core    uevent
                                       udevd
  (usb, pci...)


                                  udev event process
                               Matches event to rules

                                 Creates / removes
                                    device files


                         /lib/udev/ programs or others
                               Load the right module

                                 Notify userspace
                                 programs (GUI...)
識別 device driver modules
         Kernel / module compiling                                 System everyday life

  Each driver announces which device and vendor            The driver core (usb, pci...) reads the device id,
 ids it supports. Information stored in module files.          vendor id and other device attributes.


     The depmod ­a command processes
                                                          The kernel sends an event to udevd, setting the 
          module files and generates
                                                        MODALIAS environment variable, encoding these data.
/lib/modules/<version>/modules.alias


                                                                     A udev event process runs
                                                                     modprobe $MODALIAS



                                                                modprobe finds the module to load
                                                                  in the modules.alias file.
Firmware hotplugging 實做
                Kernel space                                      Userspace
                    Driver                        /sys/class/firmware/xxx/{loading,data}
        calls request_firmware()                                   appear
                   Sleeps

                                                     firmware subsystem event sent to udev
                                                    Calling /lib/udev/firmware_helper

                    Kernel
        Get ready to load firmware data                /lib/udev/firmware_helper
  Grows a buffer to accommodate incoming data     echo 1 > /sys/class/firmware/xxx/loading
                                                cat fw_image > /sys/class/firmware/xxx/data
                                                  echo 0 > /sys/class/firmware/xxx/loading

                    Driver
     wakes up after request_firmware()
        Copies the buffer to the hardware
        Calls release_firmware()



See Documentation/firmware_class/ for a nice overview
參考資訊
• KernelTrap
  http://kerneltrap.org/
  – Forum website for kernel developers
  – News, articles, whitepapers, discussions, polls, interviews
• Linux Device Drivers, 3rd edition, Feb 2005
  – Jonathan Corbet, Alessandro Rubini,
    Greg Kroah-Hartman, O'Reilly
• Linux Kernel in a Nutshell, Dec 2006
    Greg Kroah-Hartman, O'Reilly

Contenu connexe

Tendances

Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerPlatonov Sergey
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)shimosawa
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBshimosawa
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDBLinaro
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)shimosawa
 
Linux Internals - Kernel/Core
Linux Internals - Kernel/CoreLinux Internals - Kernel/Core
Linux Internals - Kernel/CoreShay Cohen
 
Working Remotely (via SSH) Rocks!
Working Remotely (via SSH) Rocks!Working Remotely (via SSH) Rocks!
Working Remotely (via SSH) Rocks!Kent Chen
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux SystemJian-Hong Pan
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBshimosawa
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File SystemAdrian Huang
 
Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)Tushar B Kute
 

Tendances (20)

Address/Thread/Memory Sanitizer
Address/Thread/Memory SanitizerAddress/Thread/Memory Sanitizer
Address/Thread/Memory Sanitizer
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
What Can Compilers Do for Us?
What Can Compilers Do for Us?What Can Compilers Do for Us?
What Can Compilers Do for Us?
 
Improve Android System Component Performance
Improve Android System Component PerformanceImprove Android System Component Performance
Improve Android System Component Performance
 
Q2.12: Debugging with GDB
Q2.12: Debugging with GDBQ2.12: Debugging with GDB
Q2.12: Debugging with GDB
 
The Internals of "Hello World" Program
The Internals of "Hello World" ProgramThe Internals of "Hello World" Program
The Internals of "Hello World" Program
 
Linux Initialization Process (1)
Linux Initialization Process (1)Linux Initialization Process (1)
Linux Initialization Process (1)
 
Linux Internals - Kernel/Core
Linux Internals - Kernel/CoreLinux Internals - Kernel/Core
Linux Internals - Kernel/Core
 
Working Remotely (via SSH) Rocks!
Working Remotely (via SSH) Rocks!Working Remotely (via SSH) Rocks!
Working Remotely (via SSH) Rocks!
 
Learn C Programming Language by Using GDB
Learn C Programming Language by Using GDBLearn C Programming Language by Using GDB
Learn C Programming Language by Using GDB
 
Unix v6 Internals
Unix v6 InternalsUnix v6 Internals
Unix v6 Internals
 
Launch the First Process in Linux System
Launch the First Process in Linux SystemLaunch the First Process in Linux System
Launch the First Process in Linux System
 
Interpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratchInterpreter, Compiler, JIT from scratch
Interpreter, Compiler, JIT from scratch
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Linux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKBLinux Kernel Booting Process (2) - For NLKB
Linux Kernel Booting Process (2) - For NLKB
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)Part 01 Linux Kernel Compilation (Ubuntu)
Part 01 Linux Kernel Compilation (Ubuntu)
 
Basic Linux Internals
Basic Linux InternalsBasic Linux Internals
Basic Linux Internals
 
Virtual Machine Constructions for Dummies
Virtual Machine Constructions for DummiesVirtual Machine Constructions for Dummies
Virtual Machine Constructions for Dummies
 

En vedette

軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維
軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維 軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維
軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維 National Cheng Kung University
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計鍾誠 陳鍾誠
 
Develop Community-based Android Distribution and Upstreaming Experience
Develop Community-based Android Distribution and Upstreaming Experience Develop Community-based Android Distribution and Upstreaming Experience
Develop Community-based Android Distribution and Upstreaming Experience National Cheng Kung University
 
開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)
開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)
開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)National Cheng Kung University
 
Web assembly overview by Mikhail Sorokovsky
Web assembly overview by Mikhail SorokovskyWeb assembly overview by Mikhail Sorokovsky
Web assembly overview by Mikhail SorokovskyValeriia Maliarenko
 
COSCUP 2014 : open source compiler 戰國時代的軍備競賽
COSCUP 2014 : open source compiler 戰國時代的軍備競賽COSCUP 2014 : open source compiler 戰國時代的軍備競賽
COSCUP 2014 : open source compiler 戰國時代的軍備競賽Kito Cheng
 
An Introduction to WebAssembly
An Introduction to WebAssemblyAn Introduction to WebAssembly
An Introduction to WebAssemblyDaniel Budden
 
Is WebAssembly the killer of JavaScript?
Is WebAssembly the killer of JavaScript?Is WebAssembly the killer of JavaScript?
Is WebAssembly the killer of JavaScript?Boyan Mihaylov
 
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學National Cheng Kung University
 
淺談編譯器最佳化技術
淺談編譯器最佳化技術淺談編譯器最佳化技術
淺談編譯器最佳化技術Kito Cheng
 
用十分鐘向nand2tetris學會設計處理器
用十分鐘向nand2tetris學會設計處理器用十分鐘向nand2tetris學會設計處理器
用十分鐘向nand2tetris學會設計處理器鍾誠 陳鍾誠
 
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明National Cheng Kung University
 
用十分鐘 學會《資料結構、演算法和計算理論》
用十分鐘  學會《資料結構、演算法和計算理論》用十分鐘  學會《資料結構、演算法和計算理論》
用十分鐘 學會《資料結構、演算法和計算理論》鍾誠 陳鍾誠
 

En vedette (20)

軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維
軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維 軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維
軟體又熱又平又擠:淺談開放原始碼軟體衝擊下的新思維
 
用十分鐘 向jserv學習作業系統設計
用十分鐘  向jserv學習作業系統設計用十分鐘  向jserv學習作業系統設計
用十分鐘 向jserv學習作業系統設計
 
ARM and SoC Traning Part II - System
ARM and SoC Traning Part II - SystemARM and SoC Traning Part II - System
ARM and SoC Traning Part II - System
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
Discover System Facilities inside Your Android Phone
Discover System Facilities inside Your Android Phone Discover System Facilities inside Your Android Phone
Discover System Facilities inside Your Android Phone
 
ARM and SoC Traning Part I -- Overview
ARM and SoC Traning Part I -- OverviewARM and SoC Traning Part I -- Overview
ARM and SoC Traning Part I -- Overview
 
Implement Checkpointing for Android
Implement Checkpointing for AndroidImplement Checkpointing for Android
Implement Checkpointing for Android
 
Develop Community-based Android Distribution and Upstreaming Experience
Develop Community-based Android Distribution and Upstreaming Experience Develop Community-based Android Distribution and Upstreaming Experience
Develop Community-based Android Distribution and Upstreaming Experience
 
開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)
開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)
開放原始碼作為新事業: 台灣本土經驗談 (COSCUP 2011)
 
WebAssembly
WebAssemblyWebAssembly
WebAssembly
 
Web assembly overview by Mikhail Sorokovsky
Web assembly overview by Mikhail SorokovskyWeb assembly overview by Mikhail Sorokovsky
Web assembly overview by Mikhail Sorokovsky
 
COSCUP 2014 : open source compiler 戰國時代的軍備競賽
COSCUP 2014 : open source compiler 戰國時代的軍備競賽COSCUP 2014 : open source compiler 戰國時代的軍備競賽
COSCUP 2014 : open source compiler 戰國時代的軍備競賽
 
An Introduction to WebAssembly
An Introduction to WebAssemblyAn Introduction to WebAssembly
An Introduction to WebAssembly
 
Is WebAssembly the killer of JavaScript?
Is WebAssembly the killer of JavaScript?Is WebAssembly the killer of JavaScript?
Is WebAssembly the killer of JavaScript?
 
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學
中輟生談教育: 完全用開放原始碼軟體進行 嵌入式系統教學
 
淺談編譯器最佳化技術
淺談編譯器最佳化技術淺談編譯器最佳化技術
淺談編譯器最佳化技術
 
用十分鐘向nand2tetris學會設計處理器
用十分鐘向nand2tetris學會設計處理器用十分鐘向nand2tetris學會設計處理器
用十分鐘向nand2tetris學會設計處理器
 
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明
給自己更好未來的 3 個練習:嵌入式作業系統設計、實做,與移植 (2015 年春季 ) 課程說明
 
Android IPC Mechanism
Android IPC MechanismAndroid IPC Mechanism
Android IPC Mechanism
 
用十分鐘 學會《資料結構、演算法和計算理論》
用十分鐘  學會《資料結構、演算法和計算理論》用十分鐘  學會《資料結構、演算法和計算理論》
用十分鐘 學會《資料結構、演算法和計算理論》
 

Similaire à 淺談探索 Linux 系統設計之道

Microkernel-based operating system development
Microkernel-based operating system developmentMicrokernel-based operating system development
Microkernel-based operating system developmentSenko Rašić
 
Visual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & VirtualisationVisual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & Virtualisationwangyuanyi
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetesTed Jung
 
Lightweight Virtualization in Linux
Lightweight Virtualization in LinuxLightweight Virtualization in Linux
Lightweight Virtualization in LinuxSadegh Dorri N.
 
Dev opsec dockerimage_patch_n_lifecyclemanagement_2019
Dev opsec dockerimage_patch_n_lifecyclemanagement_2019Dev opsec dockerimage_patch_n_lifecyclemanagement_2019
Dev opsec dockerimage_patch_n_lifecyclemanagement_2019kanedafromparis
 
Linux on System z Update: Current & Future Linux on System z Technology
Linux on System z Update: Current & Future Linux on System z TechnologyLinux on System z Update: Current & Future Linux on System z Technology
Linux on System z Update: Current & Future Linux on System z TechnologyIBM India Smarter Computing
 
Linux architecture
Linux architectureLinux architecture
Linux architecturemcganesh
 
Linux container & docker
Linux container & dockerLinux container & docker
Linux container & dockerejlp12
 
Architecture Of The Linux Kernel
Architecture Of The Linux KernelArchitecture Of The Linux Kernel
Architecture Of The Linux Kernelguest547d74
 
Linux internal
Linux internalLinux internal
Linux internalmcganesh
 
Linux architecture
Linux architectureLinux architecture
Linux architecturemcganesh
 

Similaire à 淺談探索 Linux 系統設計之道 (20)

Microkernel-based operating system development
Microkernel-based operating system developmentMicrokernel-based operating system development
Microkernel-based operating system development
 
Walking around linux kernel
Walking around linux kernelWalking around linux kernel
Walking around linux kernel
 
Visual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & VirtualisationVisual comparison of Unix-like systems & Virtualisation
Visual comparison of Unix-like systems & Virtualisation
 
L4 Microkernel :: Design Overview
L4 Microkernel :: Design OverviewL4 Microkernel :: Design Overview
L4 Microkernel :: Design Overview
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
Ch1 linux basics
Ch1 linux basicsCh1 linux basics
Ch1 linux basics
 
Lightweight Virtualization in Linux
Lightweight Virtualization in LinuxLightweight Virtualization in Linux
Lightweight Virtualization in Linux
 
System Integrity
System IntegritySystem Integrity
System Integrity
 
.ppt
.ppt.ppt
.ppt
 
Dev opsec dockerimage_patch_n_lifecyclemanagement_2019
Dev opsec dockerimage_patch_n_lifecyclemanagement_2019Dev opsec dockerimage_patch_n_lifecyclemanagement_2019
Dev opsec dockerimage_patch_n_lifecyclemanagement_2019
 
Rhce ppt
Rhce pptRhce ppt
Rhce ppt
 
Linux on System z Update: Current & Future Linux on System z Technology
Linux on System z Update: Current & Future Linux on System z TechnologyLinux on System z Update: Current & Future Linux on System z Technology
Linux on System z Update: Current & Future Linux on System z Technology
 
Building
BuildingBuilding
Building
 
olibc: Another C Library optimized for Embedded Linux
olibc: Another C Library optimized for Embedded Linuxolibc: Another C Library optimized for Embedded Linux
olibc: Another C Library optimized for Embedded Linux
 
Linux architecture
Linux architectureLinux architecture
Linux architecture
 
Linux container & docker
Linux container & dockerLinux container & docker
Linux container & docker
 
Architecture Of The Linux Kernel
Architecture Of The Linux KernelArchitecture Of The Linux Kernel
Architecture Of The Linux Kernel
 
Architecture Of The Linux Kernel
Architecture Of The Linux KernelArchitecture Of The Linux Kernel
Architecture Of The Linux Kernel
 
Linux internal
Linux internalLinux internal
Linux internal
 
Linux architecture
Linux architectureLinux architecture
Linux architecture
 

Plus de National Cheng Kung University

PyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimePyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimeNational Cheng Kung University
 
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明National Cheng Kung University
 
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明National Cheng Kung University
 
Develop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsDevelop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsNational Cheng Kung University
 
Lecture notice about Embedded Operating System Design and Implementation
Lecture notice about Embedded Operating System Design and ImplementationLecture notice about Embedded Operating System Design and Implementation
Lecture notice about Embedded Operating System Design and ImplementationNational Cheng Kung University
 
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsNational Cheng Kung University
 
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明National Cheng Kung University
 
Shorten Device Boot Time for Automotive IVI and Navigation Systems
Shorten Device Boot Time for Automotive IVI and Navigation SystemsShorten Device Boot Time for Automotive IVI and Navigation Systems
Shorten Device Boot Time for Automotive IVI and Navigation SystemsNational Cheng Kung University
 

Plus de National Cheng Kung University (20)

PyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtimePyPy's approach to construct domain-specific language runtime
PyPy's approach to construct domain-specific language runtime
 
2016 年春季嵌入式作業系統課程說明
2016 年春季嵌入式作業系統課程說明2016 年春季嵌入式作業系統課程說明
2016 年春季嵌入式作業系統課程說明
 
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
進階嵌入式作業系統設計與實做 (2015 年秋季 ) 課程說明
 
Construct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoTConstruct an Efficient and Secure Microkernel for IoT
Construct an Efficient and Secure Microkernel for IoT
 
從線上售票看作業系統設計議題
從線上售票看作業系統設計議題從線上售票看作業系統設計議題
從線上售票看作業系統設計議題
 
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明
進階嵌入式系統開發與實做 (2014 年秋季 ) 課程說明
 
Xvisor: embedded and lightweight hypervisor
Xvisor: embedded and lightweight hypervisorXvisor: embedded and lightweight hypervisor
Xvisor: embedded and lightweight hypervisor
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
Implement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVMImplement Runtime Environments for HSA using LLVM
Implement Runtime Environments for HSA using LLVM
 
Priority Inversion on Mars
Priority Inversion on MarsPriority Inversion on Mars
Priority Inversion on Mars
 
Develop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM BoardsDevelop Your Own Operating Systems using Cheap ARM Boards
Develop Your Own Operating Systems using Cheap ARM Boards
 
Lecture notice about Embedded Operating System Design and Implementation
Lecture notice about Embedded Operating System Design and ImplementationLecture notice about Embedded Operating System Design and Implementation
Lecture notice about Embedded Operating System Design and Implementation
 
Explore Android Internals
Explore Android InternalsExplore Android Internals
Explore Android Internals
 
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded SystemsF9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
F9: A Secure and Efficient Microkernel Built for Deeply Embedded Systems
 
Open Source from Legend, Business, to Ecosystem
Open Source from Legend, Business, to EcosystemOpen Source from Legend, Business, to Ecosystem
Open Source from Legend, Business, to Ecosystem
 
Summer Project: Microkernel (2013)
Summer Project: Microkernel (2013)Summer Project: Microkernel (2013)
Summer Project: Microkernel (2013)
 
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明
進階嵌入式系統開發與實作 (2013 秋季班 ) 課程說明
 
Faults inside System Software
Faults inside System SoftwareFaults inside System Software
Faults inside System Software
 
Hints for L4 Microkernel
Hints for L4 MicrokernelHints for L4 Microkernel
Hints for L4 Microkernel
 
Shorten Device Boot Time for Automotive IVI and Navigation Systems
Shorten Device Boot Time for Automotive IVI and Navigation SystemsShorten Device Boot Time for Automotive IVI and Navigation Systems
Shorten Device Boot Time for Automotive IVI and Navigation Systems
 

Dernier

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Karmanjay Verma
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sectoritnewsafrica
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsYoss Cohen
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 

Dernier (20)

Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

淺談探索 Linux 系統設計之道

  • 1. 自己動手,豐衣足食 淺談探索 Linux 系統 設計之道 Jim Huang( 黃敬群 ) “jserv” blog: http://blog.linux.org.tw/jserv/ From 0xlab - http://0xlab.org/ Study-Area / June 5, 2010
  • 2. 授權聲明 Attribution – ShareAlike 2.0 You are free to copy, distribute, display, and perform the © Copyright 2010 work Jim Huang <jserv.tw@gmail.com> to make derivative works Partial © Copyright 2004-2005 to make commercial use of the work Michael Opdenacker Under the following conditions <michael@free-electrons.com> Attribution. You must give the original author credit. Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one. For any reuse or distribution, you must make clear to others the license terms of this work. Any of these conditions can be waived if you get permission from the copyright holder. Your fair use and other rights are in no way affected by the above. License text: http://creativecommons.org/licenses/by-sa/2.0/legalcode
  • 3. 參考組態 Ubuntu Linux 10.04 / 10.10 Kernel: 2.6.32-15-generic gcc: 4.4.4 glibc: 2.11.1 Lenovo ThinkPad X200 Intel Core2 Duo CPU 2.4 GHz
  • 4. Agenda Linux 核心設計概念 Rosetta Stone : Linux 如何建立軟硬體關聯 Linux 驅動程式架構與發展 尋幽訪勝自己來
  • 6. [ 概念 I] 以 C 語言建構
  • 7. C 語言精髓: Pointer Pointer 也就是記憶體 (Memory) 存取的替身 重心: Linux → Pointer → Memory
  • 8. Linux 跟其他 UNIX 一樣,採用 C 作為主要開發語言,硬體相關部 份則透過組合語言 不採用 C++ 的緣故可見 http://www.tux.org/lkml/#s15-3 主要考量點:效率
  • 9. Linux kernel size Size of Linux 2.6.32 source directories (KB) drivers arch fs Linux 2.6.32 report by SLOCCount (http://dwheeler.com/sloccount/) sound net Totals grouped by language include Documentation firmware ansic:      7680247 (96.91%) kernel asm:         225175 (2.84%) mm perl:          9015 (0.11%) scripts sh:            3404 (0.04%) crypto security yacc:          2964 (0.04%) lib lex:           1824 (0.02%) tools cpp:           1795 (0.02%) block python:         493 (0.01%) ipc virt awk:            109 (0.00%) init sed:             30 (0.00%) samples usr Total lines of code: 7925056 0 20000 40000 60000 80000 100000 120000 140000 160000 180000 200000
  • 10. 為了效率,充斥了奇計淫 ^H^H^H 技巧 Linux 核心依賴 GNU gcc 特有擴充 對編譯器作大量提示,如 likely/unlikely 擴充 if (unlikely(err)) { ... } 平台特有優化
  • 12. 核心架構 App1 App2 ... User space C library System call interface Process Memory Filesystem Device Networking management management support control Kernel space Filesystem types CPU support CPU / MMU Storage Character Network code  support code drivers device drivers device drivers Hardware CPU RAM Storage
  • 13. “From a technical standpoint, I believe the kernel will be "more of the same" and that all the _really_ interesting staff will be going out in user space.” -- Linus Torvalds October 2001
  • 14. [ 概念 III] Linux 核心充斥大量的物件 導向設計 kobject: ADT (Abstract Data Typing) device driver: Inheritance/polymorphism hotplug(), probe(): dynamic binding
  • 15. Buses : 處理器與一個或多個裝置間的通道 Devices 與對應的 device drivers struct device device struct ldd_device Driver kobject bus core core struct driver device struct ldd_driver driver struct Functional view inside kernel kobject
  • 16. Kobject 的繼承關係 parent pointer 與 ksets “parent” points to another kobject, representing the next level up “kset” is a collection of kobjects kset are always represented in sysfs Every kobject that is a member of a kset is represented in sysfs
  • 17.
  • 19. Everything is file.
  • 20. 核心與週邊 App1 App2 ... User space C library System call interface Process Memory Filesystem Device Networking management management support control Kernel Filesystem space types CPU support CPU / MMU Storage Character Network code  support code drivers device drivers device drivers Hardware CPU RAM Storage • UNIX 法則 : “Everything is file” – 實體記憶體 (/dev/mem) 、網路 (socket) 、實體 / 虛擬裝置、系 統資訊、驅動程式的控制、週邊組態、 ... • Linux Device Driver 的角色即是將 file operation 映射到 Device – 有明確階層概念 – 不限於 kernel-space driver – 經典的 user-space driver 如 X11 video driver
  • 21.
  • 22. VFS( 虛擬檔案系統 ) 扮演的角色
  • 23. 掛載 VFS Mounting /proc: sudo mount -t proc none /proc Mounting /sys: sudo mount -t sysfs none /sys Filesystem type Raw device Mount point or filesystem image In the case of virtual filesystems, any string is fine /proc/cpuinfo: processor information /proc/meminfo: memory status # man proc /proc/version: kernel version and build information /proc/cmdline: kernel command line
  • 25. debugfs 用以揭露核心資訊的虛擬檔案系統 核心組態: DEBUG_FS Kernel hacking -> Debug Filesystem 掛載方式: sudo mount -t debugfs none /debug http://lwn.net/Articles/334068/ 搭配 Ftrace (CONFIG_FUNCTION_TRACER) # cat /debug/tracing/available_tracers blk function_graph function sched_switch nop # echo function > /debug/tracing/current_tracer # echo 1 >/debug/tracing/tracing_on ( 進行某些動作 ) # echo 0 >/debug/tracing/tracing_on
  • 26. Ftrace # head /debug/tracing/trace # tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # | | | | | Xorg-1006 [000] 3055.314290: _cond_resched <-copy_from_user Xorg-1006 [000] 3055.314291: i915_gem_sw_finish_ioctl <-drm_ioctl Xorg-1006 [000] 3055.314291: mutex_lock <-i915_gem_sw_finish_ioctl Xorg-1006 [000] 3055.314291: _cond_resched <-mutex_lock Xorg-1006 [000] 3055.314291: drm_gem_object_lookup <-i915_gem_sw_finish_ioctl Xorg-1006 [000] 3055.314291: _spin_lock <-drm_gem_object_lookup
  • 27. Memory is file as well. /dev/mem virtual memory
  • 29. 先回顧個人電腦的 I/O 架構 x86 提供一組指令存取 IO port PMIOand outs instructions vs MMIO (Memory-Mapped I/O) in, out, ins, (Port-Mapped I/O) ISA vs. PCI
  • 31. I/O 的型態 Programmed I/O (PIO) port-mapped I/O memory-mapped I/O Interrupt-driven I/O ( 相對於 polling) Direct Memory Access (DMA) Channel I/O (I/O processor)
  • 32. User process SIGSEGV, kill 嘗試存取 Linux 實體 / 硬體位址 (Addressing) Kernel 0xFFFFFFFFF Exception 非法 / 不允許的 (MMU) I/O memory 3 記憶體區域 I/O memory 2 Memory Management I/O memory 1 Unit Proc1 Proc2 Flash MMU CPU Process Memory management management CPU support CPU / MMU RAM 1 code  support code RAM 0 CPU RAM 0x00000000
  • 33. Physical / Virtual memory Physical address space Virtual address spaces 0xFFFFFFFFF 0xFFFFFFFFF 0xFFFFFFFFF Kernel I/O memory 3 0xC0000000 Process1 I/O memory 2 Memory Management Unit 0x00000000 I/O memory 1 0x00000000 Flash MMU CPU 0xFFFFFFFFF Kernel RAM 1 0xC0000000 RAM 0 Process2 0x00000000 0x00000000 所有的 process 都有獨自 的 virtual address space , 認為可存取整個空間
  • 34. Everything is file. * 需將裝置對應到 Virtual Memory 才 能使用
  • 35. mmap mmap 系統呼叫 Device driver Process ( 註冊的 ) 檔案操作 mmap 被呼叫 初始化硬體 - 記憶體的對應 存取 virtual address access MMU physical address Process virtual address space Physical address space
  • 36. 以 X server 為例 # pmap `pidof X` | grep dev | head 00590000     40K r­x­­  /lib/libudev.so.0.6.1 0059a000      4K r­­­­  /lib/libudev.so.0.6.1 0059b000      4K rw­­­  /lib/libudev.so.0.6.1 00b02000     36K r­x­­  /usr/lib/xorg/modules/input/evdev_drv.so 00b0b000      4K r­­­­  /usr/lib/xorg/modules/input/evdev_drv.so 00b0c000      4K rw­­­  /usr/lib/xorg/modules/input/evdev_drv.so a75e8000    128K rw­s­  /dev/dri/card0 a7608000    128K rw­s­  /dev/dri/card0 a7628000    128K rw­s­  /dev/dri/card0 a7648000    128K rw­s­  /dev/dri/card0 start address size Mapped file name 可對照 /proc/`pidof X`/maps 的內容
  • 37. mmap :: user-space • 取得裝置的 fd (file descriptor) 後 ... mmap 系統呼叫 void * mmap( void *start, /* Often 0, preferred starting address */ size_t length, /* Length of the mapped area */ int prot , /* Permissions: read, write, execute */ int flags, /* Options: shared mapping, private copy */ int fd, /* Open file descriptor */ off_t offset /* Offset in the file */ ); mmap :: kernel-space int (*mmap) ( struct file *, /* Open file structure */ struct vm_area_struct * /* Kernel VMA structure */ );
  • 38. 實例: devmem 直接 peek (read) 或 poke (write) 某一段已對應的實 體位址: (b: byte, h: half, w: word) devmem 0x000c0004 h (reading) devmem 0x000c0008 w 0xffffffff (writing) if ((fd = open("/dev/mem", O_RDWR | O_SYNC)) == -1) FATAL; # devmem 0x000c0004 h Memory mapped at address 0xb7fb5000. Value at address 0xC0004 (0xb7fb5004): 0xE3A9 /* Map one page */ map_base = mmap(0, MAP_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, target & ~MAP_MASK); if (map_base == (void *) -1) FATAL; virt_addr = map_base + (target & MAP_MASK);
  • 42. • 1799 年拿破崙遠征埃及時期,法軍上尉 Pierre-François Xavier Bouchard 在尼羅河口港灣城市羅塞塔 (Rosetta ,今日稱為 el- Rashid) 發現此碑石,自此揭開古埃及象形文字之謎 • 羅塞塔碑石製作於公元前 196 年,原是一塊刻有埃及國王托勒 密五世召書的碑石,但由於這塊碑石同時刻有同一段文字的三種 不同語言版本,使得近代的考古學家得以在對照各語言版本的內 容後,解讀出已經失傳千餘年的埃及象形文之意義與結構,而成 為今日研究古埃及歷史的重要里程碑
  • 43. • 由於破解埃及象形文這種如謎題 般事物之起點, Rosetta Stone 被比喻為解決難題或謎題的關鍵 線索或工具 # cat /proc/iomem
  • 44. # cat /proc/iomem 00000000-0009efff : System RAM 0009f000-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000c7fff : Video ROM 000c8000-000cbfff : pnp 00:00 000cf000-000cffff : Adapter ROM 000d0000-000d0fff : Adapter ROM 000d2000-000d3fff : reserved 000dc000-000dffff : pnp 00:00 000e0000-000e3fff : pnp 00:00 000e4000-000e7fff : pnp 00:00 000e8000-000ebfff : pnp 00:00 實體 / 硬體位址 (Addressing) 000f0000-000fffff : System ROM 00100000-7f6cffff : System RAM 0xFFFFFFFFF 00100000-0031b5a3 : Kernel code 0031b5a4-00414dc3 : Kernel data I/O memory 3 00476000-004eba7f : Kernel bss 7f6d0000-7f6defff : ACPI Tables 7f6df000-7f6fffff : ACPI Non-volatile Storage I/O memory 2 Memory 7f700000-7fffffff : reserved Management 88000000-8bffffff : PCI CardBus #16 d0000000-dfffffff : 0000:00:02.0 I/O memory 1 Unit ... e4300000-e7ffffff : PCI Bus #15 e4300000-e4300fff : 0000:15:00.0 e4300000-e4300fff : yenta_socket Flash MMU e4301000-e43017ff : 0000:15:00.1 e4301000-e43017ff : ohci1394 e4301800-e43018ff : 0000:15:00.2 e4301800-e43018ff : sdhci:slot0 RAM 1 e8000000-e9ffffff : PCI Bus #04 ea000000-ebffffff : PCI Bus #0c ec000000-edffffff : PCI Bus #03 RAM 0 edf00000-edf00fff : 0000:03:00.0 edf00000-edf00fff : iwl3945 ... 0x00000000
  • 45.
  • 49. PMIO (Port-Mapped I/O) vs MMIO (Memory-Mapped I/O) $ cat /proc/ioports $ cat /proc/iomem 0000-001f : dma1 00000000-0009fbff : System RAM 0020-0021 : pic1 00000000-00000000 : Crash kernel 0040-0043 : timer0 0009fc00-0009ffff : reserved 0050-0053 : timer1 000a0000-000bffff : Video RAM area 0060-006f : keyboard 0070-0077 : rtc 000c0000-000c7fff : Video ROM 0080-008f : dma page reg 000cc000-000d97ff : Adapter ROM 00a0-00a1 : pic2 000f0000-000fffff : System ROM 00c0-00df : dma2 00100000-0ffeffff : System RAM 00f0-00ff : fpu 0170-0177 : ide1 00100000-00281514 : Kernel code 01f0-01f7 : ide0 00281515-003117b3 : Kernel data 02f8-02ff : serial 20000000-200fffff : PCI Bus #01 0376-0376 : ide1 [...] [...]
  • 51. Requesting I/O ports /proc/ioports struct resource *request_region( 0000-001f : dma1 unsigned long start, 0020-0021 : pic1 unsigned long len, 0040-0043 : timer0 0050-0053 : timer1 char *name); 0060-006f : keyboard 0070-0077 : rtc 0080-008f 00a0-00a1 : : dma page reg pic2 嘗試保留給定區域 00c0-00df : dma2 00f0-00ff : fpu 0100-013f : pcmcia_socket0 request_region(0x0170, 8, "ide1"); 0170-0177 : ide1 01f0-01f7 : ide0 void release_region( 0376-0376 : ide1 0378-037a : parport0 unsigned long start, 03c0-03df : vga+ unsigned long len); 03f6-03f6 : ide0 參考 include/linux/ioport.h 與 03f8-03ff : serial 0800-087f : 0000:00:1f.0 0800-0803 : PM1a_EVT_BLK 0804-0805 : PM1a_CNT_BLK kernel/resource.c 0808-080b : PM_TMR 0820-0820 : PM2_CNT_BLK 0828-082f : GPE0_BLK ...
  • 52. Requesting I/O memory /proc/iomem 介面一致 00000000-0009efff : System RAM 0009f000-0009ffff : reserved struct resource * request_mem_region( 000a0000-000bffff : Video RAM area unsigned long start, 000c0000-000cffff : Video ROM unsigned long len, 000f0000-000fffff : System ROM 00100000-3ffadfff : System RAM char *name); 00100000-0030afff : Kernel code 0030b000-003b4bff : Kernel data void release_mem_region( 3ffae000-3fffffff : reserved unsigned long start, 40000000-400003ff : 0000:00:1f.1 unsigned long len); 40001000-40001fff : 0000:02:01.0 40001000-40001fff : yenta_socket 40002000-40002fff : 0000:02:01.1 40002000-40002fff : yenta_socket 40400000-407fffff : PCI CardBus #03 40800000-40bfffff : PCI CardBus #03 40c00000-40ffffff : PCI CardBus #07 41000000-413fffff : PCI CardBus #07 a0000000-a0000fff : pcmcia_socket0 a0001000-a0001fff : pcmcia_socket1 e0000000-e7ffffff : 0000:00:00.0 e8000000-efffffff : PCI Bus #01 e8000000-efffffff : 0000:01:00.0 大部分驅動程式的工作大概就在讀寫 ... I/O port 或 I/O memory( 統稱 I/O region)
  • 53. io_mem.c(1) #include <linux/init.h> #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/module.h> #include <linux/fs.h> #include <linux/ioport.h> static int Major, result; static struct file_operations fops; MODULE_LICENSE("GPL"); unsigned long start = 1, length = 1; module_param(start, long, 1); module_param(length, long, 1); ... module_init (memIO_init); module_exit (memIO_cleanup);
  • 54. io_mem.c(2) int memIO_init (void) { Major = register_chrdev (0, "memIO_device", &fops); if (Major < 0) { printk (" Major number allocation is failed n"); return (Major); } printk (" The Major number of the device is %d n", Major); result = check_mem_region (start, length); if (result < 0) { printk ("Allocation for I/O memory range is failed: “ ”Try other rangen"); return (result); } request_mem_region (start, length, "memIO_device"); return 0; }
  • 55. io_mem.c(3) void memIO_cleanup (void) { release_mem_region (start, length); printk (" The I/O memory region is released “ ”successfully n"); unregister_chrdev (Major, "memIO_device"); printk (" The Major number is released successfully n"); }
  • 56. 在發送 Request 之前 # cat /proc/devices # cat /proc/iomem Character devices: 00000000-0009efff : System RAM 1 mem 0009f000-0009ffff : reserved 2 pty 000a0000-000bffff : Video RAM area 3 ttyp 000c0000-000c7fff : Video ROM 4 /dev/vc/0 000c8000-000cbfff : pnp 00:00 4 tty 000cf000-000cffff : Adapter ROM 4 ttyS 000d0000-000d0fff : Adapter ROM 5 /dev/tty 000d2000-000d3fff : reserved 5 /dev/console ... 5 /dev/ptmx ee444000-ee4443ff : 0000:00:1d.7 7 vcs ee444000-ee4443ff : ehci_hcd 10 misc ee444400-ee4447ff : 0000:00:1f.2 13 input ee444400-ee4447ff : ahci 14 sound ... 21 sg 29 fb 108 ppp 116 alsa 128 ptm # insmod io_mem.ko start=0xeeee0000 136 pts 171 ieee1394 length=1 180 usb 189 usb_device 226 drm ...
  • 57. 發送 Request 之後 # cat /proc/devices # cat /proc/iomem Character devices: 00000000-0009efff : System RAM 1 mem 0009f000-0009ffff : reserved 2 pty 000a0000-000bffff : Video RAM area 3 ttyp 000c0000-000c7fff : Video ROM 4 /dev/vc/0 000c8000-000cbfff : pnp 00:00 4 tty 000cf000-000cffff : Adapter ROM 4 ttyS 000d0000-000d0fff : Adapter ROM 5 /dev/tty 000d2000-000d3fff : reserved 5 /dev/console ... 5 /dev/ptmx ee444000-ee4443ff : 0000:00:1d.7 7 vcs ee444000-ee4443ff : ehci_hcd 10 misc ee444400-ee4447ff : 0000:00:1f.2 13 input ee444400-ee4447ff : ahci 14 sound eeee0000-eeee0000 : memIO_device 21 sg ... 29 fb 108 ppp 116 alsa 128 ptm # dmesg 136 pts ... 171 ieee1394 [11682.040057] The Major number of the device is 251 180 usb 189 usb_device 226 drm 252 memIO_device ...
  • 58. Polling copy_from_user(buffer, p, count); /* p is the kernel buffer */ for (i = 0; i < count; i++) { /* loop on every character */ while (*printer_status_reg != READY) /* loop until ready */ ; printer_data_register = p[i]; /* output one character */ } return_to_user();
  • 60. IRQ lines $ cat /proc/interrupts CPU0 0: 111151372 IO-APIC-edge timer 1: 8 IO-APIC-edge i8042 6: 2 IO-APIC-edge floppy 7: 2 IO-APIC-edge parport0 8: 4 IO-APIC-edge rtc 9: 1 IO-APIC-level acpi 12: 114 IO-APIC-edge i8042 14: 3277682 IO-APIC-edge ide0 15: 65 IO-APIC-edge ide1 169: 14445090 IO-APIC-level eth0 NMI: 0 LOC: 111150116 ERR: 0 MIS: 0
  • 61. Linux 驅動程式 架構與發展
  • 62. Linux 驅動程式架構 User-Space Application System Call Interface Virture File System(VFS) BSD socket Network inet(AF_INET) Kernel-Space Buffer Cache Subsystem Transport(TCP,UDP) Network(IP) Character Block Network Device Driver Device Driver Device Driver Device Interface Hardware Physical Device (Hardware)
  • 64. Device File 類型 Character Block Network USB, Firewire, Device Driver Device Driver Device Driver SCSI,... crw--w--w- crw--w--w- 0 0 root root root root 5, 5, 1 1 Oct 1 Oct 1 1998 1998 console console crw-rw-rw- crw-rw-rw- 1 1 root root root root 1, 1, 3 3 May 6 May 6 1998 1998 null null crw------- crw------- 1 1 root root root root 4, 4, 0 0 May 6 May 6 1998 1998 tty tty crw-rw---- crw-rw---- 1 1 root root disk disk 96, 96, 0 0 Dec 10 Dec 10 1998 1998 pt0 pt0 crw------- crw------- 1 1 root root root root 5, 5, 64 64 May 6 May 6 1998 1998 cua0 cua0 brw------- brw------- 1 1 root root floppy floppy 2, 2, 0 0 May May 6 6 1998 1998 fd0 fd0 brw-rw---- brw-rw---- 1 1 root root disk disk 3, 3, 0 0 May May 6 6 1998 1998 hda hda brw-rw---- brw-rw---- 1 1 root root disk disk 3, 3, 1 1 May May 6 6 1998 1998 hda1 hda1 brw-rw---- brw-rw---- 1 1 root root disk disk 8, 8, 0 0 May May 6 6 1998 1998 sda sda brw-rw---- brw-rw---- 1 1 root root disk disk 8, 8, 1 1 May May 6 6 1998 1998 sda1 sda1
  • 65. Major Number 與 Minor Number • /dev 目錄下的檔案稱為 device files ,用以辨識 / 對應於裝置 User­space • Kernel 透過 major number 來 指定正確的驅動程式給 device Read buffer Write string files read write • Minor number 只有驅動程式本 身會使用到 /dev/foo Copy from user Copy to user major / minor Read Write handler handler Device driver Kernel space
  • 66. 註冊的 device driver 顯示於 /proc/devices: Character devices: Block devices: 1 mem 1 ramdisk 4 /dev/vc/0 3 ide0 4 tty 8 sd 4 ttyS 9 md 5 /dev/tty 22 ide1 5 /dev/console 65 sd 5 /dev/ptmx 66 sd 6 lp 67 sd 10 misc 68 sd 13 input 14 sound ... Major Registered number name
  • 67. Linux Device Driver 資料結構 struct file_operations { /* character device drivers */ struct module *owner; loff_t (*llseek) (struct file *, loff_t, int); ssize_t (*read) (struct file *, char *, size_t, loff_t *); ssize_t (*write) (struct file *, const char *, size_t, loff_t *); int (*readdir) (struct file *, void *, filldir_t); unsigned int (*poll) (struct file *, struct poll_table_struct *); int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); int (*mmap) (struct file *, struct vm_area_struct *); int (*open) (struct inode *, struct file *); int (*flush) (struct file *); int (*release) (struct inode *, struct file *); int (*fsync) (struct file *, struct dentry *, int datasync); int (*fasync) (int, struct file *, int); int (*lock) (struct file *, int, struct file_lock *); ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *); ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *); ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int); unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); };
  • 68. 檔案操作的內部 get PAL video get NTSC video Process 1 Process 2 ioctl: change read read read fop Open file 1 Open file 2 open open 實做 file_operations structure 兩個不同 process 可自同個 #include <linux/fs.h> /dev/video device file 讀取不同的資料 static struct file_operations my_fops = { .owner = THIS_MODULE, .read = moko_read, .write = moko_write, }; 將上述 my_read/my_write 指向為具體實做 否則採用預設 function( 如 open, release...)
  • 69. read() 操作內部 The read methods copies data to application code. ssize_t read(struct file *filp, char *buff, size_t count, loff_t *offp);
  • 70. Device number 配置方式 #include <linux/fs.h> #include <linux/fs.h> int register_chrdev_region( int alloc_chrdev_region( dev_t from, /* Starting device number */ dev_t *dev, /* Output: device number */ unsigned count,/* Number of device numbers */ unsigned baseminor, const char *name);/* Registered name */ /* Starting minor number, usually 0 */ Returns 0 if the allocation was successful. unsigned count, Example /* Number of device numbers */ if (register_chrdev_region( const char *name /* Registered name */ MKDEV(202, 128), ); moko_count, “moko”)) { Returns 0 if the allocation was printk(KERN_ERR “Failed to allocate “ successful. ”device numbern”); Example ... if (alloc_chrdev_region(&moko_dev, 0, moko_count, “moko”)) { Allocating fixed printk(KERN_ERR “Failed to allocate device numbern”); device numbers ... Dynamic allocation of device numbers Safer: have the kernel allocate free numbers for you!
  • 71. 要點 • 當 LKM 即將從系統釋放, major number 必須交 還系統 • 檔案操作與結構 Device ID By file structure 核心使用 file ops 以存取 driver 提供的函式  open / close 初始化裝置並調整 usage count  Memory Device-memory Link List  Read / Write Transfer data from Kernel to User
  • 72. 進階 I/O 控制  Blocking / Non-blocking I/O • Polling: poll and select function • Asynchronous notification • Access control of device file
  • 73. 考量點 • Synchronization / Avoid race condition – semaphore / mutex / kernel lock • Reentrance • Preemption in kernel code • Interrupt timer / peripheral • Take care of bandwidth issues • DMA vs. PIO (Programming I/O) model • SMP
  • 74. HW Concurrency (1) INT IRQ I/O I/O CPU device APIC INT ACK I/O APIC 對裝置不斷作 poll 並發出 interrupt 在 CPU 回應 (ACK/acknowledge) 前,沒有新的 interrupt 可被發出 通常核心得在多個 interrupt 產生的情況下執行
  • 75. HW Concurrency (2) Symmetrical MultiProcessor (SMP) 顧名思 義,擁有兩個以上的 CPU SMP kernel 必須在有效的 CPU 上同步執行 情境:在其中一顆 CPU 上執行了網路相關 的 service routine ,而另一個 CPU 則執行 檔案系統相關的動作
  • 76. # cat /proc/interrupts CPU0 CPU1 0: 4988074 5094948 IO-APIC-edge timer ... 29: 7 11515 PCI-MSI-edge eth0 30: 290411 379267 PCI-MSI-edge i915 31: 651550 496152 PCI-MSI-edge iwlagn NMI: 0 0 Non-maskable interrupts ... TRM: 0 2 Thermal event interrupts THR: 0 0 Threshold APIC interrupts MCE: 0 0 Machine check exceptions MCP: 77 77 Machine check polls ERR: 1 MIS: 0
  • 77. interrupt 總量 # cat /proc/stat | grep intr intr 8190767 6092967 10377 0 1102775 5 2 0 196 ... Total number IRQ1 IRQ2 IRQ3 of interrupts total total ...
  • 78. Linux 2.6+ 新的 Driver Model
  • 79. Device Model 特徵 (1) 從簡化能源管理處理出發,現在已大幅改進 充分表現硬體系統架構 提供 user-space 層級的表示途徑: sysfs 比 /proc 更全面 更一致、容易維護的 device interface: include/linux/device.h
  • 80. Device model 特徵 (2) 允許從多角度檢視系統 從系統上已存在的裝置來看: power state 、連 結上的 bus ,與對應的 driver 從 system bus 結構來看: bus 與 bus 之間的連 結方式 ( 如 PCI bus 上的 USB bus controller) 、 對應的 driver 從裝置本身來看: input, net, sound... 得以簡便地找到裝置,而不需顧慮實體連結方式
  • 81. “From a technical standpoint, I believe the kernel will be "more of the same" and that all the _really_ interesting staff will be going out in user space.” -- Linux Torvalds October 2001
  • 82. sysfs Device Model 的 user-space 表現方式 Kernel 配置編譯選項 CONFIG_SYSFS=y (Filesystems -> Pseudo filesystems) 掛載方式: sudo mount -t sysfs none /sys 探索 /sys 是理解硬體設計便利的途徑
  • 83. sysfs tools http://linux-diag.sourceforge.net/Sysfsutils.html libsysfs - 提供一致且穩定的介面,得以查閱 由 sysfs 公開的系統硬體資訊 systool – 使用 libsysfs 的應用程式,可列 印裝置的 bus, class, topology
  • 86. 原本 /dev 暴露的問題 Red Hat 9 Linux 上,為了支援可能的應用, 必須在 /dev 擺放 18000 個 device files 需要對 major number 作授權與協調分配 http://lanana.org/: Linux Assigned Names and Numbers Authority 對於 user-space 應用程式來說,既不知曉在 系統上的裝置,也不知曉對應 /dev 上的 device files
  • 87. udev 解決方案 善用 sysfs 機制 全部在 user-space 依據實際上硬體新增與移除的裝況,自動建立與 刪除 /dev/ 底下的 device file Major and minor device transmitted by the kernel. Requires no change to driver code. Fast: written in C Small size: udevd version 117: 67 KB in Ubuntu 8.04
  • 88. udev 典型操作 Kernel driver core uevent udevd (usb, pci...) udev event process Matches event to rules Creates / removes device files /lib/udev/ programs or others Load the right module Notify userspace programs (GUI...)
  • 89. 識別 device driver modules Kernel / module compiling System everyday life Each driver announces which device and vendor The driver core (usb, pci...) reads the device id, ids it supports. Information stored in module files. vendor id and other device attributes. The depmod ­a command processes The kernel sends an event to udevd, setting the  module files and generates MODALIAS environment variable, encoding these data. /lib/modules/<version>/modules.alias A udev event process runs modprobe $MODALIAS modprobe finds the module to load in the modules.alias file.
  • 90. Firmware hotplugging 實做 Kernel space Userspace Driver /sys/class/firmware/xxx/{loading,data} calls request_firmware() appear Sleeps firmware subsystem event sent to udev Calling /lib/udev/firmware_helper Kernel Get ready to load firmware data /lib/udev/firmware_helper Grows a buffer to accommodate incoming data echo 1 > /sys/class/firmware/xxx/loading cat fw_image > /sys/class/firmware/xxx/data echo 0 > /sys/class/firmware/xxx/loading Driver  wakes up after request_firmware() Copies the buffer to the hardware Calls release_firmware() See Documentation/firmware_class/ for a nice overview
  • 91. 參考資訊 • KernelTrap http://kerneltrap.org/ – Forum website for kernel developers – News, articles, whitepapers, discussions, polls, interviews • Linux Device Drivers, 3rd edition, Feb 2005 – Jonathan Corbet, Alessandro Rubini, Greg Kroah-Hartman, O'Reilly • Linux Kernel in a Nutshell, Dec 2006 Greg Kroah-Hartman, O'Reilly