2. Objective
• Learn how system calls work.
• Different privilege levels.
• Memory Manager concepts.
• Interrupt Request Levels.
• Asynchronous Procedure Calls (APC) and
Deferred Procedure Calls (DPC).
4. Lord of the Rings
• The x86 processor has 4 levels of protection,
called Ring 0 to Ring 3.
• Privileged code (the kernel) runs in Ring 0.
The processor ensures that privileged
instructions (such as enabling/disabling interrupts)
execute in kernel mode only.
• User applications run in Ring 3.
• Windows itself uses only Ring 0 and Ring 3.
Some software hypervisors use Ring 1, running
the guest kernel there.
6. How system call works
• User-mode code cannot enter kernel space directly with a jmp or a call instruction.
• When a program makes a system call (such as CreateFile or ReadFile), the OS
enters kernel mode (Ring 0) with the instruction int 2E, which goes through an
interrupt gate.
• The code segment descriptor records the ‘Ring’ at which the code can run.
For kernel mode modules it is always Ring 0. If a user mode program tries
‘jmp <kernel mode address>’, it causes an access violation, because the
segment descriptor flag says the processor must be in Ring 0.
• Kernel mode is entered very frequently (most Windows API calls end up
in kernel mode), so newer processors provide sysenter, an optimized
instruction for entering kernel mode.
7. System Call continued..
• Windows maintains a system service dispatch table,
which is similar to the IDT. Each entry in the system
service table points to a kernel mode system call routine.
• The int 2E handler probes and copies the parameters
from the user mode stack to the thread’s kernel mode
stack, then fetches and executes the correct system call
procedure from the system service table.
• There are multiple system service tables: one for the
NT native APIs, one for USER and GDI, one used by IIS, etc.
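The table lookup described above can be sketched as a user-mode simulation. This is only an illustration under stated assumptions: the two service routines, the table layout and the `dispatch` function are hypothetical names, not the real kernel structures, and the "probe" step is reduced to a bounds check on the service number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 4

/* Hypothetical service routines standing in for kernel-mode system calls. */
typedef int32_t (*service_fn)(const uint32_t *args);

static int32_t svc_add(const uint32_t *args) { return (int32_t)(args[0] + args[1]); }
static int32_t svc_neg(const uint32_t *args) { return -(int32_t)args[0]; }

typedef struct {
    service_fn routine;   /* routine for this service number */
    size_t     arg_count; /* number of 32-bit arguments to copy */
} service_entry;

/* The service number (the value loaded into eax) indexes this table. */
static const service_entry service_table[] = {
    { svc_add, 2 },
    { svc_neg, 1 },
};
#define SERVICE_COUNT (sizeof service_table / sizeof service_table[0])

/* Returns 0 on success, -1 if the service number is not in the table. */
int dispatch(uint32_t number, const uint32_t *user_stack, int32_t *result)
{
    uint32_t kernel_copy[MAX_ARGS] = {0};

    if (number >= SERVICE_COUNT)
        return -1;

    const service_entry *entry = &service_table[number];
    assert(entry->arg_count <= MAX_ARGS);

    /* Copy the arguments so the routine never reads the caller's stack. */
    for (size_t i = 0; i < entry->arg_count; i++)
        kernel_copy[i] = user_stack[i];

    *result = entry->routine(kernel_copy);
    return 0;
}
```

Copying the arguments before the call mirrors the slide's point that parameters move from the user mode stack to the kernel mode stack before the service routine runs.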
10. Let’s try it in WinDBG..
• NtWriteFile:
mov eax, 0x0E ; build 2195 system service number for NtWriteFile
mov ebx, esp ; point to parameters
int 0x2E ; execute system service trap
ret 0x2C ; pop parameters off stack and return to caller
11. IO Request Packet (IRP)
• When a thread initiates an IO operation, the IO
Manager creates a data structure called an IO Request
Packet (IRP).
• The IRP contains all the information about the
request.
• The IO Manager sends the IRP to the top device in
the driver stack.
• Demo: !irpfind to see all current IRPs.
• Demo: !irp <irp address> to see information
about one IRP.
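As a rough illustration of an IRP travelling down a driver stack, here is a small user-mode sketch. The `irp` and `device` structures, `call_driver`, and the two dispatch routines are hypothetical simplifications invented for this example; the real IRP and device object carry far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified request packet: everything a driver needs to know. */
typedef struct irp {
    int         major_function; /* hypothetical code, e.g. 0 = read */
    void       *buffer;
    size_t      length;
    int         status;         /* set by the driver that completes it */
    const char *completed_by;   /* name of the completing device */
} irp;

typedef struct device device;
typedef int (*dispatch_fn)(device *dev, irp *request);

struct device {
    const char *name;
    dispatch_fn dispatch;
    device     *lower;  /* next device down the stack, NULL at the bottom */
    int         seen;   /* how many requests reached this device */
};

/* Hand the request to a device; the IO Manager starts at the top. */
int call_driver(device *dev, irp *request)
{
    dev->seen++;
    return dev->dispatch(dev, request);
}

/* A filter-style driver: looks at the request and passes it down. */
int filter_dispatch(device *dev, irp *request)
{
    assert(dev->lower != NULL);
    return call_driver(dev->lower, request);
}

/* The bottom driver completes the request. */
int function_dispatch(device *dev, irp *request)
{
    request->status = 0;
    request->completed_by = dev->name;
    return request->status;
}
```

A caller would build the stack bottom-up, then pass the request to the top device, which matches the order the slide describes.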
12. Memory Manager
• On an x86 Windows box each process has a total of 4GB of
virtual address space.
• The lower 2GB (0x00000000 to 0x7FFFFFFF) is for process
private storage.
• The upper 2GB (0x80000000 to 0xFFFFFFFF) is for OS memory
requirements.
• The upper 2GB is common to all processes; in other words,
half of the PDEs are the same for every process.
• Windows usually maps the system call parameters into
kernel mode memory so that they can be accessed from any
process context.
• Interrupts and DPCs (discussed later) can occur in an
arbitrary thread context, but they can still access the buffer
because it is mapped into kernel space.
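The 2GB/2GB split and the "half of the PDEs" remark can be checked with a little arithmetic. The sketch below assumes the default split and classic non-PAE paging, where the top 10 bits of a virtual address select one of 1024 page directory entries; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_BASE 0x80000000u /* start of the upper 2GB */
#define PDE_SHIFT   22          /* each PDE covers 4MB (non-PAE assumption) */
#define PDE_COUNT   1024u       /* entries in one page directory */

/* The kernel half starts exactly at the middle page directory entry. */
static_assert((KERNEL_BASE >> PDE_SHIFT) == PDE_COUNT / 2,
              "upper 2GB should begin at PDE 512");

/* True if the address lies in the OS half of the address space. */
bool is_kernel_address(uint32_t va)
{
    return va >= KERNEL_BASE;
}

/* Index of the page directory entry that maps this address. */
uint32_t pde_index(uint32_t va)
{
    return va >> PDE_SHIFT;
}

/* PDEs 512..1023 map the upper 2GB, which the slide describes as common. */
bool pde_is_common(uint32_t index)
{
    return index >= PDE_COUNT / 2;
}
```

For example, `pde_index(0x80000000u)` gives 512, the first of the entries that cover kernel space.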
14. Memory Manager continued..
• In kernel mode there are two types of memory pool.
• PagedPool and NonPagedPool.
• NonPagedPool pages always stay in
memory.
• PagedPool pages can be swapped out to the page file
according to memory requirements.
• Driver writers should use NonPagedPool
judiciously.
15. Memory Manager continued..
• ExAllocatePool() and ExAllocatePoolWithTag() are
the DDK APIs for allocating memory in kernel
mode.
• We can attach a tag to a memory allocation so that
pool usage is easy to monitor.
• The memory manager keeps the pool tag at the
beginning of the allocation, in its pool header
(Demo: use WinDBG to check it).
• Demo: !poolused command to see the pool
tags.
• Demo: use poolmon.exe to see the pool tags.
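A pool tag is four characters packed into one 32-bit value. The sketch below shows one way to do that packing so the characters appear in reading order in a little-endian memory dump. `make_pool_tag` and `pool_tag_to_text` are hypothetical helpers for illustration, not DDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack four characters into a 32-bit tag, first character in the low byte. */
uint32_t make_pool_tag(const char text[4])
{
    assert(text != NULL);
    return  (uint32_t)(unsigned char)text[0]
         | ((uint32_t)(unsigned char)text[1] << 8)
         | ((uint32_t)(unsigned char)text[2] << 16)
         | ((uint32_t)(unsigned char)text[3] << 24);
}

/* Unpack a tag back into a printable, NUL-terminated string. */
void pool_tag_to_text(uint32_t tag, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((tag >> (8 * i)) & 0xFF);
    out[4] = '\0';
}
```

This byte order is also why driver sources commonly write the tag as a reversed character literal such as 'omeD': the low byte comes first in memory, so tools display it as "Demo".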
16. Software Interrupt Request Levels (IRQLs)
• Windows has its own interrupt priority scheme, known as
IRQL.
• IRQLs range from 0 to 31; a higher number means a
higher priority interrupt level.
• The HAL maps hardware interrupts to IRQL 3 (Device 1) through
IRQL 31 (High).
• When a higher priority interrupt occurs, it masks all lower
priority interrupts and its ISR executes.
• After the ISR finishes, the kernel lowers the IRQL
and executes the ISR of the pending lower priority interrupt.
• An ISR should do minimal work and defer
the bulk of the work to a Deferred Procedure Call
(DPC), which runs at the lower IRQL 2.
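The masking behaviour can be modelled in user mode with a pending bitmask and a current level. This is a toy model only: `raise_irql`, `lower_irql` and `request_interrupt` are hypothetical names, and an "ISR" here just records its level in a log.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_IRQL 31

uint32_t pending_mask; /* bit n set: an interrupt at IRQL n is waiting */
int current_irql;      /* 0 = passive level */
int isr_log[32];       /* IRQLs in the order their ISRs ran */
int isr_count;

/* Run every pending interrupt above the current level, highest first. */
static void run_pending(void)
{
    for (int level = MAX_IRQL; level > current_irql; level--) {
        if (pending_mask & (1u << level)) {
            int saved = current_irql;
            pending_mask &= ~(1u << level);
            current_irql = level;          /* raise: masks this level and below */
            assert(isr_count < 32);
            isr_log[isr_count++] = level;  /* stand-in for the ISR body */
            current_irql = saved;          /* lower again */
        }
    }
}

/* A device raises an interrupt; it runs now only if it is not masked. */
void request_interrupt(int level)
{
    assert(level > 0 && level <= MAX_IRQL);
    pending_mask |= 1u << level;
    run_pending();
}

/* Raise the current level and return the previous one. */
int raise_irql(int new_irql)
{
    int old = current_irql;
    assert(new_irql >= old);
    current_irql = new_irql;
    return old;
}

/* Drop back to an earlier level and deliver anything that was masked. */
void lower_irql(int old_irql)
{
    assert(old_irql <= current_irql);
    current_irql = old_irql;
    run_pending();
}
```

While the level is raised to 10, a request at 5 stays pending and a request at 12 runs at once; the pending ones are delivered, highest first, when the level drops.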
18. IRQL and DPC
• The DPC concept is similar to mechanisms in other
OSes; in Linux it is called the bottom half.
• DPC queues are per processor, which means a dual
processor SMP box has two DPC queues.
• The ISR generally fetches data from the
hardware and queues a DPC for further
processing.
• IRQL priority is different from thread
scheduling priority.
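The ISR-queues-a-DPC pattern with one queue per processor might look like this in a user-mode sketch. The queue layout, `queue_dpc`, `drain_dpcs` and the device routines are hypothetical; a real kernel drains the queue when the IRQL drops to dispatch level, which is modelled here by an explicit call.

```c
#include <assert.h>
#include <stddef.h>

#define CPU_COUNT 2
#define QUEUE_CAP 8

typedef void (*dpc_routine)(void *context);

typedef struct { dpc_routine routine; void *context; } dpc;
typedef struct { dpc items[QUEUE_CAP]; size_t head, count; } dpc_queue;

dpc_queue dpc_queues[CPU_COUNT]; /* one queue per processor */

/* Append a DPC to the given processor's queue; returns 0 if it is full. */
int queue_dpc(int cpu, dpc_routine routine, void *context)
{
    assert(cpu >= 0 && cpu < CPU_COUNT);
    dpc_queue *q = &dpc_queues[cpu];
    if (q->count == QUEUE_CAP)
        return 0;
    q->items[(q->head + q->count) % QUEUE_CAP] = (dpc){ routine, context };
    q->count++;
    return 1;
}

/* Run all queued DPCs for one processor in FIFO order; returns how many ran. */
size_t drain_dpcs(int cpu)
{
    assert(cpu >= 0 && cpu < CPU_COUNT);
    dpc_queue *q = &dpc_queues[cpu];
    size_t ran = 0;
    while (q->count > 0) {
        dpc next = q->items[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        next.routine(next.context);
        ran++;
    }
    return ran;
}

/* Deferred work: the bulk of the processing happens here. */
void count_packet_dpc(void *context)
{
    *(int *)context += 1;
}

/* ISR: do the minimum, then queue the DPC on the interrupted processor. */
void device_isr(int cpu, int *packets_processed)
{
    queue_dpc(cpu, count_packet_dpc, packets_processed);
}
```

Nothing is processed at the time of the interrupt; the counter only changes once the owning processor drains its queue.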
19. IRQL and DPC
• The scheduler (dispatcher) also runs at IRQL 2.
• So code that executes at or above IRQL
2 (dispatch level) cannot be preempted by the scheduler.
• In the diagram, only hardware interrupts
and a few higher priority interrupts such as clock and
power fail are above IRQL 2.
• Most of the time the OS is at IRQL 0 (passive
level).
• All user programs and most kernel code
execute at passive level.
20. IRQL continued..
• The scheduler runs at IRQL 2, so what happens if my driver tries to wait at
or above dispatch level?
• Simple: the system crashes with a ‘Blue Screen’, usually with the bug
check ID IRQL_NOT_LESS_OR_EQUAL.
• If a thread waits at or above dispatch level, no one is there to come and
switch the thread.
• What happens if code accesses PagedPool at or above dispatch level?
• If the pages are on disk, a page fault occurs; the current thread has to
wait while the page fault handler reads the pages from the page file into
page frames in memory.
• If the page fault happens at or above dispatch level, no one is there to stop
the current thread and schedule the page fault handler. Thus PagedPool
cannot be accessed at or above dispatch level.
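The two rules above reduce to a single comparison against dispatch level. A minimal sketch, using the IRQL values from the slides; the checking functions themselves are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* IRQL values as used in the slides (x86). */
enum { PASSIVE_LEVEL = 0, APC_LEVEL = 1, DISPATCH_LEVEL = 2, HIGH_LEVEL = 31 };

typedef enum { POOL_NON_PAGED, POOL_PAGED } pool_kind;

/* Waiting needs the scheduler, which cannot run at or above dispatch level. */
bool may_wait(int irql)
{
    assert(irql >= PASSIVE_LEVEL && irql <= HIGH_LEVEL);
    return irql < DISPATCH_LEVEL;
}

/* Paged memory may fault, and servicing the fault requires a wait. */
bool may_access(pool_kind kind, int irql)
{
    assert(irql >= PASSIVE_LEVEL && irql <= HIGH_LEVEL);
    return kind == POOL_NON_PAGED || irql < DISPATCH_LEVEL;
}
```

Driver code typically expresses the same rule as an assertion at the top of any routine that touches pageable memory, which is the purpose of the PAGED_CODE() macro.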
21. IRQL 1 - APCs
• Asynchronous Procedure Calls (APCs) run at IRQL 1.
• The main duty of an APC is to deliver data in the context
of a user thread.
• The APC queue is thread specific: each thread has its own APC
queue.
• A user space thread initiates a read operation from a
device and either waits for it to finish or continues with
another job.
• The IO may finish some time later, and the buffer then needs
to be delivered to the calling thread’s process context. That is the
duty of the APC.