Low-Level Memory Allocation and Demand Paging in Kernel

2025-09-24 • ~12 min read

User-space allocation (malloc, mmap) reserves virtual address space first; the kernel defers physical allocation until the first access faults (demand paging). On the fault, the buddy allocator supplies the physical page. This enables overcommitment and avoids backing memory that is never touched.

Virtual vs Physical Allocation

User-space malloc (via brk/mmap) allocates virtual address space. The kernel records the range in a VMA but populates no PTEs, so the pages have no physical RAM behind them until the first access triggers #PF (page fault).

// User malloc
int *ptr = malloc(4096);   // Virtual alloc; fresh pages have no physical frame yet
*ptr = 42;                 // First touch of a fresh page triggers page fault, then physical alloc
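
The deferred allocation can be observed from user space. The following is a minimal sketch, assuming Linux with glibc; the helper names page_is_resident and probe_first_touch are made up for this example. It maps one anonymous page with mmap and asks mincore whether the page is resident before and after the first write.

```c
#define _DEFAULT_SOURCE 1   /* for mincore and MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the page at addr is resident in RAM, 0 if not, -1 on error. */
static int page_is_resident(void *addr)
{
    unsigned char vec = 0;
    long page_size = sysconf(_SC_PAGESIZE);

    if (mincore(addr, (size_t)page_size, &vec) != 0)
        return -1;
    return vec & 1;
}

/*
 * Maps one anonymous page and records residency before and after the
 * first write. Returns 0 on success, -1 if mmap fails.
 */
static int probe_first_touch(int *before, int *after)
{
    long page_size = sysconf(_SC_PAGESIZE);
    int *p = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    *before = page_is_resident(p);  /* virtual range exists, usually no frame yet */
    *p = 42;                        /* first write takes the page fault */
    *after = page_is_resident(p);   /* a physical frame now backs the page */

    munmap(p, (size_t)page_size);
    return 0;
}
```

On a typical Linux system the "before" value is 0 and the "after" value is 1, though some sandboxes report residency differently. mmap is used instead of malloc here because a small malloc may be served from heap pages that were already touched.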

Demand Paging Mechanism

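On access to an unmapped page, the CPU raises #PF (exception vector 14) and stores the faulting address in CR2. The kernel handler (e.g., do_page_fault; exc_page_fault in recent x86 kernels) looks up the VMA (vm_area_struct) covering the address, allocates a physical page if the access is valid, and updates the PTE. Lazy allocation means untouched pages never consume RAM.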

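# Page fault handler (simplified AT&T-syntax sketch, not the real Linux entry code)
page_fault:
    push %rax               # Save a scratch register (real code saves the full register set)
    mov %cr2, %rdi          # Faulting address is in CR2
    call do_page_fault
    pop %rax
    add $8, %rsp            # Discard the error code the CPU pushed for #PF
    iretq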
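
Alongside the address in CR2, the CPU pushes an error code that tells the handler what kind of access faulted. The sketch below decodes the architecturally defined low bits of that code. The PF_* macro names and the pf_info struct are illustrative names chosen for this example, not the kernel's own identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error code bits (names here are illustrative). */
#define PF_PRESENT (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE   (1u << 1)  /* 0: read access, 1: write access */
#define PF_USER    (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_RSVD    (1u << 3)  /* reserved bit was set in a paging entry */
#define PF_INSTR   (1u << 4)  /* fault was caused by an instruction fetch */

struct pf_info {
    bool present;
    bool write;
    bool user;
    bool reserved;
    bool instr_fetch;
};

/* Splits a raw error code into named fields. */
static struct pf_info decode_pf_error(uint32_t error_code)
{
    struct pf_info info;

    info.present     = (error_code & PF_PRESENT) != 0;
    info.write       = (error_code & PF_WRITE) != 0;
    info.user        = (error_code & PF_USER) != 0;
    info.reserved    = (error_code & PF_RSVD) != 0;
    info.instr_fetch = (error_code & PF_INSTR) != 0;
    return info;
}
```

An error code of 0x6, for instance, decodes to a user-mode write to a page that is not present, which is the shape of a first-touch demand-paging fault.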

Buddy Allocator Role

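Physical allocator (buddy system in mm/page_alloc.c) splits and merges power-of-2 blocks of pages. On a fault, the kernel allocates through alloc_pages/__alloc_pages, which take a block from the per-zone free_area lists (__get_free_pages is a wrapper that returns a kernel virtual address instead of a struct page). Zones: DMA, DMA32, and Normal on x86-64; Highmem exists only on 32-bit kernels.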

// Kernel physical alloc (C)
struct page *page = alloc_pages(GFP_KERNEL, 0);  // Order 0: single page
if (!page)
    return -ENOMEM;                              // Allocation can fail
phys_addr_t phys_addr = page_to_phys(page);      // Same as (phys_addr_t)pfn << PAGE_SHIFT
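
The split/merge behavior of a buddy system comes down to simple arithmetic on page frame numbers. The following user-space sketch shows that arithmetic only; the function names are invented for this example and it does not reproduce the kernel's free lists or locking.

```c
#include <stddef.h>

/* Smallest order k such that a block of 2^k pages covers npages (npages >= 1). */
static unsigned int order_for_pages(size_t npages)
{
    unsigned int order = 0;

    while (((size_t)1 << order) < npages)
        order++;
    return order;
}

/*
 * PFN of the buddy of the 2^order-page block that starts at pfn.
 * The block must be aligned to its own size. Flipping bit `order`
 * moves between the two halves of the parent block.
 */
static unsigned long buddy_pfn(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

/* Start PFN of the 2^(order+1)-page block formed when both buddies merge. */
static unsigned long merged_pfn(unsigned long pfn, unsigned int order)
{
    return pfn & ~(1UL << order);
}
```

For example, a request for 5 pages rounds up to order 3 (8 pages), and the order-2 block at PFN 8 has its buddy at PFN 12; when both are free they merge into an order-3 block at PFN 8.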

Page Table Updates

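After allocation, the kernel writes a PTE (page table entry) holding the physical frame number (PFN) and flags (present, writable, and so on). x86-64 uses 4-level paging (PML4, PDPT, PD, PT), or 5-level when LA57 is enabled. Kernel code walks the tables with the pgd/p4d/pud/pmd/pte helpers.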

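// Set PTE (C); fault-path code normally uses pte_offset_map_lock to hold the PTE lock
pte_t *pte = pte_offset_map(pmd, addr);
set_pte_at(mm, addr, pte, pte_mkyoung(pte_mkdirty(mk_pte(page, prot))));
pte_unmap(pte);  // Pair every pte_offset_map with pte_unmap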
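
To make the 4-level layout concrete, the sketch below splits a 48-bit x86-64 virtual address into its four 9-bit table indices and 12-bit page offset, and composes a PTE value from a PFN and flag bits. It assumes 4 KiB pages; the struct and function names are made up for this example, and only the present and writable bits are modeled.

```c
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)  /* bit 0: page is present */
#define PTE_WRITABLE (1ULL << 1)  /* bit 1: writes allowed */

struct va_parts {
    unsigned int pml4;    /* bits 47..39 */
    unsigned int pdpt;    /* bits 38..30 */
    unsigned int pd;      /* bits 29..21 */
    unsigned int pt;      /* bits 20..12 */
    unsigned int offset;  /* bits 11..0  */
};

/* Splits a virtual address into the index used at each paging level. */
static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;

    p.offset = (unsigned int)(va & 0xfff);
    p.pt     = (unsigned int)((va >> 12) & 0x1ff);
    p.pd     = (unsigned int)((va >> 21) & 0x1ff);
    p.pdpt   = (unsigned int)((va >> 30) & 0x1ff);
    p.pml4   = (unsigned int)((va >> 39) & 0x1ff);
    return p;
}

/* Builds a PTE value: PFN in the address bits, flags in the low bits. */
static uint64_t make_pte(uint64_t pfn, uint64_t flags)
{
    return (pfn << 12) | flags;
}
```

For the address 0x00007f1234567abc the indices come out as PML4 254, PDPT 72, PD 418, PT 359, with offset 0xabc. A present, writable mapping of PFN 0x1234 yields the PTE value 0x1234003.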

Overcommitment and OOM

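Lazy allocation allows overcommit (more virtual memory promised than physical RAM plus swap). If a fault cannot be satisfied even after reclaim, the OOM killer (out_of_memory) selects and kills a process. Sysctl vm.overcommit_memory tunes behavior: 0 = heuristic (default), 1 = always allow, 2 = strict accounting.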

# Tune overcommit (requires root)
echo 1 > /proc/sys/vm/overcommit_memory  # 1 = always allow overcommit
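
When vm.overcommit_memory is set to 2, the kernel refuses allocations that would push committed memory past a limit derived from RAM, swap, and vm.overcommit_ratio. The sketch below models that documented formula in user space, assuming no huge pages are reserved and vm.overcommit_kbytes is unset. The function names are hypothetical, and the real accounting has more detail than this.

```c
#include <stdint.h>

/*
 * Commit limit in kB under strict accounting:
 * swap + RAM * overcommit_ratio / 100 (huge pages ignored here).
 */
static uint64_t commit_limit_kb(uint64_t ram_kb, uint64_t swap_kb,
                                unsigned int ratio_percent)
{
    return swap_kb + ram_kb * ratio_percent / 100;
}

/* Returns 1 if a new request still fits under the limit, 0 otherwise. */
static int would_commit(uint64_t committed_kb, uint64_t request_kb,
                        uint64_t limit_kb)
{
    return committed_kb + request_kb <= limit_kb;
}
```

With 8 GiB of RAM, 2 GiB of swap, and a ratio of 50, the limit is 6 GiB (6291456 kB). The current values on a running system appear as CommitLimit and Committed_AS in /proc/meminfo.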

Copy-on-Write (COW)

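Related: fork uses COW. Parent and child share pages marked read-only; the first write by either one faults, and the kernel allocates a new physical page, copies the data, and maps it writable for the writer. Avoids copying pages that are never modified.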

// Fork COW fault: write to a page that is present but mapped read-only
if (write && pte_present(pte) && !pte_write(pte)) {
    // Handle COW: alloc new page, copy, update PTE
}
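
The effect of COW is visible from user space even though the mechanism is not: after fork, a write in one process never changes what the other sees. A minimal sketch, assuming a POSIX system; the function name is invented for this example.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks, lets the child overwrite a heap value, and returns the value
 * the parent still sees afterwards. Returns -1 on any failure.
 */
static int parent_value_after_child_write(void)
{
    int *value = malloc(sizeof *value);
    int status = 0;
    int seen;
    pid_t pid;

    if (value == NULL)
        return -1;
    *value = 1;

    pid = fork();
    if (pid < 0) {
        free(value);
        return -1;
    }
    if (pid == 0) {
        *value = 2;                   /* write fault: the page is copied for the writer */
        _exit(*value == 2 ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        free(value);
        return -1;
    }

    seen = *value;                    /* parent's page was never modified */
    free(value);
    return seen;
}
```

The function returns 1: the child's write landed on its own copy of the page. This shows the isolation that COW preserves, not the page copy itself, which happens inside the kernel's fault handler.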

Zero Page Optimization

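For anonymous pages, a read fault maps the shared read-only zero page. A later write fault replaces it with a new zeroed physical page; a first access that is a write skips the zero page and gets a fresh zeroed page directly. Saves memory and zeroing time for pages that are only read.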

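// Zero page (global; x86 declaration shown, other architectures declare it differently)
extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];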
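
The read-then-write sequence can be probed from user space by counting minor faults around each pass. The sketch below assumes Linux with glibc and uses getrusage; the helper names are made up for this example, and the fault counts it reports depend on kernel configuration, so treat them as indicative only.

```c
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Reads, then writes, the first byte of every page in a fresh anonymous
 * mapping. Returns 0 if each first read saw zero and each write stuck,
 * -1 otherwise. Fault counts for the two passes go to the out-parameters.
 */
static int read_then_write(size_t npages, long *read_faults, long *write_faults)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile unsigned char *p = mmap(NULL, npages * page,
                                     PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int ok = 0;
    long f0, f1, f2;
    size_t i;

    if (p == MAP_FAILED)
        return -1;

    f0 = minor_faults();
    for (i = 0; i < npages; i++)
        if (p[i * page] != 0)          /* read fault: kernel may map the zero page */
            ok = -1;
    f1 = minor_faults();
    for (i = 0; i < npages; i++)
        p[i * page] = 1;               /* write fault: private zeroed frame */
    f2 = minor_faults();
    for (i = 0; i < npages; i++)
        if (p[i * page] != 1)
            ok = -1;

    *read_faults = f1 - f0;
    *write_faults = f2 - f1;
    munmap((void *)p, npages * page);
    return ok;
}
```

On many Linux systems both passes report roughly one fault per page, which is consistent with the zero page being mapped on read and replaced on write. Transparent huge pages or a sandboxed kernel can change the numbers.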

Swap and Major/Minor Faults

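Minor fault: resolved without I/O, e.g., the page is already in memory (page cache) but not mapped, or it is a fresh anonymous page. Major fault: the page must be loaded from disk or swap. The buddy allocator supplies the frame for swap-in. File-backed mappings are filled through vm_ops->fault.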

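// Fault types: the return code of handle_mm_fault reports whether I/O was needed
vm_fault_t ret = handle_mm_fault(vma, addr, flags, regs);
if (ret & VM_FAULT_MAJOR) {
    // Major fault: disk I/O was needed for the page
}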

Demand Paging Flowchart

Malloc → Virtual VMA Create → Access → #PF (CR2=addr) → Handler Checks Valid → Alloc Physical (Buddy) → Map PTE (PFN + Flags) → Resume. Invalid → SIGSEGV.
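
The validity check in the middle of that flow can be sketched as a small user-space simulation. Everything here (the sim_ names, the flat array of regions, the two permission bits) is a simplification invented for illustration; the kernel's lookup structures, stack growth handling, and permission model are considerably more involved.

```c
#include <stddef.h>

#define SIM_READ  0x1u
#define SIM_WRITE 0x2u

/* A mapped region covering [start, end) with permission bits. */
struct sim_vma {
    unsigned long start;
    unsigned long end;
    unsigned int prot;
};

enum sim_result {
    SIM_MAP_PAGE,  /* valid access: allocate a frame, install the PTE, resume */
    SIM_SEGV       /* invalid access: deliver SIGSEGV */
};

/* Linear search for the region containing addr, or NULL if none does. */
static const struct sim_vma *sim_find_vma(const struct sim_vma *vmas,
                                          size_t n, unsigned long addr)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (addr >= vmas[i].start && addr < vmas[i].end)
            return &vmas[i];
    return NULL;
}

/* Decides the outcome of a fault at addr for a read or write access. */
static enum sim_result sim_handle_fault(const struct sim_vma *vmas, size_t n,
                                        unsigned long addr, int is_write)
{
    const struct sim_vma *vma = sim_find_vma(vmas, n, addr);
    unsigned int need = is_write ? SIM_WRITE : SIM_READ;

    if (vma == NULL)
        return SIM_SEGV;       /* no mapping covers the address */
    if (!(vma->prot & need))
        return SIM_SEGV;       /* mapping exists but forbids this access */
    return SIM_MAP_PAGE;
}
```

With a read-write region at [0x1000, 0x3000) and a read-only region at [0x5000, 0x6000), a write to 0x2abc is resolved by mapping a page, while a write to 0x5010 or any access to 0x3000 ends in SIGSEGV.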

Kernel Files and Structures

Key files: mm/memory.c (handle_mm_fault, generic fault handling), mm/page_alloc.c (buddy), arch/x86/mm/fault.c (do_page_fault, the x86 entry point). Structs: vm_area_struct (VMA), mm_struct (process memory), page (physical page).

// VMA struct (C, abridged)
struct vm_area_struct {
    unsigned long vm_start, vm_end;  // Region covers [vm_start, vm_end)
    struct mm_struct *vm_mm;         // Address space this region belongs to
    // ...
};

Bottom Line

The kernel defers physical allocation to page faults for efficiency. Virtual allocation is immediate (malloc/mmap); physical pages come from the buddy allocator on first access. This enables overcommit, COW, and the zero-page optimization. Low-level pieces: #PF handler, CR2 register, PTE updates.
