x86 SMP runs multiple cores, using the local APIC for interrupts and core-to-core signalling. The BSP boots first and wakes the APs (application processors) with IPIs (inter-processor interrupts). Low-level pieces: MSRs, APIC registers, assembly trampolines, per-core run queues for threading.
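BSP (Bootstrap Processor) is selected by hardware at power-on (usually, but not necessarily, core 0). BIOS/UEFI firmware in flash initializes the BSP and loads the bootloader (e.g., GRUB), which loads the kernel. Each core's initial APIC ID is read via CPUID (EAX=1, EBX bits 24-31); BSP status is reported by the BSP flag (bit 8) of IA32_APIC_BASE, not by the ID value.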
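; BSP detection via CPUID
mov eax, 1
cpuid
shr ebx, 24          ; Initial APIC ID was in EBX[31:24]
; The ID is often 0 on the BSP but that is not guaranteed; check the BSP flag
mov ecx, 0x1B        ; IA32_APIC_BASE MSR
rdmsr
test eax, (1 << 8)   ; ZF clear => this core is the BSP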
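The two bit tests above can be checked off-target with plain integer helpers. A minimal sketch in C, assuming the caller already holds the raw register values; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Initial APIC ID: bits 31:24 of the EBX value returned by CPUID leaf 1. */
uint8_t initial_apic_id(uint32_t cpuid1_ebx)
{
    return (uint8_t)(cpuid1_ebx >> 24);
}

/* BSP flag: bit 8 of a raw IA32_APIC_BASE (MSR 0x1B) value. */
int is_bsp(uint64_t apic_base_msr)
{
    return (int)((apic_base_msr >> 8) & 1u);
}
```

For example, an EBX value of 0x03100800 decodes to APIC ID 3, and an MSR value of 0xFEE00900 has the BSP flag set.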
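Enable the Local APIC on the BSP via MSR IA32_APIC_BASE (0x1B). The register block sits at physical 0xFEE00000 by default and can be relocated through the same MSR. Set bit 11 (global enable). Bit 8 is the BSP flag, set by hardware on the bootstrap processor: read it, do not set it.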
; Enable APIC (assembly)
mov ecx, 0x1B                   ; IA32_APIC_BASE MSR
rdmsr                           ; Read MSR into EDX:EAX
or eax, (1 << 11)               ; Set global enable bit
wrmsr                           ; Write back
; APIC registers (memory-mapped)
mov edi, 0xFEE00000             ; Default APIC base
mov dword [edi + 0xF0], 0x1FF   ; Spurious vector 0xFF, software-enable APIC (bit 8)
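The MSR and spurious-vector values used above can be composed and decoded as ordinary integers. A hedged sketch in C: the helper names are hypothetical, and the base-address mask assumes a physical address width of at most 52 bits.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_BASE_ENABLE    (1ull << 11)            /* global enable */
#define APIC_BASE_ADDR_MASK 0x000FFFFFFFFFF000ull   /* assumption: <= 52-bit physical addresses */
#define SVR_SW_ENABLE       (1u << 8)               /* software enable in register 0xF0 */

/* New IA32_APIC_BASE value with the global enable bit set. */
uint64_t apic_base_enable(uint64_t msr)
{
    return msr | APIC_BASE_ENABLE;
}

/* Physical base address of the memory-mapped register block. */
uint64_t apic_base_addr(uint64_t msr)
{
    return msr & APIC_BASE_ADDR_MASK;
}

/* Spurious Interrupt Vector Register value: vector plus software enable. */
uint32_t svr_value(uint8_t vector)
{
    assert(vector >= 0x10);   /* vectors 0-15 are reserved for APIC use */
    return SVR_SW_ENABLE | vector;
}
```

svr_value(0xFF) gives the 0x1FF constant written to offset 0xF0 above.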
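Legacy MP tables (signature "_MP_") provide CPU and I/O APIC info; on ACPI systems the MADT lists the LAPICs instead. The MP floating pointer structure lives in the first KiB of the EBDA, the last KiB of base memory, or the BIOS ROM area. It is 16 bytes long, 16-byte aligned, and its bytes sum to 0 modulo 256.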
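; Search MP floating pointer (pseudocode)
for region in [first 1 KiB of EBDA, 0x9FC00-0x9FFFF, 0xF0000-0xFFFFF]:
    for addr in region, step 16:                      ; structure is 16-byte aligned
        if *(uint32_t*)addr == 0x5F504D5F:            ; "_MP_" read as little-endian
            if (sum of the 16 bytes at addr) % 256 == 0:
                ; Parse configuration table at physical address *(uint32_t*)(addr+4)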
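The signature and checksum test can be exercised on an ordinary buffer before pointing it at physical memory. A minimal sketch in C, assuming the structure is exactly 16 bytes (length field of 1); mp_fp_valid and mp_fp_scan are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_FP_SIZE 16   /* assumption: length field = 1 (one 16-byte unit) */

/* Return 1 if the 16 bytes at p carry the "_MP_" signature and sum to 0 mod 256. */
int mp_fp_valid(const uint8_t *p)
{
    uint8_t sum = 0;
    if (memcmp(p, "_MP_", 4) != 0)
        return 0;
    for (size_t i = 0; i < MP_FP_SIZE; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/* Scan buf[0..len) on 16-byte boundaries.
   Returns the offset of the first valid structure, or -1 if none is found. */
long mp_fp_scan(const uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t off = 0; off + MP_FP_SIZE <= len; off += 16)
        if (mp_fp_valid(buf + off))
            return (long)off;
    return -1;
}
```

In a kernel, buf would be a mapping of one of the search regions; the physical address of the configuration table is then the little-endian dword at offset 4 of the match.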
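BSP sends an INIT IPI to reset the APs, then a SIPI (Startup IPI) whose vector is the real-mode start address / 4096, so the entry point must be 4 KiB aligned and below 1 MiB. Both go through the ICR: write the destination to the high dword (0x310) first, because writing the low dword (0x300) sends the IPI. Poll delivery status (bit 12) after each send. Wait ~10 ms after INIT; send a second SIPI ~200 µs after the first if the AP has not started.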
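; Send INIT IPI to one AP (assembly)
mov dword [APIC_BASE + 0x310], (target_id << 24)            ; Target APIC ID (high dword first)
mov dword [APIC_BASE + 0x300], 0x00004500                   ; INIT, level assert, no shorthand
call wait_delivery
; Delay ~10 ms
; Send SIPI
mov dword [APIC_BASE + 0x310], (target_id << 24)
mov dword [APIC_BASE + 0x300], 0x00004600 | (vector & 0xFF) ; SIPI, vector=0x08 for 0x8000
call wait_delivery
; Delay ~200 us; repeat the SIPI if the AP has not checked in
; To wake all APs at once, use 0x000C4500 / 0x000C4600 (shorthand: all excluding self)

; Poll delivery status (helper, placed out of line)
wait_delivery:
    mov eax, [APIC_BASE + 0x300]
    test eax, (1 << 12)          ; Bit 12 set => send still pending
    jnz wait_delivery
    ret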
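The ICR constants in the sequence above follow from the field layout: delivery mode in bits 10:8, level assert in bit 14, destination APIC ID in bits 31:24 of the high dword. A small sketch in C that builds them; the helper names are hypothetical and it covers xAPIC physical destination mode without shorthand only.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_DM_INIT       (5u << 8)    /* delivery mode 101b */
#define ICR_DM_STARTUP    (6u << 8)    /* delivery mode 110b */
#define ICR_LEVEL_ASSERT  (1u << 14)
#define ICR_SEND_PENDING  (1u << 12)   /* read-only delivery status */

/* High dword (offset 0x310): destination APIC ID in bits 31:24. */
uint32_t icr_high(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* Low dword (offset 0x300) for an INIT IPI. */
uint32_t icr_init(void)
{
    return ICR_DM_INIT | ICR_LEVEL_ASSERT;
}

/* SIPI vector for a real-mode entry point: physical address / 4096. */
uint8_t sipi_vector(uint32_t entry)
{
    assert((entry & 0xFFFu) == 0);   /* must be 4 KiB aligned */
    assert(entry < 0x100000u);       /* must be below 1 MiB */
    return (uint8_t)(entry >> 12);
}

/* Low dword (offset 0x300) for a SIPI targeting the given entry point. */
uint32_t icr_sipi(uint32_t entry)
{
    return ICR_DM_STARTUP | ICR_LEVEL_ASSERT | sipi_vector(entry);
}
```

For a trampoline at 0x8000 this yields vector 0x08 and a low dword of 0x00004608.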
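APs start in real mode at CS:IP = (vector << 8):0, i.e. physical vector * 4096 (0x8000 for vector 0x08). Trampoline: CLI, load GDT, enable protected mode (CR0 bit 0), far-jump to 32-bit code (and from there on to 64-bit if the kernel needs it). Give each AP its own stack (e.g., APIC_ID * 32 KiB below a shared top).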
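; Trampoline, copied to physical 0x8000 (SIPI vector 0x08); AP enters in real mode
[bits 16]
[org 0x8000]
ap_start:
    cli
    cld
    xor ax, ax
    mov ds, ax                         ; DS = 0 so [gdt_desc] is a physical address
    lgdt [gdt_desc]
    ; Protected mode
    mov eax, cr0
    or al, 1                           ; CR0.PE
    mov cr0, eax
    jmp dword 0x08:protected_entry     ; Far jump reloads CS

; GDT: null, code (0x08), data (0x10)
align 8
gdt:
    dq 0                               ; Null descriptor
    dw 0xFFFF, 0x0000, 0x9A00, 0x00CF  ; Code segment: base 0, limit 4 GiB
    dw 0xFFFF, 0x0000, 0x9200, 0x00CF  ; Data segment: base 0, limit 4 GiB
gdt_desc:
    dw gdt_desc - gdt - 1              ; Limit
    dd gdt                             ; Linear base

[bits 32]
protected_entry:
    mov ax, 0x10
    mov ds, ax
    mov es, ax
    mov ss, ax
    ; Get APIC ID
    mov eax, 1
    cpuid
    shr ebx, 24
    shl ebx, 15                        ; Stack offset: APIC_ID * 32 KiB
    mov esp, stack_top                 ; stack_top, ap_init: kernel-provided addresses
    sub esp, ebx
    call ap_init                       ; Kernel AP entry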
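The dw words in the trampoline GDT are the packed form of base, limit, access byte, and flags. A sketch in C that packs a descriptor and computes the per-AP stack top, useful for checking the constants by hand; both helper names are hypothetical and the 32 KiB stack size is the example value from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a legacy 8-byte segment descriptor.
   limit is 20 bits, flags is the high nibble (G, D/B, L, AVL). */
uint64_t gdt_entry(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu && flags <= 0xFu);
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);            /* limit 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;     /* base 23:0   */
    d |= (uint64_t)access << 40;                 /* access byte */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48; /* limit 19:16 */
    d |= (uint64_t)flags << 52;                  /* flags       */
    d |= (uint64_t)(base >> 24) << 56;           /* base 31:24  */
    return d;
}

/* Per-AP stack top: each APIC ID gets its own 32 KiB slice below stack_top. */
uint32_t ap_stack_top(uint32_t stack_top, uint8_t apic_id)
{
    return stack_top - ((uint32_t)apic_id << 15);
}
```

gdt_entry(0, 0xFFFFF, 0x9A, 0xC) is 0x00CF9A000000FFFF, which stored little-endian gives the words 0xFFFF, 0x0000, 0x9A00, 0x00CF. The stack scheme assumes APIC IDs are small and dense; sparse IDs would need a lookup table instead.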
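Per-core run queues (struct rq) hold runnable tasks. In Linux the core scheduler is in kernel/sched/core.c and CFS in kernel/sched/fair.c; load balancing moves tasks between queues. Each task's saved state includes its instruction pointer (RIP), stack pointer, and registers.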
// Linux run queue (C, heavily abridged)
struct rq {
    struct cfs_rq cfs;   // Fair scheduler queue
    // ...
};
// Illustrative placeholder, not a kernel function
void sched_assign(struct task_struct *task, int cpu) {
    // Add to per_cpu(runqueues, cpu)
}
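To make the per-core queue idea concrete, here is a toy model in C. It is not the Linux implementation: the toy_ names, the fixed-size arrays, and the "pick the least-loaded CPU" policy are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_CPUS 4
#define TOY_RQ_CAP  16

struct toy_task { int pid; };

/* One run queue per CPU, holding pointers to runnable tasks. */
struct toy_rq {
    struct toy_task *tasks[TOY_RQ_CAP];
    size_t nr_running;
};

struct toy_rq toy_runqueues[TOY_NR_CPUS];

/* Pick the CPU with the fewest runnable tasks (lowest index wins ties). */
int toy_select_cpu(void)
{
    int best = 0;
    for (int cpu = 1; cpu < TOY_NR_CPUS; cpu++)
        if (toy_runqueues[cpu].nr_running < toy_runqueues[best].nr_running)
            best = cpu;
    return best;
}

/* Enqueue a task on the least-loaded CPU and return that CPU's index. */
int toy_enqueue(struct toy_task *t)
{
    int cpu = toy_select_cpu();
    struct toy_rq *rq = &toy_runqueues[cpu];
    assert(rq->nr_running < TOY_RQ_CAP);
    rq->tasks[rq->nr_running++] = t;
    return cpu;
}
```

A real scheduler would take a per-queue lock around the enqueue and then notify the target core.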
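Scheduler sends a reschedule IPI to the target core on a dedicated vector (e.g., 0x40; Linux names its own RESCHEDULE_VECTOR and sends it from arch/x86/kernel/smp.c). The target takes the interrupt and runs the scheduler on the way out, which picks the next task from its run queue.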
// IPI send (C)
void smp_send_reschedule(int cpu) {
    apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
}
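Save the outgoing task's registers, swap stack pointers, restore the incoming task's registers, and return into its saved RIP. x86-64 has no PUSHA/POPA, so registers are pushed one by one; only the callee-saved set needs saving because the switch is an ordinary function call. Linux does this in __switch_to_asm (arch/x86/entry/entry_64.S).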
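; Context switch (assembly, simplified): rdi = old task, rsi = new task
switch_to:
    push rbp                    ; Save old task's callee-saved registers
    push rbx
    push r12
    push r13
    push r14
    push r15
    mov [rdi + TS_RSP], rsp     ; Store old stack pointer in old task
    mov rsp, [rsi + TS_RSP]     ; Load new task's stack pointer
    pop r15                     ; Restore new task's registers (reverse order)
    pop r14
    pop r13
    pop r12
    pop rbx
    pop rbp
    ret                         ; Pops new task's saved RIP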
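The same save, swap-stack, restore pattern can be watched from user space with the POSIX ucontext calls, which do in a library what the assembly above does by hand. A sketch in C, assuming a glibc-style system that provides <ucontext.h>; it is an analogy for the kernel path, not the kernel's mechanism, and the names are made up.

```c
#include <assert.h>
#include <ucontext.h>

ucontext_t demo_main_ctx, demo_task_ctx;
char demo_task_stack[64 * 1024];   /* private stack for the second context */
int demo_trace[4];
int demo_trace_len;

/* Body of the second context: runs on demo_task_stack. */
void demo_task_body(void)
{
    demo_trace[demo_trace_len++] = 2;
    swapcontext(&demo_task_ctx, &demo_main_ctx);   /* yield back to main */
    demo_trace[demo_trace_len++] = 4;
    /* returning resumes uc_link, i.e. demo_main_ctx */
}

/* Switch back and forth twice; returns the number of trace entries. */
int run_switch_demo(void)
{
    demo_trace_len = 0;
    int rc = getcontext(&demo_task_ctx);
    assert(rc == 0);
    demo_task_ctx.uc_stack.ss_sp = demo_task_stack;
    demo_task_ctx.uc_stack.ss_size = sizeof demo_task_stack;
    demo_task_ctx.uc_link = &demo_main_ctx;
    makecontext(&demo_task_ctx, demo_task_body, 0);

    demo_trace[demo_trace_len++] = 1;
    swapcontext(&demo_main_ctx, &demo_task_ctx);   /* save main, run task */
    demo_trace[demo_trace_len++] = 3;
    swapcontext(&demo_main_ctx, &demo_task_ctx);   /* resume task after its yield */
    return demo_trace_len;
}
```

Each swapcontext call stores the current registers and stack pointer in its first argument and loads those of its second, so execution interleaves between the two stacks.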
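Use spinlocks (atomic read-modify-write with the LOCK prefix) for shared data. Pitfalls: IPI delivery that never completes (poll ICR bit 12 with a timeout), timing between INIT and SIPI (needs a calibrated delay such as udelay), spurious interrupts (vector set in register 0xF0; the handler should just return without sending EOI).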
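; Spinlock (test-and-set on bit 0 of lock_var)
acquire:
    lock bts dword [lock_var], 0   ; Atomically set bit 0; CF = previous value
    jnc acquired                   ; CF clear => lock was free, now ours
    pause                          ; Spin-wait hint
    jmp acquire
acquired:
; Release: mov dword [lock_var], 0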
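The same test-and-set lock can be written portably with C11 atomics, which the compiler lowers to a locked instruction on x86. A minimal sketch; the spinlock_t type and function names here are hypothetical and unrelated to the Linux type of the same name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { atomic_flag held; } spinlock_t;
#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if the lock was free and is now held by the caller. */
bool spin_trylock(spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is acquired. */
void spin_lock(spinlock_t *l)
{
    assert(l != NULL);
    while (!spin_trylock(l)) {
        /* busy-wait; kernel code would issue PAUSE here */
    }
}

/* Release the lock; the release ordering publishes the protected writes. */
void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Acquire ordering on the lock and release ordering on the unlock keep the critical section's memory accesses from leaking outside it.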
Patterns in multicore:
// Boot: BSP-centric, serial
// Runtime: Parallel, per-core queues
// IPIs: For TLB flush, reschedule
// x2APIC: MSR-based (0x800+), no memory map
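For the x2APIC line: each 16-byte-spaced xAPIC register offset maps to MSR 0x800 + (offset >> 4), and the ICR becomes a single 64-bit MSR with the 32-bit destination ID in the upper half. A small sketch of that arithmetic in C; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* MSR number of the x2APIC register that corresponds to an xAPIC MMIO offset. */
uint32_t x2apic_msr(uint32_t mmio_offset)
{
    assert((mmio_offset & 0xFu) == 0);   /* registers are 16-byte spaced */
    assert(mmio_offset < 0x1000u);       /* register block is 4 KiB */
    return 0x800u + (mmio_offset >> 4);
}

/* 64-bit x2APIC ICR value: destination ID in bits 63:32, command in bits 31:0. */
uint64_t x2apic_icr(uint32_t dest_id, uint32_t icr_low)
{
    return ((uint64_t)dest_id << 32) | icr_low;
}
```

The spurious vector register at offset 0xF0 becomes MSR 0x80F and the ICR at 0x300 becomes MSR 0x830; the high dword at 0x310 has no MSR of its own because the destination is folded into the same write.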
SMP boot: BSP enables APIC via MSR, parses tables, sends INIT-SIPI via ICR. APs trampoline to kernel. Runtime: Scheduler distributes via run queues, IPIs trigger switches. Low-level: Registers 0xFEE00000+, MSRs, assembly for mode switches.