CPU Microarchitecture Deep Dive
Pipeline stages, out-of-order execution, branch prediction, register renaming, and hardware memory reordering.
Data layout, prefetching, cache-line alignment, and false sharing elimination.
How x86 and Linux bring up secondary cores and establish symmetric multiprocessing.
Page faults, vmalloc vs kmalloc, TLB shootdowns, and NUMA-aware allocation paths.
Hardware counters, perf, VTune, flame graphs, and custom instrumentation.
Static prediction, profile-guided hints, and conditional-move transformations.
Loop transformations, intrinsics, and cross-architecture SIMD utilization.
Stack, pool, and arena allocators — when malloc becomes the bottleneck.
GCC/Clang pass pipeline, PGO internals, and hand-written assembly trade-offs.
Division → multiplication, modulo → bitwise tricks, and loop-invariant code motion.
Warp scheduling, register file partitioning, and independent thread scheduling.