CPU Microarchitecture Deep Dive
Pipeline stages, out-of-order execution, branch prediction, register renaming, and hardware memory reordering. How modern CPUs actually execute your code under the hood.
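To make the memory-reordering point concrete, here is a minimal sketch (assuming a C++17 compiler and an x86 or ARM machine) in which the store buffer can let both threads read 0, an outcome no simple interleaving of the source order allows. It is illustrative only; how often the reordering is actually observed varies by hardware and timing.

```cpp
// Sketch: observing store->load reordering via the store buffer.
// With memory_order_relaxed, both threads may read 0 -- something a naive
// "interleaved execution" mental model forbids.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

void t1() { x.store(1, std::memory_order_relaxed); r1 = y.load(std::memory_order_relaxed); }
void t2() { y.store(1, std::memory_order_relaxed); r2 = x.load(std::memory_order_relaxed); }

int main() {
    int reordered = 0;
    for (int i = 0; i < 100000; ++i) {
        x = 0; y = 0;
        std::thread a(t1), b(t2);
        a.join(); b.join();
        if (r1 == 0 && r2 == 0) ++reordered;   // only possible if the stores were delayed past the loads
    }
    std::printf("reordered runs: %d\n", reordered);
    return 0;
}
```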
Data layout optimization, prefetching strategies, and cache line alignment for reduced memory stalls.
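As a hedged sketch of the data-layout idea: a struct-of-arrays form keeps the one field a hot loop touches densely packed in cache lines, and a software prefetch hint (the GCC/Clang `__builtin_prefetch` builtin) can be layered on top. The `ParticleAoS`/`ParticlesSoA` names and the prefetch distance of 16 elements are illustrative assumptions, not tuned values.

```cpp
// Sketch: array-of-structs vs. struct-of-arrays for a hot loop that only
// touches one field. The SoA form keeps useful data densely packed; the
// prefetch hint is optional and compiler-specific.
#include <cstddef>
#include <vector>

struct ParticleAoS { float x, y, z, mass, charge, padding[3]; }; // 32 bytes; mass is 4 useful bytes per line

struct ParticlesSoA {
    std::vector<float> x, y, z, mass;   // each field contiguous in memory
};

float sum_mass_soa(const ParticlesSoA& p) {
    float total = 0.0f;
    const float* m = p.mass.data();
    const std::size_t n = p.mass.size();
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)
        if (i + 16 < n) __builtin_prefetch(&m[i + 16]);  // hint: fetch ahead of use
#endif
        total += m[i];   // every loaded cache line is fully useful data
    }
    return total;
}
```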
Minimizing branch mispredictions through code restructuring, conditional moves, and understanding predictor behavior.
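A small sketch of the branch-removal idea: the same reduction written with a data-dependent branch and in a branchless form that mainstream compilers typically lower to a conditional move (cmov on x86, csel on AArch64). Function names are illustrative; whether the branchless version actually wins depends on how predictable the data is.

```cpp
#include <cstdint>

// Branchy: roughly 50% mispredict rate if the values are random around the threshold.
int64_t sum_above_branchy(const int* v, int n, int threshold) {
    int64_t sum = 0;
    for (int i = 0; i < n; ++i)
        if (v[i] > threshold) sum += v[i];
    return sum;
}

// Branchless: a data dependency replaces control flow, so there is nothing to mispredict.
int64_t sum_above_branchless(const int* v, int n, int threshold) {
    int64_t sum = 0;
    for (int i = 0; i < n; ++i)
        sum += (v[i] > threshold) ? v[i] : 0;   // typically compiled to a conditional move
    return sum;
}
```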
Understanding GCC and Clang optimization passes, PGO implementation details, and when to use assembly for critical sections.
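For the PGO piece, the usual GCC/Clang flow is to build with -fprofile-generate, run a representative workload, then rebuild with -fprofile-use. For the "assembly in critical sections" piece, below is a minimal GCC/Clang extended-asm sketch, x86-64 only, reading the time-stamp counter; it illustrates the mechanism rather than a recommended timing primitive (real measurements need serialization around rdtsc).

```cpp
// Sketch: GCC/Clang extended inline assembly for a cycle-count read -- the kind
// of tiny, performance-critical primitive where hand-written assembly can still
// be justified. x86-64 only; other targets fall back to a placeholder.
#include <cstdint>

static inline uint64_t read_tsc() {
#if defined(__x86_64__)
    uint32_t lo, hi;
    // rdtsc returns the 64-bit time-stamp counter split across EDX:EAX.
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return (uint64_t(hi) << 32) | lo;
#else
    return 0;  // placeholder on non-x86 targets
#endif
}
```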
Compiler vectorization hints, intrinsics usage, and loop transformations for maximum SIMD utilization across x86, ARM, and GPU architectures.
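A sketch of the two ends of that spectrum, assuming an x86 target with SSE: helping the auto-vectorizer with a no-aliasing promise (`__restrict`, a common compiler extension) versus writing the loop with explicit intrinsics from <immintrin.h>. Function names are illustrative.

```cpp
#include <immintrin.h>
#include <cstddef>

// Portable version: promise no aliasing so the auto-vectorizer can do the work.
void scale_auto(float* __restrict dst, const float* __restrict src,
                float k, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

// Explicit SSE: 4 floats per instruction, scalar loop for the tail.
void scale_sse(float* dst, const float* src, float k, std::size_t n) {
    const __m128 vk = _mm_set1_ps(k);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
        _mm_storeu_ps(dst + i, _mm_mul_ps(_mm_loadu_ps(src + i), vk));
    for (; i < n; ++i)           // remaining tail elements
        dst[i] = src[i] * k;
}
```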
Replacing expensive operations with cheaper equivalents. Division by multiplication, modulo with bitwise AND, and other algebraic transformations.
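Two of the named rewrites as a short sketch; both are only valid under their preconditions (a power-of-two modulus for the mask trick, and the reciprocal multiply changes floating-point rounding slightly, which may or may not be acceptable).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Modulo by a power of two -> bitwise AND.
inline uint32_t mod_pow2(uint32_t x, uint32_t m) {
    assert(m != 0 && (m & (m - 1)) == 0);    // m must be a power of two
    return x & (m - 1);                      // same result as x % m
}

// Repeated division by a runtime constant -> multiply by its reciprocal.
void normalize(float* v, std::size_t n, float divisor) {
    const float inv = 1.0f / divisor;        // one divide instead of n
    for (std::size_t i = 0; i < n; ++i)
        v[i] *= inv;                         // multiply is far cheaper than divide
}
```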
Context switching overhead, false sharing pitfalls, lock contention analysis, and NUMA considerations. When threading helps performance and when it destroys it.
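A sketch of the false-sharing fix, assuming a 64-byte cache line (typical on current x86 and many ARM parts) and C++17 for over-aligned allocation: each per-thread counter gets its own line via alignas(64), so the threads stop invalidating each other's cached copies.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct PaddedCounter {
    alignas(64) std::atomic<uint64_t> value{0};   // one cache line per counter
};

int main() {
    const int kThreads = 4;
    std::vector<PaddedCounter> counters(kThreads);
    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t)
        workers.emplace_back([&counters, t] {
            for (int i = 0; i < 10'000'000; ++i)
                counters[t].value.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& w : workers) w.join();
    std::printf("first counter: %llu\n",
                (unsigned long long)counters[0].value.load());
}
```

Dropping the alignas(64) so the counters pack into shared lines is the easiest way to see the effect in a profiler.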
Stack, pool, arena, and ring buffer allocators. When malloc() becomes the bottleneck and how specialized allocation patterns achieve 10-100x speedups.
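A minimal arena (bump) allocator sketch; the Arena name and interface are illustrative, not taken from any particular library. The speedup comes from replacing per-object malloc/free bookkeeping and locking with a pointer increment and a single bulk release.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

class Arena {
public:
    explicit Arena(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))),
          capacity_(capacity), offset_(0) {
        if (!base_) throw std::bad_alloc{};
    }
    ~Arena() { std::free(base_); }

    // Pointer-bump allocation: no per-object bookkeeping, no locks.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + size > capacity_) return nullptr;   // arena exhausted
        offset_ = aligned + size;
        return base_ + aligned;
    }

    // Free everything at once; individual objects are never freed.
    void reset() { offset_ = 0; }

private:
    char*       base_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

Per-frame or per-request lifetimes map naturally onto reset(): allocate freely during the work unit, then reclaim everything in O(1).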
Hardware performance counters, flame graphs, and bottleneck identification using perf, Intel VTune, and custom instrumentation.
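As one concrete piece of the instrumentation side, here is a hedged, Linux-only sketch of the raw perf_event_open interface that perf itself builds on: it counts CPU cycles for a single region in the calling thread. Error handling is minimal and the measured loop is a stand-in for real work.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

static int open_cycle_counter() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CPU_CYCLES;
    attr.disabled = 1;          // start stopped; enable around the region of interest
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    // pid = 0: calling thread; cpu = -1: any CPU; no group, no flags.
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

int main() {
    int fd = open_cycle_counter();
    if (fd < 0) { std::perror("perf_event_open"); return 1; }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < 10'000'000; ++i) sink += i;   // region of interest

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    uint64_t cycles = 0;
    if (read(fd, &cycles, sizeof(cycles)) == sizeof(cycles))
        std::printf("cycles: %llu\n", (unsigned long long)cycles);
    close(fd);
    return 0;
}
```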
How x86 and the Linux kernel bring up the other processors at boot to achieve symmetric multiprocessing (SMP).
Internals of memory allocation in the Linux kernel.
Internal architecture of NVIDIA GPUs and their execution model.