A ground-up curriculum for systems programmers with C and x86-64 Assembly foundations. From IR basics to production compiler passes — no shortcuts, no handwaving.
-emit-llvm (output IR), -S (text form), -O0/-O2/-O3 (optimization levels), -c (object file). Practice: compile a C file to IR, to assembly, and to object code using only clang flags.
-passes="mem2reg,dce", --print-changed, -stats. This is the tool you'll use most in Phase 4 when writing your own passes.
-march (target arch), -filetype=asm vs obj. Try targeting a different arch than your host to feel what "hardware independence" actually means.
alloca and why.)
i1, i8, i32, i64 (integers), float, double, ptr (opaque pointer, modern LLVM), [N x T] (arrays), {T1, T2} (structs), T (T1, T2) (function types; with opaque pointers a function pointer is just ptr).
alloca (stack), load/store, add/sub/mul/sdiv, icmp/fcmp, br (conditional and unconditional), call/ret, getelementptr (GEP, pointer arithmetic), bitcast/trunc/zext/sext.
lli. This is the IR equivalent of writing assembly from scratch: it will hurt once and teach forever.
codegen() method that returns an llvm::Value*. This interface is the bridge between frontend and backend.
LLVMContext owns all IR objects. Module is a compilation unit (a .ll file in C++ form). IRBuilder is your "cursor": it tracks the current insertion point and has methods for every instruction. You'll use these in every compiler you ever write.
CreateAdd/Sub/Mul, CreateFCmpOLT (float compare), CreateBr/CreateCondBr, CreateCall, CreateRet, CreateAlloca, CreateStore/Load. Practice until you can translate IR syntax to C++ API calls without looking it up.
Function::Create() with FunctionType. BasicBlock::Create() and setting the IRBuilder insertion point with SetInsertPoint(). Handle function arguments with func->arg_begin(). This pattern repeats in every backend codegen.
IRCompileLayer, RTDyldObjectLinkingLayer. For Kaleidoscope, you add functions to the JIT incrementally as the user types them. This mirrors how REPLs work.
sin, printf). Learn how symbol lookup works: JIT → process symbols → stdlib.
Implement extern declarations in your language so Kaleidoscope can call C functions.
FunctionPass, ModulePass, LoopPass. Pass registration with PassPluginLibraryInfo for out-of-tree passes.
run(Function &F, FunctionAnalysisManager &AM) method. It returns PreservedAnalyses, telling the pass manager which analyses are still valid after your transformation. Returning PreservedAnalyses::all() vs none() has performance implications.
for (BasicBlock &BB : F), for (Instruction &I : BB). Learn to use dyn_cast<BranchInst>(&I) for instruction type checking. This is boilerplate you'll type hundreds of times.
DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F).
AliasAnalysis returns MustAlias, MayAlias, PartialAlias, or NoAlias. Fundamental for auto-vectorization and memory-level parallelism.
x * 2 → x << 1, x + 0 → x, x / 2 → x >> 1 (unsigned). Study the existing instcombine source: it's 50k lines of patterns and teaches you how to think about transformations.
I.use_empty() and the instruction has no side effects, erase it.
mul i32 %x, 4) and replaces them with shifts (shl i32 %x, 2). Extend to handle division. Test with opt --load-pass-plugin=./libMyPass.so -passes="strength-reduce".
InlineFunction() utility in the LLVM API.
SmallVector<T, N>: vector with N elements on the stack before heap allocation. Avoids allocations for small collections (most compiler data). ArrayRef: non-owning reference to any array-like container. Use these everywhere in compiler code, never raw std::vector for IR-level work.
StringRef: non-owning reference to a string (zero-copy). Twine: lazy string concatenation tree. Never allocate intermediate strings in hot paths. Profilers hate std::string in compilers.
for (Use &U : V.uses()). Replace all uses with V.replaceAllUsesWith(NewV). This is the foundation of every transformation: finding what uses what.
llvm/lib/Transforms/ contains all the production passes. llvm/lib/Target/X86/ is the best-commented backend. Read real code, not tutorials.
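The strength-reduction exercise above (mul i32 %x, 4 becoming shl i32 %x, 2, then extended to division) can be prototyped without LLVM before you touch the real API. The sketch below is a standalone model, not LLVM code: Op, Inst, isPowerOfTwo, log2Exact, and strengthReduce are all hypothetical names invented for illustration, and the "block" is just a vector of toy instructions with a constant right-hand operand. It only handles unsigned division, since signed division by a power of two is not a plain shift.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-ins for IR opcodes and instructions. These are NOT LLVM classes.
enum class Op { Add, Mul, Shl, UDiv, LShr };

struct Inst {
    Op op;
    std::string lhs;    // name of the left operand, e.g. "%x"
    std::uint64_t rhs;  // constant right operand
};

// True when v has exactly one bit set.
bool isPowerOfTwo(std::uint64_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Base-2 logarithm of a power of two, found by counting shifts.
unsigned log2Exact(std::uint64_t v) {
    assert(isPowerOfTwo(v) && "caller must pass a power of two");
    unsigned n = 0;
    while (v > 1) {
        v >>= 1;
        ++n;
    }
    return n;
}

// Rewrites "mul x, 2^k" into "shl x, k" and "udiv x, 2^k" into "lshr x, k".
// Returns how many instructions were changed, which is the kind of number
// a real pass would use to decide what to report as preserved.
std::size_t strengthReduce(std::vector<Inst> &block) {
    std::size_t changed = 0;
    for (Inst &I : block) {
        if (!isPowerOfTwo(I.rhs))
            continue;
        if (I.op == Op::Mul) {
            I.op = Op::Shl;
            I.rhs = log2Exact(I.rhs);
            ++changed;
        } else if (I.op == Op::UDiv) {
            I.op = Op::LShr;
            I.rhs = log2Exact(I.rhs);
            ++changed;
        }
    }
    return changed;
}
```

Running strengthReduce over a block holding a multiply by 4, an unsigned divide by 8, a multiply by 6, and an add of 2 should change only the first two. The multiply by 6 and the add stay as they were. A real pass would do the same matching on llvm::Instruction operands and build the replacement with IRBuilder.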
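The "(unsigned)" qualifier on the x / 2 → x >> 1 peephole above is easy to skim past, so here is a small check of why it matters. This is a plain C++ sketch with hypothetical helper names (mulBy2, shlBy1, udivBy2, lshrBy1, sdivBy2, ashrBy1, checkPeepholes), not anything from LLVM. It assumes that >> on a negative signed value is an arithmetic shift, which is guaranteed from C++20 on and is the behaviour of mainstream compilers before that.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned: multiply/divide by two and the one-bit shifts agree for every
// input. Multiplication wraps modulo 2^32, exactly like the left shift.
std::uint32_t mulBy2(std::uint32_t x)  { return x * 2u; }
std::uint32_t shlBy1(std::uint32_t x)  { return x << 1; }
std::uint32_t udivBy2(std::uint32_t x) { return x / 2u; }
std::uint32_t lshrBy1(std::uint32_t x) { return x >> 1; }

// Signed: division truncates toward zero, while an arithmetic right shift
// rounds toward negative infinity, so the two differ for negative odd values.
std::int32_t sdivBy2(std::int32_t x) { return x / 2; }
// Assumption: >> on a negative value is an arithmetic shift (see lead-in).
std::int32_t ashrBy1(std::int32_t x) { return x >> 1; }

// Small self-check; call it from main() to exercise the cases above.
void checkPeepholes() {
    assert(mulBy2(21u) == shlBy1(21u));
    assert(mulBy2(0x80000001u) == shlBy1(0x80000001u));  // wraps identically
    assert(udivBy2(7u) == lshrBy1(7u));
    assert(sdivBy2(6) == ashrBy1(6));     // non-negative values still agree
    assert(sdivBy2(-7) != ashrBy1(-7));   // -3 versus -4
}
```

The last assertion is the reason a strength-reduction pass cannot blindly turn sdiv by a power of two into a single shift: for negative inputs the shift rounds the wrong way, so the rewrite needs a correction step or must be limited to udiv.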