Compiler Engineering 2026 Edition

LLVM
Curriculum

A ground-up curriculum for systems programmers with C and x86-64 Assembly foundations. From IR basics to production compiler passes — no shortcuts, no handwaving.

MONTH 1 3 5 7 9 12+
Phase 1: Architect View
Phase 2: IR Fluency
Phase 3: Kaleidoscope
Phase 4: Passes
Phase 5: Specialization
Prerequisite: You have C fluency and x86-64 Assembly basics. LLVM IR will feel familiar because it reads like assembly, just hardware-independent.


01
Month 1 Foundation
The Architect's View
Goal: Understand LLVM's Three-Phase design. Know what IR is and why it exists. Run your first clang/opt/llc pipeline.
Key Takeaways
  • Understand why LLVM uses 3-phase design (N languages + M targets)
  • Read IR as hardware-independent assembly — it's RISC-like with infinite registers
  • Run the clang -> opt -> llc pipeline and understand each phase's role
  • Know the SSA form: every variable assigned exactly once
  • Build LLVM from source and explore the codebase structure
Core Concepts
Why Three-Phase Compilers Win
Frontend -> Optimizer -> Backend. N languages × M targets = N+M components, not N×M. This is the entire reason LLVM exists. Read the AOSA chapter by Chris Lattner — it's 20 pages that reframe how you think about tools.
AOSA Article · Architecture
What is LLVM IR? (The "Virtual Assembly")
IR is a RISC-like assembly language with infinite virtual registers, strong typing, and no hardware assumptions. It's what lets the optimizer work independently of both the source language and the CPU. Your x86-64 knowledge transfers — IR is just a cleaner ISA.
IR · SSA
Clang vs GCC Architecture Differences
GCC is a monolith — frontend and backend are tightly coupled. Clang is a library-based frontend that emits LLVM IR. Understand why this matters for tooling, IDE integration, and static analysis.
Compilers
Tool Mastery
clang — The Frontend
Your entry point. Key flags to know: -emit-llvm (output IR), -S (text form), -O0/-O2/-O3 (optimization levels), -c (object file). Practice: compile a C file to IR, to assembly, and to object code using only clang flags.
# C -> LLVM IR (text)
clang -S -emit-llvm -O0 factorial.c -o factorial.ll
# C -> x86-64 Assembly
clang -S -O0 factorial.c -o factorial.s
# Compare: see what changes at O2
clang -S -emit-llvm -O2 factorial.c -o factorial_opt.ll
clang
opt — The Optimizer
Takes .ll files, runs passes, outputs optimized .ll. Learn: -passes="mem2reg,dce", --print-changed, -stats. This is the tool you'll use most in Phase 4 when writing your own passes.
# Run mem2reg pass (promotes memory to SSA registers)
opt -passes="mem2reg" factorial.ll -S -o optimized.ll
# See what changed
opt -passes="mem2reg,dce,instcombine" -S --print-changed factorial.ll
opt
llc — The Backend
Converts IR to machine code. Key flags: -march (target arch), -filetype=asm vs obj. Try targeting a different arch than your host to feel what "hardware independence" actually means.
# IR -> x86-64 Assembly
llc -march=x86-64 factorial.ll -o factorial_from_ir.s
# IR -> ARM Assembly (cross-compile)
llc -march=aarch64 factorial.ll -o factorial_arm.s
llc
Godbolt (Compiler Explorer) Workflow
Set up a permanent workflow: left pane = C source, right pane 1 = clang x86-64, right pane 2 = clang LLVM IR. Highlight a C expression and see both outputs highlight simultaneously. This dual-map exercise will rewire how you think about code.
Godbolt · godbolt.org
lli — The JIT Interpreter
Runs .ll files directly. Not for production, but essential for learning IR: write it, run it, see the result. No compilation step required.
# Write IR, run immediately
lli factorial.ll
lli · JIT
LLVM IR Essentials
LLVM IR Syntax Cheat Sheet
Learn the bare essentials: types (i32, i64, float, double), instructions (add, sub, mul, br, phi, call, ret), blocks (entry:, loop:), functions (define, declare). Read 10 small IR files and rewrite them from memory. Aim for fluency within days.
; Function signature
define i32 @add(i32 %a, i32 %b) {
entry:
  %result = add i32 %a, %b
  ret i32 %result
}
IR · Syntax
From C to IR: Type Translation
How C types map to IR: int -> i32, long -> i64 (on LP64 targets), double -> double, struct -> {type1, type2, ...}, pointers -> ptr (opaque in modern LLVM; older IR used typed pointers like i32*). The load, store, and getelementptr instructions spell out the pointee type explicitly; this is what type-aware optimizations rely on.
Types
Control Flow: Blocks and Edges
A function is a graph of basic blocks. Each block ends in a terminator (br, ret, switch). Blocks are connected by edges. This view — treating code as a CFG (Control Flow Graph) — is how optimizers see your program. Draw the CFG for a simple if-else and a loop.
; Two basic blocks
entry:
  %cond = icmp slt i32 %x, 10
  br i1 %cond, label %then, label %else
then:
  ret i32 1
else:
  ret i32 0
CFG · Control Flow
Memory and Pointers in IR
alloca reserves stack memory. load reads from memory, store writes to it. In SSA form, memory operations break the "single assignment" rule, so early passes run mem2reg to convert them to registers. Understanding when and why memory is used is key.
; Memory operations
%ptr = alloca i32
store i32 42, ptr %ptr
%val = load i32, ptr %ptr
ret i32 %val
Memory · Pointers
Hands-On Experiments
Experiment: IR from Simple Loop
Write a small C loop, e.g. summing the integers 0 through 9. Generate IR at -O0 (lots of memory ops) and -O2 (optimized). Compare: what did -O2 eliminate, and why? (At -O2 the whole loop typically folds to the constant 45.) This shows why optimization levels matter.
// C code
int sum = 0;
for (int i = 0; i < 10; i++) sum += i;
return sum;
clang
Experiment: IR from Function Call
Write a C file that calls a helper function. Generate IR and find the call instruction. Understand how arguments are passed, how return values work, and how calling conventions are hidden in the IR (the target-specific ABI handles them later).
clang
Experiment: Array Access in IR
Write C code: int arr[10]; arr[5] = 42; Generate IR. Understand getelementptr (GEP) — LLVM's way of computing addresses without loading/storing. This is a critical concept for optimizing array-heavy code.
; GEP: Get Element Pointer
%ptr = getelementptr i32, ptr %arr_start, i32 5
store i32 42, ptr %ptr
GEP
Experiment: Cross-Architecture Code Gen
Take a simple .ll file. Compile it to x86-64, ARM64, and RISC-V. See how the same IR produces different assembly. This is the entire point of the three-phase design.
# Same IR, three targets
llc -march=x86-64 prog.ll
llc -march=aarch64 prog.ll
llc -march=riscv64 prog.ll
llc
Understanding the Ecosystem
LLVM Project Structure
llvm/lib/IR — IR definitions. llvm/lib/Transforms — optimization passes. llvm/lib/Target — backends. clang/lib — C/C++ frontend. Don't try to understand everything; learn to navigate the source tree to find what you need.
Source Code
LLVM Passes: The Workflow
Optimization happens through passes. Each pass analyzes and transforms IR. They run in sequence (PassManager coordinates them). In Phase 1, just know what passes exist; Phase 3 you'll understand them, Phase 4 you'll write them.
Passes · Optimization
CMake and LLVM Build System
LLVM uses CMake. Learn the basics: cmake -G Ninja, ninja, ninja llvm-tools. You won't need to modify the build system now, but knowing how to build LLVM will save hours of debugging later.
# Build LLVM with Ninja (configure the llvm/ subdirectory)
mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../llvm
ninja
CMake
Debugging Compilation: -debug Flag
LLVM tools accept -debug to print verbose pass execution info. Build LLVM with assertions enabled (CMAKE_BUILD_TYPE=Debug) and use -debug -debug-only=pass-name to spy on what specific passes do.
# See what mem2reg pass does
opt -debug-only=mem2reg -passes="mem2reg" prog.ll -S
Debugging
Common Pitfalls
IR ≠ Machine Code (It's Higher-Level)
Common mistake: Thinking IR is "just assembly." It's not. IR is abstract. Infinite registers. No ABI knowledge. No platform specifics. The backend's job is to fill in those gaps. Don't expect IR to look like x86-64.
Mindset
Optimization Levels Aren't Magic
-O0 disables most passes. -O2 runs mid-level optimizations. -O3 adds aggressive ones. They're just different pass pipelines. Reading what each level enables (in PassBuilder code) teaches you what tools the optimizer has.
Optimization
You Don't Need to Memorize IR Syntax
Spend a few days reading it until patterns emerge. Then refer to docs as needed. The important thing is intuition: seeing IR and predicting what it does, not perfect syntax recall.
Learning Strategy
Understanding IR Instructions
Arithmetic Instructions
add, sub, mul, udiv, sdiv, urem, srem. These are type-safe (adding two i32 values always yields an i32). By default, integer arithmetic wraps on overflow (two's complement); the nsw/nuw flags mark signed/unsigned overflow as poison, which is what licenses many optimizations.
; Arithmetic operations
%a = add i32 10, 20
%b = mul i64 %x, %y
%c = sdiv i32 %a, 3 ; signed division
Instructions · Arithmetic
Bitwise and Logical Instructions
and, or, xor, shl, lshr, ashr. These operate on bits directly. Know the difference between logical shift (lshr) and arithmetic shift (ashr) — one zero-fills, one sign-extends. These are crucial for understanding how optimizers can convert multiplication/division to shifts.
; Bitwise operations
%x = shl i32 %val, 2 ; val * 4
%y = lshr i32 %val, 1 ; val / 2 (unsigned)
%z = and i32 %a, %b ; bitwise AND
Instructions · Bitwise
Comparison Instructions (icmp, fcmp)
icmp compares integers with predicates: eq, ne, plus unsigned orderings (ult, ule, ugt, uge) and signed orderings (slt, sle, sgt, sge). fcmp compares floats with ordered/unordered predicates for NaN handling. These produce i1 (boolean) values used in branches.
; Comparisons
%cond = icmp slt i32 %x, 10 ; signed less-than
%result = fcmp oeq double %a, %b ; ordered equal
Instructions · Comparison
Branch Instructions (br, switch)
br i1 condition branches to one of two labels. switch branches on value. These terminate basic blocks. Understanding control flow requires fluency with these — they define the CFG structure.
; Branching
br i1 %cond, label %then, label %else
switch i32 %val, label %default [
  i32 0, label %case0
  i32 1, label %case1
]
Control Flow
The Phi Node (Merging SSA Values)
The phi instruction merges SSA values from different control paths. If you branch to a block from 2 predecessors with different values, phi selects which value to use based on which predecessor you came from. This is how SSA handles variables with multiple assignments.
; Control flow merge with phi
then:
  %val_then = add i32 %x, 1
  br label %merge
else:
  %val_else = add i32 %x, 2
  br label %merge
merge:
  %result = phi i32 [%val_then, %then], [%val_else, %else]
SSA · Phi Nodes
The LLVM C++ API Basics
Module, Function, BasicBlock Classes
Module is the root object containing Functions. Function contains BasicBlocks. BasicBlock contains Instructions. Learn the hierarchy. In Phase 1, focus on reading and understanding this structure. Phase 3 you'll write code to construct it.
C++ API · Data Structures
LLVMContext: The Memory Manager
Every LLVM object (Type, Value, Instruction) needs a context. Think of it as a memory manager and type system. You can't mix objects from different contexts. In practice, you'll create one context per compilation unit.
C++ API
Iterating Over IR: User, Use, and Def
Instructions have users (things that consume their output) and operands (things they consume). Learn to traverse this: for each instruction, for each operand, get its definition, process recursively. This is how analysis passes walk the IR.
C++ API · Traversal
Reading the Source: llvm/IR/Instruction.h
The best documentation for Instruction is the header file itself. Read it. See what methods are available. Understand how RTTI (dyn_cast, isa) lets you determine instruction types. This teaches you the true API.
Source Code
Analyzing Real Compiled Code
Compile Clang Itself to IR
Pick a file from clang's own source, compile it to bitcode (clang++ -emit-llvm -c with the project's include flags), then run llvm-dis on the result to get text IR. It's huge (thousands of functions). Pick one small function (say, 20 lines of C++) and find its IR equivalent. Trace through it to understand what it does.
Study Inlining: Before and After
Compile with -O0 (no inlining): see call instructions. Compile with -O2 (inlining enabled): same functions might be inlined directly. Compare the IR size. Understand why inlining matters for performance: fewer jumps, more optimization opportunities in the combined code.
Vectorization Patterns
Compile a tight loop that computes the same operation on array elements. At -O0, you'll see scalar operations. At -O3 with -march=native, the loop might vectorize: you'll see vector types (<4 x i32>, etc.) and vector operations (one add on 4 integers at once).
Dead Code Elimination in Practice
Write C code that computes something but never uses the result. At -O0: you'll see useless instructions. At -O2: those instructions vanish (dead code elimination pass removed them). This is the first concrete optimization you should see.
// C code
int x = 5 * 10; // never used
printf("hello"); // only this matters
Important Tools and Flags
llvm-dis: Disassemble IR
Convert bitcode (.bc) to text IR (.ll). Essential for reading compiled code. Bitcode is compact and efficient for the compiler, but humans need text.
llvm-dis program.bc -o program.ll
llvm-dis
llvm-as: Assemble IR
Convert text IR (.ll) back to bitcode (.bc). Lets you write IR by hand and compile it.
llvm-as program.ll -o program.bc
llvm-as
llvm-objdump: Object File Disassembly
Like objdump for compiled binaries, but understands LLVM-generated code. Useful for comparing IR optimizations to final machine code.
llvm-objdump -d program.o | less
llvm-objdump
opt -stats: Pass Statistics
See how many optimizations each pass performed. Useful for understanding which passes are active at each -O level and how much they transform the code.
opt -O2 -stats program.ll -o /dev/null
opt
llvm-config: Build Information
Query your LLVM installation: version, flags, paths. Essential when writing tools that use the C++ API.
llvm-config --version
llvm-config --cxxflags
llvm-config --ldflags
llvm-config
Core Compiler Terminology
Basic Block: The Atom of Optimization
A sequence of instructions with no branches in the middle. Branches only appear at the end. Every function is a graph of basic blocks. Optimizations work at basic block level (local) or across blocks (global/interprocedural).
Terminology
Dominance: B Dominates A if Every Path to A Goes Through B
Critical for optimization. If B dominates A, code in B always executes before A. Used for hoisting invariant code, dead code elimination, and more. Dominance trees are a core data structure in LLVM analysis.
Terminology · Analysis
Live Variables: Which Values Are Actually Used
A value is live at a point if it might be used on some path forward. If a value is not live, its computation can be eliminated (dead code). Live variable analysis is fundamental to many optimizations.
Terminology · Analysis
Reaching Definitions: Which Assignment Reaches Here
For a variable use at point P, which assignment definition could have produced the current value? In SSA, this is trivial (SSA guarantees exactly one reaching definition per use). In traditional code, reaching definitions analysis is complex.
Terminology · Analysis
Loop-Invariant Code Motion (LICM)
Move computations that don't change inside a loop to before the loop. Requires dominance, control dependence, and data dependence analysis. A classic optimization that dramatically improves loop performance.
Terminology · Optimization
Phase 1 Resources

IR Foundations & Core Concepts

02
Months 2-3 Core Language
Fluency in LLVM IR
Goal: Read and write LLVM IR by hand. SSA form must become second nature. You should be able to look at IR and know exactly what it does without running it.
Key Takeaways
  • Passes transform IR in-place; composition of passes = optimization pipeline
  • SSA simplifies analysis: use-def chains enable powerful algorithms
  • Dominance relationships are fundamental to control flow analysis
  • Alias analysis determines what memory locations can overlap
  • Write your first pass and run it on real programs
SSA Form — The Core Idea
Static Single Assignment (SSA)
Every variable is assigned exactly once. If you need to reassign, you create a new variable. This sounds crazy at first, but it makes data-flow analysis trivially simple. Compilers can prove "this value never changes" without complex analysis.
; Before SSA (pseudocode)
x = 1
x = x + 2 ; x reassigned
; After SSA
%x1 = add i32 0, 1
%x2 = add i32 %x1, 2 ; new name
SSA · Data Flow
Phi Nodes — Handling Branches
The one hard part of SSA. When control flow merges (after an if/else), which version of a variable do you use? The phi node selects based on which basic block you came from. Critical to understand before Phase 4.
; if (cond) { x = 1; } else { x = 2; }
%result = phi i32 [ 1, %if.true ], [ 2, %if.false ]
Phi Nodes · CFG
mem2reg Pass — Why alloca Exists
Clang at -O0 doesn't generate SSA directly — it generates alloca/store/load. The mem2reg pass promotes these memory operations to SSA registers. This is why unoptimized IR looks messy: it's designed to be simple to generate, then cleaned up.
mem2reg · opt
IR Syntax & Types
Type System
IR is strongly typed. Master: i1, i8, i32, i64 (integers), float, double, ptr (opaque pointer, modern LLVM), [N x T] (arrays), {T1, T2} (structs), and function types like i32 (i32, i32) (a pointer to a function is just ptr in modern IR).
Types · LangRef
Core Instruction Set
Must know: alloca (stack), load/store, add/sub/mul/sdiv, icmp/fcmp, br (conditional and unconditional), call/ret, getelementptr (GEP — pointer arithmetic), bitcast/trunc/zext/sext.
; Simple function: int add(int a, int b)
define i32 @add(i32 %a, i32 %b) {
  %result = add i32 %a, %b
  ret i32 %result
}
Instructions
getelementptr (GEP) — The Tricky One
GEP is pointer arithmetic, not a memory access. It calculates an address — it does NOT dereference. This trips up everyone. The first index offsets from the base pointer in units of the whole pointed-to object (usually 0); subsequent indices navigate struct fields and array elements. Study at least 5 examples before moving on.
; Access arr[3] where arr is [10 x i32]*
%ptr = getelementptr [10 x i32], ptr %arr, i64 0, i64 3
GEP · Pointers
Basic Blocks & Control Flow Graph (CFG)
A function is a collection of basic blocks. Each block ends with a terminator (br or ret). Edges between blocks form the CFG. Every optimization in LLVM operates on this graph structure. Draw CFGs by hand for small functions.
CFG · Basic Blocks
LLVM LangRef — Your Dictionary
Don't read it cover to cover. Learn to navigate it. Every time you see an instruction you don't know, look it up. Bookmark the sections: Type System, Instruction Reference, Intrinsics. You'll consult this daily.
llvm.org/docs/LangRef.html
03
Months 4-5 First Compiler
The Kaleidoscope Rite
Goal: Build a real compiler end-to-end. Text in -> JIT execution out. Use the official LLVM Kaleidoscope tutorial as your guide, but understand every line — don't copy-paste.
Key Takeaways
  • Kaleidoscope teaches language design, not just code generation
  • Lexer -> Parser -> AST -> Codegen is the compiler pipeline
  • Type checking and semantic analysis must happen in the frontend
  • Error recovery and diagnostic messages are part of a real frontend
  • Clang architecture shows how production frontends are structured
Compiler Frontend
Lexing (Tokenization)
Convert raw text into a stream of tokens (keywords, identifiers, numbers, operators). Write a hand-rolled lexer — no lex/flex. Kaleidoscope's lexer is ~100 lines of C++. Key: every token needs a type and a value.
// "def foo(x) x + 1" ->
[ TOKEN_DEF, IDENT("foo"), LPAREN,
  IDENT("x"), RPAREN, IDENT("x"),
  PLUS, NUMBER(1) ]
Lexer · Tokens
Parsing — Recursive Descent
Convert tokens into an AST. Recursive descent is the industry standard for hand-written parsers. Each grammar rule becomes a function. Kaleidoscope uses Pratt parsing for expressions (precedence climbing). Learn both.
Parser · AST · Pratt Parsing
AST Node Design
Design your AST classes: ExprAST (base), NumberExprAST, VariableExprAST, BinaryExprAST, CallExprAST, FunctionAST. Each node must have a codegen() method that returns an llvm::Value*. This interface is the bridge between frontend and backend.
AST Design
LLVM C++ API — Code Generation
The Big Three: Context, Module, IRBuilder
LLVMContext owns all IR objects. Module is a compilation unit (a .ll file in C++ form). IRBuilder is your "cursor" — it tracks the current insertion point and has methods for every instruction. You'll use these in every compiler you ever write.
auto TheContext = std::make_unique<llvm::LLVMContext>();
auto TheModule = std::make_unique<llvm::Module>("kaleid", *TheContext);
auto Builder = std::make_unique<llvm::IRBuilder<>>(*TheContext);
LLVM C++ API
IRBuilder Methods for Every Construct
Map each AST node to IRBuilder calls: CreateAdd/Sub/Mul, CreateFCmpOLT (float compare), CreateBr/CreateCondBr, CreateCall, CreateRet, CreateAlloca, CreateStore/Load. Practice until you can translate IR syntax to C++ API calls without looking it up.
IRBuilder
Function & BasicBlock Creation
Function::Create() with FunctionType. BasicBlock::Create() and setting the IRBuilder insertion point with SetInsertPoint(). Handle function arguments with func->arg_begin(). This pattern repeats in every backend codegen.
Functions · BasicBlocks
JIT Compilation
ORC JIT (LLVM's Modern JIT API)
ORC (On-Request Compilation) is LLVM's composable JIT framework. Key layers: IRCompileLayer, RTDyldObjectLinkingLayer. For Kaleidoscope, you add functions to the JIT incrementally as the user types them. This mirrors how REPLs work.
ORC JIT · REPL
Symbol Resolution & Extern Functions
JIT needs to find symbols (like sin, printf). Learn how symbol lookup works: JIT -> process symbols -> stdlib. Implement extern declarations in your language so Kaleidoscope can call C functions.
Symbol Resolution
04
Months 6-9 Job-Ready Phase
Writing Passes & Optimizations
Goal: Write code that transforms other code. This is what compiler engineers do at companies. Master the pass infrastructure, data structures, and analysis frameworks.
Key Takeaways
  • MIR (Machine IR) bridges LLVM IR and actual assembly
  • Instruction selection, register allocation, and scheduling are hard problems
  • TableGen generates instruction definitions and patterns automatically
  • Calling conventions and ABI are critical for correctness
  • Writing a backend requires understanding your target architecture deeply
New Pass Manager (NPM)
New Pass Manager Architecture
LLVM switched from the Legacy Pass Manager to the New Pass Manager (NPM) as the default in LLVM 13; all new code uses NPM. Passes are plain classes inheriting PassInfoMixin, parameterized by the IR unit they run on (Function, Module, Loop). Out-of-tree passes register themselves via llvmGetPassPluginInfo returning a PassPluginLibraryInfo.
NPM · Pass Manager
Pass Anatomy: run() Method
Every pass implements a run(Function &F, FunctionAnalysisManager &AM) method. It returns PreservedAnalyses — telling the pass manager which analyses are still valid after your transformation. Returning PreservedAnalyses::all() vs none() has performance implications.
struct MyPass : PassInfoMixin<MyPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    // walk instructions, transform, ...
    return PreservedAnalyses::none();
  }
};
Pass Anatomy
Analysis Passes
Iterating the CFG
Walk functions, basic blocks, and instructions. Pattern: for (BasicBlock &BB : F), for (Instruction &I : BB). Learn to use dyn_cast<BranchInst>(&I) for instruction type checking. This is boilerplate you'll type hundreds of times.
CFG Traversal
Dominance Analysis
A block A dominates B if every path to B goes through A. Crucial for: finding where to hoist code, validating SSA, loop analysis. Access via DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F).
Dominance · DomTree
Alias Analysis
Can two pointers point to the same memory? If not, loads/stores can be reordered. AliasAnalysis returns MustAlias, MayAlias, or NoAlias. Fundamental for auto-vectorization and memory-level parallelism.
Alias Analysis
Dead Code Analysis Pass
Write a pass that identifies unreachable basic blocks (blocks with no predecessors in the CFG, excluding the entry block) and instructions with no users that also have no side effects. Print a report. This teaches you CFG traversal + use-def chains.
Build This
Transform Passes
Instruction Combining (instcombine)
The most important optimization pass. Replaces instruction patterns with cheaper equivalents. Examples: x * 2 -> x << 1, x + 0 -> x, x / 2 -> x >> 1 (unsigned). Study the existing instcombine source — it's 50k lines of patterns and teaches you how to think about transformations.
; Before instcombine
%r = mul i32 %x, 2
; After instcombine
%r = shl i32 %x, 1
instcombine
Dead Code Elimination (DCE/ADCE)
Remove instructions whose results are never used. ADCE (Aggressive DCE) also removes unreachable blocks. Implement a basic DCE: iterate instructions in reverse, if I.use_empty() and instruction has no side effects, erase it.
DCE · Build This
Loop Transformations
LICM (Loop-Invariant Code Motion): if a computation's inputs don't change in a loop, hoist it outside. Loop Unrolling: replicate loop body N times to reduce branch overhead. Loop Vectorization: convert scalar loops to SIMD. These are where most performance gains come from in real-world code.
LICM · Unrolling · Vectorization
Strength Reduction Pass
Write a pass that finds multiply-by-power-of-2 patterns (mul i32 %x, 4) and replaces them with shifts (shl i32 %x, 2). Extend to handle division. Test with opt --load-pass-plugin=./libMyPass.so -passes="strength-reduce".
Build This
Inlining
Replace a function call with the function body. Eliminates call overhead and enables further optimizations across call boundaries. Learn the inlining cost model: LLVM uses a heuristic based on instruction count. InlineFunction() utility in LLVM API.
Inlining
LLVM Data Structures (ADT Library)
SmallVector, SmallString, ArrayRef
SmallVector<T, N>: vector with N elements on the stack before heap allocation. Avoids allocations for small collections (most compiler data). ArrayRef: non-owning reference to any array-like container. Use these everywhere in compiler code — never raw std::vector for IR-level work.
ADT · SmallVector
StringRef, Twine
StringRef: non-owning reference to a string — zero-copy. Twine: lazy string concatenation tree. Never allocate intermediate strings in hot paths. Profilers hate std::string in compilers.
StringRef · Twine
DenseMap, DenseSet
Hash maps/sets optimized for pointer keys (common in IR — Value*, BasicBlock*). Much faster than std::unordered_map for small-to-medium sizes due to open addressing and cache locality.
DenseMap
Use-Def Chains
Every Value in LLVM has a list of uses. Iterate with for (Use &U : V.uses()). Replace all uses with V.replaceAllUsesWith(NewV). This is the foundation of every transformation — finding what uses what.
Use-Def
05
Month 10+ Specialization
Choose Your Track
Goal: Deep mastery in one high-value niche. All three tracks are hiring. Pick based on what excites you — motivation beats market research at this level.
Key Takeaways
  • Vectorization exploits SIMD parallelism; SLP and loop vectorization differ
  • GlobalISel offers alternative instruction selection (modern approach)
  • Garbage collection metadata and statepoints handle managed languages
  • Profiling-guided optimization uses runtime data for better decisions
  • Specialize in your interest: LTO, PGO, MLIR, debugging, or custom domains
Track A AI Hardware
MLIR & AI Compilers
  • MLIR dialects (Linalg, Affine, Vector, GPU)
  • Lowering pipelines: PyTorch/XLA -> MLIR -> LLVM IR
  • Tiling and fusion for tensor ops
  • Writing a custom MLIR dialect
  • Polyhedral optimization model
  • Target: IREE, ONNX-MLIR, Triton (GPU)
  • Companies: Google, Apple (Core ML), Nvidia (Triton)
Track B Security
Obfuscation & Binary Analysis
  • Control Flow Flattening pass
  • Instruction substitution (replace ops with equiv)
  • Bogus control flow injection
  • String encryption passes
  • LLVM-based sanitizers (ASan, UBSan internals)
  • Study: Obfuscator-LLVM (O-LLVM), Hikari
  • Companies: Security firms, game anti-cheat, DRM
Track C New Hardware
Custom Backend / CPU
  • TableGen — describing instructions declaratively
  • Target machine classes hierarchy
  • Register file definition
  • Instruction selection (DAG-to-DAG)
  • Instruction scheduling
  • ABI & calling conventions
  • Companies: Chip startups, embedded, RISC-V ecosystem
Resources for All Tracks
LLVM Source Code as Textbook
The best resource at this stage is the source itself. llvm/lib/Transforms/ contains all the production passes. llvm/lib/Target/X86/ is the best-commented backend. Read real code, not tutorials.
github.com/llvm/llvm-project
Contribute to LLVM
Start with good-first-issue labels on GitHub (LLVM moved code review from Phabricator to GitHub pull requests). Fix a miscompile, improve a diagnostic, add a missing peephole to instcombine. LLVM contributors are among the most rigorous code reviewers in open source — the feedback is invaluable.
Open Source · github.com/llvm/llvm-project
General Resources

Organized Resources

Curated references organized by learning phase, from foundation concepts to advanced optimization and backend development.

Quick Reference: Common Commands & Syntax

clang Compilation

clang mycode.c                             # compile to executable
clang -S -emit-llvm mycode.c -o mycode.ll  # generate LLVM IR (text)
clang -c -emit-llvm mycode.c -o mycode.bc  # generate bitcode (binary)
clang -O2 mycode.c                         # optimized compilation
clang -Xclang -emit-codegen-only mycode.c  # run codegen, discard output
clang -print-search-dirs                   # show include/library paths

opt - LLVM Optimizer

opt -O2 input.ll -o output.ll              # apply the O2 pipeline
opt -passes=mem2reg -S input.ll -o out.ll  # promote allocas to registers
opt -passes='dce,simplifycfg' input.ll     # run specific passes (LLVM 14+)
opt -print-passes                          # list available passes
opt -passes=dot-cfg input.ll               # emit the control flow graph as .dot
opt -stats input.ll -o /dev/null           # print optimization statistics

llc - Code Generator

llc input.ll -o output.s            # generate assembly
llc -march=x86-64 input.ll          # target x86-64
llc -march=arm input.ll             # target ARM
llc -O2 input.ll                    # optimize during codegen
llc -view-dag-combine1-dags         # visualize the SelectionDAG (debug builds)
llc -print-machineinstrs input.ll   # print Machine IR

LLVM IR Types

i1, i8, i16, i32, i64   ; integer types (width in bits)
float, double           ; floating point (32- and 64-bit)
ptr                     ; opaque pointer (LLVM 15+)
type*                   ; typed pointer (legacy)
[N x type]              ; array of N elements
{ type1, type2 }        ; struct/record

LLVM IR Instructions

%var = alloca i32                                           ; allocate on the stack
store i32 %val, ptr %var                                    ; write to memory
%val = load i32, ptr %var                                   ; read from memory
%res = add i32 %a, %b                                       ; integer add
%res = getelementptr [10 x i32], ptr %arr, i64 0, i64 %idx  ; compute address
call i32 @func(i32 %arg)                                    ; function call

Useful Tools

llvm-dis mycode.bc -o mycode.ll         # disassemble bitcode to text IR
llvm-as mycode.ll -o mycode.bc          # assemble text IR to bitcode
llvm-config --cxxflags --libs           # compile/link flags for the C++ API
llvm-objdump -d output.o                # disassemble object file
llc input.ll -print-after-all           # print IR after each pass
opt -time-passes input.ll -o /dev/null  # measure pass runtimes

Tools Installation Guide

Ubuntu/Debian

sudo apt-get install -y cmake ninja-build clang lld
git clone https://github.com/llvm/llvm-project.git
cd llvm-project/llvm && mkdir ../build && cd ../build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" ../llvm
ninja -j$(nproc) && sudo ninja install
clang --version

macOS (Homebrew)

brew install cmake ninja llvm
OR build from source:
git clone https://github.com/llvm/llvm-project.git
cd llvm-project && mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" ../llvm
ninja -j$(sysctl -n hw.ncpu) && sudo ninja install

Windows (MSVC)

Install Visual Studio 2019+ with C++ tools
git clone https://github.com/llvm/llvm-project.git
cd llvm-project && mkdir build && cd build
cmake -G "Visual Studio 17 2022" -A x64 -DLLVM_ENABLE_PROJECTS="clang" ../llvm
cmake --build . --config Release

Quick Test

echo 'int main() { return 42; }' > test.c
clang -S -emit-llvm test.c -o test.ll
opt -O2 -S test.ll -o test.opt.ll
llc test.opt.ll -o test.s
clang test.s -o test && ./test

FAQ & Troubleshooting

Q: How do I build LLVM from source on Linux?
git clone https://github.com/llvm/llvm-project.git && cd llvm-project && mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../llvm && ninja && ninja install
Takes 20-40 min on modern hardware.
Q: Why is my LLVM build so slow?
Use a Release build (-DCMAKE_BUILD_TYPE=Release) with Ninja. Linking is usually the bottleneck: enable lld (-DLLVM_ENABLE_LLD=ON) and cap parallel link jobs (-DLLVM_PARALLEL_LINK_JOBS=2) if you're RAM-bound.
Q: "undefined reference to llvm::..." when linking?
Use llvm-config --cxxflags --ldflags --libs core to get linking flags, or use CMake's find_package(LLVM REQUIRED CONFIG).
Q: How do I run opt with specific passes?
opt -O2 input.ll -o output.ll for O2 pipeline, or opt -passes='mem2reg,dce' input.ll (LLVM 14+) for specific passes.
Q: How do I generate IR from C code?
clang -S -emit-llvm mycode.c -o mycode.ll (human-readable) or clang -c -emit-llvm mycode.c -o mycode.bc (bitcode).

Common Pitfalls & Misconceptions

Phase 1: Foundation

"IR types are the same as C types"
IR i32 ≠ C int. Signedness is in operations, not types. IR pointers are untyped in LLVM 15+.
"IR is portable across all CPUs"
IR is platform-independent but must be lowered to machine code. Pointer size, alignment, and ABI assumptions still apply.
"Pass ordering doesn't matter"
Wrong. Pass ordering is critical. Always use O2/O3 pipeline as reference.

Phase 2: SSA & Passes

"SSA means each variable assigned once"
SSA means each *definition* is unique. Phi nodes handle multiple control flow paths.
"Alias analysis tells me pointers are equal"
It tells you pointers *might* alias. Always assume worst case unless NoAlias.

Phase 4: Backend

"I can ignore calling conventions"
Your code won't interop with libraries or other languages. Calling conventions are mandatory.
"Register allocation is just assigning registers"
It's an NP-hard problem. Getting it wrong kills performance.

Community & Getting Help

Official LLVM Channels

Community & Chat

Learning Pathways