Compiler Engineering 2026 Edition

LLVM
Curriculum

A ground-up curriculum for systems programmers with C and x86-64 Assembly foundations. From IR basics to production compiler passes — no shortcuts, no handwaving.

MONTH 1 3 5 7 9 12+
Phase 1: Architect View
Phase 2: IR Fluency
Phase 3: Kaleidoscope
Phase 4: Passes
Phase 5: Specialization
Prerequisite: You have C fluency and x86-64 Assembly basics. LLVM IR will feel familiar because it reads like assembly, just hardware-independent.


01
Month 1 Foundation
The Architect's View
Goal: Understand LLVM's Three-Phase design. Know what IR is and why it exists. Run your first clang/opt/llc pipeline.
Key Takeaways
  • Understand why LLVM uses 3-phase design (N languages + M targets)
  • Read IR as hardware-independent assembly — it's RISC-like with infinite registers
  • Run the clang -> opt -> llc pipeline and understand each phase's role
  • Know the SSA form: every variable assigned exactly once
  • Build LLVM from source and explore the codebase structure
Core Concepts
Why Three-Phase Compilers Win
Frontend -> Optimizer -> Backend. N languages × M targets = N+M components, not N×M. This is the entire reason LLVM exists. Read the AOSA chapter by Chris Lattner — it's 20 pages that reframe how you think about tools.
AOSA Article · Architecture
What is LLVM IR? (The "Virtual Assembly")
IR is a RISC-like assembly language with infinite virtual registers, strong typing, and no hardware assumptions. It's what lets the optimizer work independently of both the source language and the CPU. Your x86-64 knowledge transfers — IR is just a cleaner ISA.
IR · SSA
Clang vs GCC Architecture Differences
GCC is a monolith — frontend and backend are tightly coupled. Clang is a library-based frontend that emits LLVM IR. Understand why this matters for tooling, IDE integration, and static analysis.
Compilers
Tool Mastery
clang — The Frontend
Your entry point. Key flags to know: -emit-llvm (output IR), -S (text form), -O0/-O2/-O3 (optimization levels), -c (object file). Practice: compile a C file to IR, to assembly, and to object code using only clang flags.
# C -> LLVM IR (text)
clang -S -emit-llvm -O0 factorial.c -o factorial.ll
# C -> x86-64 Assembly
clang -S -O0 factorial.c -o factorial.s
# Compare: see what changes at O2
clang -S -emit-llvm -O2 factorial.c -o factorial_opt.ll
clang
opt — The Optimizer
Takes .ll files, runs passes, outputs optimized .ll. Learn: -passes="mem2reg,dce", --print-changed, -stats. This is the tool you'll use most in Phase 4 when writing your own passes.
# Run mem2reg pass (promotes memory to SSA registers)
opt -passes="mem2reg" factorial.ll -S -o optimized.ll
# See what changed
opt -passes="mem2reg,dce,instcombine" -S --print-changed factorial.ll
opt
llc — The Backend
Converts IR to machine code. Key flags: -march (target arch), -filetype=asm vs obj. Try targeting a different arch than your host to feel what "hardware independence" actually means.
# IR -> x86-64 Assembly
llc -march=x86-64 factorial.ll -o factorial_from_ir.s
# IR -> ARM Assembly (cross-compile)
llc -march=aarch64 factorial.ll -o factorial_arm.s
llc
Godbolt (Compiler Explorer) Workflow
Set up a permanent workflow: left pane = C source, right pane 1 = clang x86-64, right pane 2 = clang LLVM IR. Highlight a C expression and see both outputs highlight simultaneously. This dual-map exercise will rewire how you think about code.
Godbolt · godbolt.org
lli — The JIT Interpreter
Runs .ll files directly. Not for production, but essential for learning IR: write it, run it, see the result. No compilation step required.
# Write IR, run immediately
lli factorial.ll
lli · JIT
LLVM IR Essentials
LLVM IR Syntax Cheat Sheet
Learn the bare essentials: types (i32, i64, float, double), instructions (add, sub, mul, br, phi, call, ret), blocks (entry:, loop:), functions (define, declare). Read 10 small IR files and rewrite them from memory. Aim for fluency within days.
; Function signature
define i32 @add(i32 %a, i32 %b) {
entry:
  %result = add i32 %a, %b
  ret i32 %result
}
IR · Syntax
From C to IR: Type Translation
How C types map to IR: int -> i32, long -> i64 (on LP64 targets), double -> double, struct -> {type1, type2, ...}, pointers -> ptr (opaque in modern LLVM; older IR used typed pointers like i32*). The load, store, and getelementptr instructions spell out the pointee type explicitly; this is what type-aware optimizations rely on.
Types
Control Flow: Blocks and Edges
A function is a graph of basic blocks. Each block ends in a terminator (br, ret, switch). Blocks are connected by edges. This view — treating code as a CFG (Control Flow Graph) — is how optimizers see your program. Draw the CFG for a simple if-else and a loop.
; Two basic blocks
entry:
  %cond = icmp slt i32 %x, 10
  br i1 %cond, label %then, label %else
then:
  ret i32 1
else:
  ret i32 0
CFG · Control Flow
Memory and Pointers in IR
alloca reserves stack memory. load reads from memory, store writes to it. In SSA form, memory operations break the "single assignment" rule, so early passes run mem2reg to convert them to registers. Understanding when and why memory is used is key.
; Memory operations
%ptr = alloca i32
store i32 42, ptr %ptr
%val = load i32, ptr %ptr
ret i32 %val
Memory · Pointers
Hands-On Experiments
Experiment: IR from Simple Loop
Write a small C loop, e.g. summing the integers 0 through 9. Generate IR at -O0 (lots of memory ops) and -O2 (optimized). Compare: what did -O2 eliminate, and why? (At -O2 the whole loop typically folds to the constant 45.) This shows why optimization levels matter.
// C code
int sum = 0;
for (int i = 0; i < 10; i++) sum += i;
return sum;
clang
Experiment: IR from Function Call
Write a C file that calls a helper function. Generate IR and find the call instruction. Understand how arguments are passed, how return values work, and how calling conventions are hidden in the IR (the target-specific ABI handles them later).
clang
Experiment: Array Access in IR
Write C code: int arr[10]; arr[5] = 42; Generate IR. Understand getelementptr (GEP) — LLVM's way of computing addresses without loading/storing. This is a critical concept for optimizing array-heavy code.
; GEP: Get Element Pointer
%ptr = getelementptr i32, ptr %arr_start, i32 5
store i32 42, ptr %ptr
GEP
Experiment: Cross-Architecture Code Gen
Take a simple .ll file. Compile it to x86-64, ARM64, and RISC-V. See how the same IR produces different assembly. This is the entire point of the three-phase design.
# Same IR, three targets
llc -march=x86-64 prog.ll
llc -march=aarch64 prog.ll
llc -march=riscv64 prog.ll
llc
Understanding the Ecosystem
LLVM Project Structure
llvm/lib/IR — IR definitions. llvm/lib/Transforms — optimization passes. llvm/lib/Target — backends. clang/lib — C/C++ frontend. Don't try to understand everything; learn to navigate the source tree to find what you need.
Source Code
LLVM Passes: The Workflow
Optimization happens through passes. Each pass analyzes and transforms IR. They run in sequence (PassManager coordinates them). In Phase 1, just know what passes exist; Phase 3 you'll understand them, Phase 4 you'll write them.
Passes · Optimization
CMake and LLVM Build System
LLVM uses CMake. Learn the basics: cmake -G Ninja, ninja, ninja llvm-tools. You won't need to modify the build system now, but knowing how to build LLVM will save hours of debugging later.
# Build LLVM with Ninja (configure the llvm/ subdirectory)
mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../llvm
ninja
CMake
Debugging Compilation: -debug Flag
LLVM tools accept -debug to print verbose pass execution info. Build LLVM with assertions enabled (CMAKE_BUILD_TYPE=Debug) and use -debug -debug-only=pass-name to spy on what specific passes do.
# See what mem2reg pass does
opt -debug-only=mem2reg -passes="mem2reg" prog.ll -S
Debugging
Common Pitfalls
IR ≠ Machine Code (It's Higher-Level)
Common mistake: Thinking IR is "just assembly." It's not. IR is abstract. Infinite registers. No ABI knowledge. No platform specifics. The backend's job is to fill in those gaps. Don't expect IR to look like x86-64.
Mindset
Optimization Levels Aren't Magic
-O0 disables most passes. -O2 runs mid-level optimizations. -O3 adds aggressive ones. They're just different pass pipelines. Reading what each level enables (in PassBuilder code) teaches you what tools the optimizer has.
Optimization
You Don't Need to Memorize IR Syntax
Spend a few days reading it until patterns emerge. Then refer to docs as needed. The important thing is intuition: seeing IR and predicting what it does, not perfect syntax recall.
Learning Strategy
Understanding IR Instructions
Arithmetic Instructions
add, sub, mul, udiv, sdiv, urem, srem. These are type-safe (adding two i32 values always yields an i32). By default, integer arithmetic wraps on overflow (two's complement); the nsw/nuw flags mark signed/unsigned overflow as poison, which is what licenses many optimizations.
; Arithmetic operations
%a = add i32 10, 20
%b = mul i64 %x, %y
%c = sdiv i32 %a, 3 ; signed division
Instructions · Arithmetic
Bitwise and Logical Instructions
and, or, xor, shl, lshr, ashr. These operate on bits directly. Know the difference between logical shift (lshr) and arithmetic shift (ashr) — one zero-fills, one sign-extends. These are crucial for understanding how optimizers can convert multiplication/division to shifts.
; Bitwise operations
%x = shl i32 %val, 2 ; val * 4
%y = lshr i32 %val, 1 ; val / 2 (unsigned)
%z = and i32 %a, %b ; bitwise AND
Instructions · Bitwise
Comparison Instructions (icmp, fcmp)
icmp compares integers with predicates: eq, ne, plus unsigned orderings (ult, ule, ugt, uge) and signed orderings (slt, sle, sgt, sge). fcmp compares floats with ordered/unordered predicates for NaN handling. These produce i1 (boolean) values used in branches.
; Comparisons
%cond = icmp slt i32 %x, 10 ; signed less-than
%result = fcmp oeq double %a, %b ; ordered equal
Instructions · Comparison
Branch Instructions (br, switch)
br i1 condition branches to one of two labels. switch branches on value. These terminate basic blocks. Understanding control flow requires fluency with these — they define the CFG structure.
; Branching
br i1 %cond, label %then, label %else
switch i32 %val, label %default [
  i32 0, label %case0
  i32 1, label %case1
]
Control Flow
The Phi Node (Merging SSA Values)
The phi instruction merges SSA values from different control paths. If you branch to a block from 2 predecessors with different values, phi selects which value to use based on which predecessor you came from. This is how SSA handles variables with multiple assignments.
; Control flow merge with phi
then:
  %val_then = add i32 %x, 1
  br label %merge
else:
  %val_else = add i32 %x, 2
  br label %merge
merge:
  %result = phi i32 [%val_then, %then], [%val_else, %else]
SSA · Phi Nodes
The LLVM C++ API Basics
Module, Function, BasicBlock Classes
Module is the root object containing Functions. Function contains BasicBlocks. BasicBlock contains Instructions. Learn the hierarchy. In Phase 1, focus on reading and understanding this structure. Phase 3 you'll write code to construct it.
C++ API · Data Structures
LLVMContext: The Memory Manager
Every LLVM object (Type, Value, Instruction) needs a context. Think of it as a memory manager and type system. You can't mix objects from different contexts. In practice, you'll create one context per compilation unit.
C++ API
Iterating Over IR: User, Use, and Def
Instructions have users (things that consume their output) and operands (things they consume). Learn to traverse this: for each instruction, for each operand, get its definition, process recursively. This is how analysis passes walk the IR.
C++ API · Traversal
Reading the Source: llvm/IR/Instruction.h
The best documentation for Instruction is the header file itself. Read it. See what methods are available. Understand how RTTI (dyn_cast, isa) lets you determine instruction types. This teaches you the true API.
Source Code
Analyzing Real Compiled Code
Compile Clang Itself to IR
Pick a file from clang's own source, compile it to bitcode (clang++ -emit-llvm -c with the project's include flags), then run llvm-dis on the result to get text IR. It's huge (thousands of functions). Pick one small function (say, 20 lines of C++) and find its IR equivalent. Trace through it to understand what it does.
Study Inlining: Before and After
Compile with -O0 (no inlining): see call instructions. Compile with -O2 (inlining enabled): same functions might be inlined directly. Compare the IR size. Understand why inlining matters for performance: fewer jumps, more optimization opportunities in the combined code.
Vectorization Patterns
Compile a tight loop that computes the same operation on array elements. At -O0, you'll see scalar operations. At -O3 with -march=native, the loop might vectorize: you'll see vector types (<4 x i32>, etc.) and vector operations (one add on 4 integers at once).
Dead Code Elimination in Practice
Write C code that computes something but never uses the result. At -O0: you'll see useless instructions. At -O2: those instructions vanish (dead code elimination pass removed them). This is the first concrete optimization you should see.
// C code
int x = 5 * 10; // never used
printf("hello"); // only this matters
Important Tools and Flags
llvm-dis: Disassemble IR
Convert bitcode (.bc) to text IR (.ll). Essential for reading compiled code. Bitcode is compact and efficient for the compiler, but humans need text.
llvm-dis program.bc -o program.ll
llvm-dis
llvm-as: Assemble IR
Convert text IR (.ll) back to bitcode (.bc). Lets you write IR by hand and compile it.
llvm-as program.ll -o program.bc
llvm-as
llvm-objdump: Object File Disassembly
Like objdump for compiled binaries, but understands LLVM-generated code. Useful for comparing IR optimizations to final machine code.
llvm-objdump -d program.o | less
llvm-objdump
opt -stats: Pass Statistics
See how many optimizations each pass performed. Useful for understanding which passes are active at each -O level and how much they transform the code.
opt -O2 -stats program.ll -o /dev/null
opt
llvm-config: Build Information
Query your LLVM installation: version, flags, paths. Essential when writing tools that use the C++ API.
llvm-config --version
llvm-config --cxxflags
llvm-config --ldflags
llvm-config
Core Compiler Terminology
Basic Block: The Atom of Optimization
A sequence of instructions with no branches in the middle. Branches only appear at the end. Every function is a graph of basic blocks. Optimizations work at basic block level (local) or across blocks (global/interprocedural).
Terminology
Dominance: B Dominates A if Every Path to A Goes Through B
Critical for optimization. If B dominates A, code in B always executes before A. Used for hoisting invariant code, dead code elimination, and more. Dominance trees are a core data structure in LLVM analysis.
Terminology · Analysis
Live Variables: Which Values Are Actually Used
A value is live at a point if it might be used on some path forward. If a value is not live, its computation can be eliminated (dead code). Live variable analysis is fundamental to many optimizations.
Terminology · Analysis
Reaching Definitions: Which Assignment Reaches Here
For a variable use at point P, which assignment definition could have produced the current value? In SSA, this is trivial (SSA guarantees exactly one reaching definition per use). In traditional code, reaching definitions analysis is complex.
Terminology · Analysis
Loop-Invariant Code Motion (LICM)
Move computations that don't change inside a loop to before the loop. Requires dominance, control dependence, and data dependence analysis. A classic optimization that dramatically improves loop performance.
Terminology · Optimization
Phase 1 Resources

IR Foundations & Core Concepts

02
Months 2-3 Core Language
Fluency in LLVM IR
Goal: Read and write LLVM IR by hand. SSA form must become second nature. You should be able to look at IR and know exactly what it does without running it.
Key Takeaways
  • Passes transform IR in-place; composition of passes = optimization pipeline
  • SSA simplifies analysis: use-def chains enable powerful algorithms
  • Dominance relationships are fundamental to control flow analysis
  • Alias analysis determines what memory locations can overlap
  • Write your first pass and run it on real programs
SSA Form — The Core Idea
Static Single Assignment (SSA)
Every variable is assigned exactly once. If you need to reassign, you create a new variable. This sounds crazy at first, but it makes data-flow analysis trivially simple. Compilers can prove "this value never changes" without complex analysis.
; Before SSA (pseudocode)
x = 1
x = x + 2 ; x reassigned
; After SSA
%x1 = add i32 0, 1
%x2 = add i32 %x1, 2 ; new name
SSA · Data Flow
Phi Nodes — Handling Branches
The one hard part of SSA. When control flow merges (after an if/else), which version of a variable do you use? The phi node selects based on which basic block you came from. Critical to understand before Phase 4.
; if (cond) { x = 1; } else { x = 2; }
%result = phi i32 [ 1, %if.true ], [ 2, %if.false ]
Phi Nodes · CFG
mem2reg Pass — Why alloca Exists
Clang at -O0 doesn't generate SSA directly — it generates alloca/store/load. The mem2reg pass promotes these memory operations to SSA registers. This is why unoptimized IR looks messy: it's designed to be simple to generate, then cleaned up.
mem2reg · opt
IR Syntax & Types
Type System
IR is strongly typed. Master: i1, i8, i32, i64 (integers), float, double, ptr (opaque pointer, modern LLVM), [N x T] (arrays), {T1, T2} (structs), and function types like i32 (i32, i32) (a pointer to a function is just ptr in modern IR).
Types · LangRef
Core Instruction Set
Must know: alloca (stack), load/store, add/sub/mul/sdiv, icmp/fcmp, br (conditional and unconditional), call/ret, getelementptr (GEP — pointer arithmetic), bitcast/trunc/zext/sext.
; Simple function: int add(int a, int b)
define i32 @add(i32 %a, i32 %b) {
  %result = add i32 %a, %b
  ret i32 %result
}
Instructions
getelementptr (GEP) — The Tricky One
GEP is pointer arithmetic, not a memory access. It calculates an address — it does NOT dereference. This trips up everyone. The first index offsets from the base pointer in units of the whole pointed-to object (usually 0); subsequent indices navigate struct fields and array elements. Study at least 5 examples before moving on.
; Access arr[3] where arr is [10 x i32]*
%ptr = getelementptr [10 x i32], ptr %arr, i64 0, i64 3
GEP · Pointers
Basic Blocks & Control Flow Graph (CFG)
A function is a collection of basic blocks. Each block ends with a terminator (br or ret). Edges between blocks form the CFG. Every optimization in LLVM operates on this graph structure. Draw CFGs by hand for small functions.
CFG · Basic Blocks
LLVM LangRef — Your Dictionary
Don't read it cover to cover. Learn to navigate it. Every time you see an instruction you don't know, look it up. Bookmark the sections: Type System, Instruction Reference, Intrinsics. You'll consult this daily.
llvm.org/docs/LangRef.html
03
Months 4-5 First Compiler
The Kaleidoscope Rite
Goal: Build a real compiler end-to-end. Text in -> JIT execution out. Use the official LLVM Kaleidoscope tutorial as your guide, but understand every line — don't copy-paste.
Key Takeaways
  • Kaleidoscope teaches language design, not just code generation
  • Lexer -> Parser -> AST -> Codegen is the compiler pipeline
  • Type checking and semantic analysis must happen in the frontend
  • Error recovery and diagnostic messages are part of a real frontend
  • Clang architecture shows how production frontends are structured
Compiler Frontend
Lexing (Tokenization)
Convert raw text into a stream of tokens (keywords, identifiers, numbers, operators). Write a hand-rolled lexer — no lex/flex. Kaleidoscope's lexer is ~100 lines of C++. Key: every token needs a type and a value.
// "def foo(x) x + 1" ->
[ TOKEN_DEF, IDENT("foo"), LPAREN,
  IDENT("x"), RPAREN, IDENT("x"),
  PLUS, NUMBER(1) ]
Lexer · Tokens
Parsing — Recursive Descent
Convert tokens into an AST. Recursive descent is the industry standard for hand-written parsers. Each grammar rule becomes a function. Kaleidoscope uses Pratt parsing for expressions (precedence climbing). Learn both.
Parser · AST · Pratt Parsing
AST Node Design
Design your AST classes: ExprAST (base), NumberExprAST, VariableExprAST, BinaryExprAST, CallExprAST, FunctionAST. Each node must have a codegen() method that returns an llvm::Value*. This interface is the bridge between frontend and backend.
AST Design
LLVM C++ API — Code Generation
The Big Three: Context, Module, IRBuilder
LLVMContext owns all IR objects. Module is a compilation unit (a .ll file in C++ form). IRBuilder is your "cursor" — it tracks the current insertion point and has methods for every instruction. You'll use these in every compiler you ever write.
auto TheContext = std::make_unique<llvm::LLVMContext>();
auto TheModule = std::make_unique<llvm::Module>("kaleid", *TheContext);
auto Builder = std::make_unique<llvm::IRBuilder<>>(*TheContext);
LLVM C++ API
IRBuilder Methods for Every Construct
Map each AST node to IRBuilder calls: CreateAdd/Sub/Mul, CreateFCmpOLT (float compare), CreateBr/CreateCondBr, CreateCall, CreateRet, CreateAlloca, CreateStore/Load. Practice until you can translate IR syntax to C++ API calls without looking it up.
IRBuilder
Function & BasicBlock Creation
Function::Create() with FunctionType. BasicBlock::Create() and setting the IRBuilder insertion point with SetInsertPoint(). Handle function arguments with func->arg_begin(). This pattern repeats in every backend codegen.
Functions · BasicBlocks
JIT Compilation
ORC JIT (LLVM's Modern JIT API)
ORC (On-Request Compilation) is LLVM's composable JIT framework. Key layers: IRCompileLayer, RTDyldObjectLinkingLayer. For Kaleidoscope, you add functions to the JIT incrementally as the user types them. This mirrors how REPLs work.
ORC JIT · REPL
Symbol Resolution & Extern Functions
JIT needs to find symbols (like sin, printf). Learn how symbol lookup works: JIT -> process symbols -> stdlib. Implement extern declarations in your language so Kaleidoscope can call C functions.
Symbol Resolution
04
Months 6-9 Job-Ready Phase
Writing Passes & Optimizations
Goal: Write code that transforms other code. This is what compiler engineers do at companies. Master the pass infrastructure, data structures, and analysis frameworks.
Key Takeaways
  • MIR (Machine IR) bridges LLVM IR and actual assembly
  • Instruction selection, register allocation, and scheduling are hard problems
  • TableGen generates instruction definitions and patterns automatically
  • Calling conventions and ABI are critical for correctness
  • Writing a backend requires understanding your target architecture deeply
New Pass Manager (NPM)
New Pass Manager Architecture
LLVM switched from the Legacy Pass Manager to the New Pass Manager (NPM) as the default in LLVM 13; all new code uses NPM. Passes are plain classes inheriting PassInfoMixin, parameterized by the IR unit they run on (Function, Module, Loop). Out-of-tree passes register themselves via llvmGetPassPluginInfo returning a PassPluginLibraryInfo.
NPM · Pass Manager
Pass Anatomy: run() Method
Every pass implements a run(Function &F, FunctionAnalysisManager &AM) method. It returns PreservedAnalyses — telling the pass manager which analyses are still valid after your transformation. Returning PreservedAnalyses::all() vs none() has performance implications.
struct MyPass : PassInfoMixin<MyPass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    // walk instructions, transform, ...
    return PreservedAnalyses::none();
  }
};
Pass Anatomy
Analysis Passes
Iterating the CFG
Walk functions, basic blocks, and instructions. Pattern: for (BasicBlock &BB : F), for (Instruction &I : BB). Learn to use dyn_cast<BranchInst>(&I) for instruction type checking. This is boilerplate you'll type hundreds of times.
CFG Traversal
Dominance Analysis
A block A dominates B if every path to B goes through A. Crucial for: finding where to hoist code, validating SSA, loop analysis. Access via DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F).
Dominance · DomTree
Alias Analysis
Can two pointers point to the same memory? If not, loads/stores can be reordered. AliasAnalysis returns MustAlias, MayAlias, or NoAlias. Fundamental for auto-vectorization and memory-level parallelism.
Alias Analysis
Dead Code Analysis Pass
Write a pass that identifies unreachable basic blocks (blocks with no predecessors in the CFG, excluding the entry block) and instructions with no users that also have no side effects. Print a report. This teaches you CFG traversal + use-def chains.
Build This
Transform Passes
Instruction Combining (instcombine)
The most important optimization pass. Replaces instruction patterns with cheaper equivalents. Examples: x * 2 -> x << 1, x + 0 -> x, x / 2 -> x >> 1 (unsigned). Study the existing instcombine source — it's 50k lines of patterns and teaches you how to think about transformations.
; Before instcombine
%r = mul i32 %x, 2
; After instcombine
%r = shl i32 %x, 1
instcombine
Dead Code Elimination (DCE/ADCE)
Remove instructions whose results are never used. ADCE (Aggressive DCE) also removes unreachable blocks. Implement a basic DCE: iterate instructions in reverse, if I.use_empty() and instruction has no side effects, erase it.
DCE · Build This
Loop Transformations
LICM (Loop-Invariant Code Motion): if a computation's inputs don't change in a loop, hoist it outside. Loop Unrolling: replicate loop body N times to reduce branch overhead. Loop Vectorization: convert scalar loops to SIMD. These are where most performance gains come from in real-world code.
LICM · Unrolling · Vectorization
Strength Reduction Pass
Write a pass that finds multiply-by-power-of-2 patterns (mul i32 %x, 4) and replaces them with shifts (shl i32 %x, 2). Extend to handle division. Test with opt --load-pass-plugin=./libMyPass.so -passes="strength-reduce".
Build This
Inlining
Replace a function call with the function body. Eliminates call overhead and enables further optimizations across call boundaries. Learn the inlining cost model: LLVM uses a heuristic based on instruction count. InlineFunction() utility in LLVM API.
Inlining
LLVM Data Structures (ADT Library)
SmallVector, SmallString, ArrayRef
SmallVector<T, N>: vector with N elements on the stack before heap allocation. Avoids allocations for small collections (most compiler data). ArrayRef: non-owning reference to any array-like container. Use these everywhere in compiler code — never raw std::vector for IR-level work.
ADT · SmallVector
StringRef, Twine
StringRef: non-owning reference to a string — zero-copy. Twine: lazy string concatenation tree. Never allocate intermediate strings in hot paths. Profilers hate std::string in compilers.
StringRef · Twine
DenseMap, DenseSet
Hash maps/sets optimized for pointer keys (common in IR — Value*, BasicBlock*). Much faster than std::unordered_map for small-to-medium sizes due to open addressing and cache locality.
DenseMap
Use-Def Chains
Every Value in LLVM has a list of uses. Iterate with for (Use &U : V.uses()). Replace all uses with V.replaceAllUsesWith(NewV). This is the foundation of every transformation — finding what uses what.
Use-Def
05
Month 10+ Specialization
Choose Your Track
Goal: Deep mastery in one high-value niche. All three tracks are hiring. Pick based on what excites you — motivation beats market research at this level.
Key Takeaways
  • Vectorization exploits SIMD parallelism; SLP and loop vectorization differ
  • GlobalISel offers alternative instruction selection (modern approach)
  • Garbage collection metadata and statepoints handle managed languages
  • Profiling-guided optimization uses runtime data for better decisions
  • Specialize in your interest: LTO, PGO, MLIR, debugging, or custom domains
Track A AI Hardware
MLIR & AI Compilers
  • MLIR dialects (Linalg, Affine, Vector, GPU)
  • Lowering pipelines: PyTorch/XLA -> MLIR -> LLVM IR
  • Tiling and fusion for tensor ops
  • Writing a custom MLIR dialect
  • Polyhedral optimization model
  • Target: IREE, ONNX-MLIR, Triton (GPU)
  • Companies: Google, Apple (Core ML), Nvidia (Triton)
Track B Security
Obfuscation & Binary Analysis
  • Control Flow Flattening pass
  • Instruction substitution (replace ops with equiv)
  • Bogus control flow injection
  • String encryption passes
  • LLVM-based sanitizers (ASan, UBSan internals)
  • Study: Obfuscator-LLVM (O-LLVM), Hikari
  • Companies: Security firms, game anti-cheat, DRM
Track C New Hardware
Custom Backend / CPU
  • TableGen — describing instructions declaratively
  • Target machine classes hierarchy
  • Register file definition
  • Instruction selection (DAG-to-DAG)
  • Instruction scheduling
  • ABI & calling conventions
  • Companies: Chip startups, embedded, RISC-V ecosystem
Resources for All Tracks
LLVM Source Code as Textbook
The best resource at this stage is the source itself. llvm/lib/Transforms/ contains all the production passes. llvm/lib/Target/X86/ is the best-commented backend. Read real code, not tutorials.
github.com/llvm/llvm-project
Contribute to LLVM
Start with good-first-issue labels on GitHub (LLVM moved code review from Phabricator to GitHub pull requests). Fix a miscompile, improve a diagnostic, add a missing peephole to instcombine. LLVM contributors are among the most rigorous code reviewers in open source — the feedback is invaluable.
Open Source · github.com/llvm/llvm-project
General Resources

Organized Resources

Curated references organized by learning phase, from foundation concepts to advanced optimization and backend development.

Quick Reference: Common Commands & Syntax

clang Compilation

clang mycode.c                             # compile to executable
clang -S -emit-llvm mycode.c -o mycode.ll  # generate LLVM IR (text)
clang -c -emit-llvm mycode.c -o mycode.bc  # generate bitcode (binary)
clang -O2 mycode.c                         # optimized compilation
clang -Xclang -emit-codegen-only mycode.c  # run codegen, discard output
clang -print-search-dirs                   # show include/library paths

opt - LLVM Optimizer

opt -O2 input.ll -o output.ll              # apply the O2 pipeline
opt -passes=mem2reg -S input.ll -o out.ll  # promote allocas to registers
opt -passes='dce,simplifycfg' input.ll     # run specific passes (LLVM 14+)
opt -print-passes                          # list available passes
opt -passes=dot-cfg input.ll               # emit the control flow graph as .dot
opt -stats input.ll -o /dev/null           # print optimization statistics

llc - Code Generator

llc input.ll -o output.s            # generate assembly
llc -march=x86-64 input.ll          # target x86-64
llc -march=arm input.ll             # target ARM
llc -O2 input.ll                    # optimize during codegen
llc -view-dag-combine1-dags         # visualize the SelectionDAG (debug builds)
llc -print-machineinstrs input.ll   # print Machine IR

LLVM IR Types

i1, i8, i16, i32, i64   ; integer types (width in bits)
float, double           ; floating point (32- and 64-bit)
ptr                     ; opaque pointer (LLVM 15+)
type*                   ; typed pointer (legacy)
[N x type]              ; array of N elements
{ type1, type2 }        ; struct/record

LLVM IR Instructions

%var = alloca i32                                           ; allocate on the stack
store i32 %val, ptr %var                                    ; write to memory
%val = load i32, ptr %var                                   ; read from memory
%res = add i32 %a, %b                                       ; integer add
%res = getelementptr [10 x i32], ptr %arr, i64 0, i64 %idx  ; compute address
call i32 @func(i32 %arg)                                    ; function call

Useful Tools

llvm-dis mycode.bc -o mycode.ll         # disassemble bitcode to text IR
llvm-as mycode.ll -o mycode.bc          # assemble text IR to bitcode
llvm-config --cxxflags --libs           # compile/link flags for the C++ API
llvm-objdump -d output.o                # disassemble object file
llc input.ll -print-after-all           # print IR after each pass
opt -time-passes input.ll -o /dev/null  # measure pass runtimes

Tools Installation Guide

Ubuntu/Debian

sudo apt-get install -y cmake ninja-build clang lld
git clone https://github.com/llvm/llvm-project.git
cd llvm-project/llvm && mkdir ../build && cd ../build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" ../llvm
ninja -j$(nproc) && sudo ninja install
clang --version

macOS (Homebrew)

brew install cmake ninja llvm
OR build from source:
git clone https://github.com/llvm/llvm-project.git
cd llvm-project && mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" ../llvm
ninja -j$(sysctl -n hw.ncpu) && sudo ninja install

Windows (MSVC)

Install Visual Studio 2019+ with C++ tools
git clone https://github.com/llvm/llvm-project.git
cd llvm-project && mkdir build && cd build
cmake -G "Visual Studio 17 2022" -A x64 -DLLVM_ENABLE_PROJECTS="clang" ../llvm
cmake --build . --config Release

Quick Test

echo 'int main() { return 42; }' > test.c
clang -S -emit-llvm test.c -o test.ll
opt -O2 -S test.ll -o test.opt.ll
llc test.opt.ll -o test.s
clang test.s -o test && ./test

FAQ & Troubleshooting

Q: How do I build LLVM from source on Linux?
git clone https://github.com/llvm/llvm-project.git && cd llvm-project && mkdir build && cd build
cmake -G Ninja -DCMAKE_BUILD_TYPE=Release ../llvm && ninja && ninja install
Takes 20-40 min on modern hardware.
Q: Why is my LLVM build so slow?
Use a Release build (-DCMAKE_BUILD_TYPE=Release) with Ninja. Linking is usually the bottleneck: enable lld (-DLLVM_ENABLE_LLD=ON) and cap parallel link jobs (-DLLVM_PARALLEL_LINK_JOBS=2) if you're RAM-bound.
Q: "undefined reference to llvm::..." when linking?
Use llvm-config --cxxflags --ldflags --libs core to get linking flags, or use CMake's find_package(LLVM REQUIRED CONFIG).
Q: How do I run opt with specific passes?
opt -O2 input.ll -o output.ll for O2 pipeline, or opt -passes='mem2reg,dce' input.ll (LLVM 14+) for specific passes.
Q: How do I generate IR from C code?
clang -S -emit-llvm mycode.c -o mycode.ll (human-readable) or clang -c -emit-llvm mycode.c -o mycode.bc (bitcode).

Common Pitfalls & Misconceptions

Phase 1: Foundation

"IR types are the same as C types"
IR i32 ≠ C int. Signedness is in operations, not types. IR pointers are untyped in LLVM 15+.
"IR is portable across all CPUs"
IR is platform-independent but must be lowered to machine code. Pointer size, alignment, and ABI assumptions still apply.
"Pass ordering doesn't matter"
Wrong. Pass ordering is critical. Always use O2/O3 pipeline as reference.

Phase 2: SSA & Passes

"SSA means each variable assigned once"
SSA means each *definition* is unique. Phi nodes handle multiple control flow paths.
"Alias analysis tells me pointers are equal"
It tells you pointers *might* alias. Always assume worst case unless NoAlias.

Phase 4: Backend

"I can ignore calling conventions"
Your code won't interop with libraries or other languages. Calling conventions are mandatory.
"Register allocation is just assigning registers"
It's an NP-hard problem. Getting it wrong kills performance.

Community & Getting Help

Official LLVM Channels

Community & Chat

Learning Pathways