LaTeX Runs on Top of TeX

People often think of TeX and LaTeX as two rival typesetting programs, like Word versus some competitor — or assume LaTeX is the latest version of TeX. Neither is right. LaTeX runs on top of TeX. TeX is the foundation, and LaTeX is a layer laid over it.

Get this one relationship and the surrounding names — pdflatex, .fmt, and the rest — fall into line one after another. Miss it, and those names stay disconnected words forever. So let's start with what TeX actually is.

TeX is a typesetting engine and a macro language

TeX was created by Donald Knuth in the late 1970s. The trigger was small. In 1977 he received the galley proofs for the second edition of Volume 2 of The Art of Computer Programming and found the typesetting worse than in the first edition. The industry was mid-transition from hot-metal to phototypesetting, and the new process couldn't handle a math-heavy book: no system of the day could keep the size and spacing of fractions, subscripts, and integral signs consistent across a whole volume. What he set out to build was not a word processor but a system that could typeset at the level a master compositor once achieved by hand.

That goal split TeX's structure in two. Mathematical notation differs by field and by person (one mathematician invents a new operator, one physicist uses their own matrix notation), so Knuth couldn't predefine every notation in the program. Instead, TeX was built as two parts.

One is the typesetting engine: the mass of algorithms that take characters and symbols, turn them into boxes, and work out how to place those boxes across lines and pages. It does only very low-level work. The other is the macro-expansion language: write \def\foo{...} and from then on \foo is replaced by that content — a way to define your own commands. Rather than solving the notation problem head-on, Knuth handed the very power to define notation over to the user. Processing a TeX file, in the end, means expanding macros all the way down and feeding the remaining primitive commands to the engine.
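
As a minimal sketch of that expansion step, the fragment below defines a notation macro and uses it inside a LaTeX document. The name \inner and the notation it stands for are assumptions made up for this illustration; neither comes from plain TeX or LaTeX.

```latex
\documentclass{article}
\begin{document}

% \inner is a hypothetical user-defined notation, introduced only for this sketch.
% #1 and #2 are its two arguments; the braces hold the replacement text.
\def\inner#1#2{\langle #1, #2 \rangle}

% TeX replaces \inner{u}{v} with \langle u, v \rangle before the engine
% typesets anything, so the two formulas below come out identical.
$\inner{u}{v}$ and $\langle u, v \rangle$

\end{document}
```

If the notation changes later, only the one \def line has to be edited, and every use of \inner follows.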

The name carries this character too. TeX isn't an English word but the first three letters of the Greek τέχνη (technē), a word that predates the split between craft and art and covers both. So the final X isn't the Latin letter "ex" but the Greek letter chi (χ), pronounced like the ch in German ach: "tekh," not "teks." The lowered middle E in the logo comes from the same stubbornness.

Raw TeX is hard to write in, so formats appeared

The decision that the engine handles only the low level immediately creates friction. Writing with nothing but raw TeX commands is like building an app in assembly. "12pt bold here, 14pt above, next line with no indent" has to be hand-typed every time. Opening a single chapter heading takes dozens of command lines.

Knuth knew this. So he predefined a bundle of frequently-used macros and shipped it alongside — that's plain TeX. Here a concept takes shape: the macro bundle laid on top of the engine is called a format. Leave the engine alone, and which macro set you put on top completely changes the experience. plain TeX is the thin base bundle Knuth laid down himself.

But plain TeX still operated at the level of touching typographic detail. For someone who thinks "I don't want to care what point size the type prints at — this is a heading, this is a section, this is a quote, that's all I want to say," it fell short. It had no vocabulary for pointing at the structure of the writing.

LaTeX is that format, stacked thick

The person who filled that gap was Leslie Lamport. In the early 1980s he wrote a large set of structure-oriented macros on top of TeX, reusing much of plain TeX along the way, bundled them as a format of their own, and put his name on the result: La(mport) + TeX, hence LaTeX. (The trailing TeX is still read "tekh.")

The core of what LaTeX does is to separate concerns. The writer declares only "what something is," as in \documentclass{article}, \section{Introduction}, \begin{itemize}. At what point size, with what spacing, in what typeface it actually prints is decided behind the scenes by the document class and packages. It split the structure of the writing from its appearance.
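
A small sketch of what that declaration-only style looks like in a complete file. The body text is placeholder content; the commands are the ones named above plus the standard quote environment.

```latex
\documentclass{article}   % the class decides fonts, sizes, and spacing

\begin{document}

\section{Introduction}    % "this is a section heading", nothing about its look

Some opening text.

\begin{itemize}           % "this is a list"
  \item First point
  \item Second point
\end{itemize}

\begin{quote}             % "this is a quotation"
  A quoted passage.
\end{quote}

\end{document}
```

Nothing in the body mentions a point size or a typeface, so changing the class or loading a package can restyle every heading and list without touching the text.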

Here the relationship from the start lines up.

TeX    = engine (typesetting algorithms) + macro language   ← foundation
plain  = thin macro bundle on TeX (Knuth)                   ← light format
LaTeX  = structure-oriented macro bundle on TeX (Lamport)   ← thick format

So LaTeX is not a competitor of TeX, not its next version, not a different language. It is simply the most popular of the macro bundles that run on the TeX engine. Even the \section{Introduction} we use daily is, in the end, a macro defined with TeX's own definition primitives such as \def (in today's LaTeX the definition lives in the document class); calling it unfolds dozens of primitive commands that number the heading, enlarge the type, add spacing, and register it in the table of contents. "LaTeX is mostly TeX macros, really" means exactly this.
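
To make the "a heading is just a macro" point concrete, here is a deliberately simplified stand-in for a sectioning command. The names \mysection and mysec are hypothetical, and the spacing values are illustrative; the real \section in article.cls is built differently and handles many more cases (starred forms, page-break control, running heads).

```latex
\documentclass{article}

% \mysection and the counter mysec are hypothetical names for this sketch.
\newcounter{mysec}
\newcommand{\mysection}[1]{%
  \par
  \vspace{3.5ex plus 1ex minus 0.2ex}%            space above the heading
  \refstepcounter{mysec}%                         advance the heading number
  \noindent
  {\normalfont\Large\bfseries \themysec\quad #1}% larger bold type
  \addcontentsline{toc}{section}{\protect\numberline{\themysec}#1}% TOC entry
  \par\nobreak
  \vspace{2.3ex plus 0.2ex}%                      space below the heading
}

\begin{document}
\tableofcontents
\mysection{Introduction}
Body text follows the heading.
\end{document}
```

Each line inside the definition is itself a macro that expands further, which is where the "dozens of primitive commands" come from.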

This separation isn't free. Macros call macros, package B overrides a command package A defined, and a document class adds yet another layer on top. When some undefined command gets called, Undefined control sequence appears — and the spot it points to is not the line you wrote but some place far below, where the macro expansion broke. Stack an abstraction and it's convenient day to day, but when something goes wrong you have to dig back down through its full thickness. LaTeX errors are notorious not because LaTeX is badly made, but because that's how a thick abstraction is.
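
A deliberately broken sketch of how an error surfaces below the line you wrote. Both \greet and \undefinedname are hypothetical names; \undefinedname is never defined on purpose.

```latex
\documentclass{article}

% Defining \greet does not raise an error: the replacement text is only
% stored here, not expanded, so TeX has not yet looked up \undefinedname.
\newcommand{\greet}{Hello, \undefinedname!}

\begin{document}
\greet   % the error appears only now, inside the expansion of \greet
\end{document}
```

When this is compiled, TeX stops with Undefined control sequence and first shows the expansion of \greet up to \undefinedname, and only then the line in the document body that called \greet. With real classes and packages the same report can be several expansions deep.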

So pdflatex, .fmt, and the several engines fall into place too

Once you hold onto "TeX is the foundation, LaTeX a layer on top," the names that confused you all resolve into the story between these two layers.

First, why there are several engines. Knuth's original TeX engine was designed around the computers of the 1980s, so its limits showed as the world moved on. The original handles characters as 8-bit values at most (256 slots), leaving no room for Korean or Han characters, and can't directly read the fonts installed on your OS. XeTeX solved this: it takes Unicode input and, through the fontspec package, calls system fonts by name (\setmainfont{Times New Roman}). The original also predates PDF, so it emits an intermediate file called DVI (DeVice Independent); once the world ran on PDF, pdfTeX was the fix that emits PDF directly. There's also LuaTeX, which embeds a Lua scripting engine to ease the pain of writing complex logic in macros alone, and which is Unicode-capable as well. The three engines aren't a product list to memorize but descendants that each patched a weakness of the original TeX.
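
A minimal sketch of the system-font case. It assumes the file is compiled with xelatex or lualatex and that a font named Times New Roman is installed on the machine; the sample words are placeholders.

```latex
% Compile with xelatex or lualatex; fontspec does not work under pdflatex.
\documentclass{article}
\usepackage{fontspec}            % provides \setmainfont
\setmainfont{Times New Roman}    % assumes this font is installed on the system

\begin{document}
Unicode input goes straight into the source: café, naïve, Ελληνικά.
\end{document}
```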

Next, the .fmt file. LaTeX macro definitions run to thousands of lines, so reading them from scratch on every compile repeats the same setup each time. So right after the macros are fully read in, the engine's memory state is dumped whole to a file. That's the .fmt file (pdflatex.fmt and so on); from then on it loads this snapshot instantly, with no re-reading of thousands of lines.

Put these two together and the command names resolve. pdflatex is not a standalone program but a combination of engine and format: the pdfTeX engine, started from the .fmt snapshot that has the LaTeX macros preloaded.

pdflatex  =  pdfTeX engine  +  LaTeX format
xelatex   =  XeTeX  engine  +  LaTeX format
lualatex  =  LuaTeX engine  +  LaTeX format

The upper layer (LaTeX) stays the same; only the lower one (the engine) changes. Switching pdflatex to xelatex when Korean won't render works because it leaves the LaTeX on top untouched and swaps only the bottom engine for a Unicode-capable one. So the engines aren't better-or-worse but fit-for-purpose: use xelatex (or lualatex) for Korean or system fonts, pdflatex for mostly-English work where speed matters.
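
One way to see "same LaTeX, different engine" is a file that compiles under all three commands and reports which engine ran it. This is a sketch using the iftex package; the choice of T1 font encoding in the pdfTeX branch is an assumption for illustration, not a requirement.

```latex
\documentclass{article}
\usepackage{iftex}               % provides \ifPDFTeX, \ifXeTeX, \ifLuaTeX

\ifPDFTeX
  \usepackage[T1]{fontenc}       % 8-bit font encoding for pdfTeX
\else
  \usepackage{fontspec}          % system fonts for XeTeX and LuaTeX
\fi

\begin{document}
\section{Engine check}
This document was typeset by
\ifPDFTeX pdfTeX\fi\ifXeTeX XeTeX\fi\ifLuaTeX LuaTeX\fi.
\end{document}
```

Run it through pdflatex, xelatex, and lualatex in turn: the \section heading looks the same each time, and only the engine name in the sentence changes.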

What the foundation actually does

Finally, nothing shows why TeX is the foundation and LaTeX a layer on top more clearly than the typesetting computation the engine performs. LaTeX only says "what something is"; the real work of placing it well is all done by the engine.

The engine sees everything as a box: a character is a box, a word is a box, a line is a box. The whitespace between boxes isn't a fixed blank but glue: a gap with a stretch range, "natural size plus how much it can stretch and how much it can shrink." Inter-word space isn't simply 4pt but, say, "4pt by default, able to stretch by 2pt or shrink by 1pt," which TeX writes as 4pt plus 2pt minus 1pt. This makes justification natural: to fit a line flush to both margins you widen the word spacing, and the glues each stretch in proportion to their stretchability so the gaps open evenly.
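
The glue arithmetic can be tried directly. The sketch below sets the same three letters in boxes of three different widths, using the 4pt plus 2pt minus 1pt glue from the example above. The names \words, \demo, and \natwd are hypothetical, and the frames from \fbox are only there to make the box widths visible.

```latex
\documentclass{article}
\begin{document}

% \words, \demo, and \natwd are hypothetical names for this sketch.
% Three letters separated by two glues: natural 4pt, stretch 2pt, shrink 1pt.
\newcommand{\words}{A\hskip 4pt plus 2pt minus 1pt B\hskip 4pt plus 2pt minus 1pt C}

\newbox\demo
\newdimen\natwd
\setbox\demo\hbox{\words}
\natwd=\wd\demo                  % natural width of the line

% Natural width: each gap is 4pt.
\noindent\fbox{\hbox to \natwd{\words}}\par

% 2pt wider: the extra 2pt is shared over 2pt + 2pt of stretch,
% so each gap grows by 1pt to 5pt.
\noindent\fbox{\hbox to \dimexpr\natwd+2pt\relax{\words}}\par

% 2pt narrower: the 2pt is shared over 1pt + 1pt of shrink,
% so each gap shrinks by its full 1pt to 3pt.
\noindent\fbox{\hbox to \dimexpr\natwd-2pt\relax{\words}}\par

\end{document}
```

Asking for a box narrower than the third one would exceed the available shrink, and TeX would report an overfull box instead of squeezing the gaps further.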

The real difference is in line breaking. A typical word processor fills a line from the left and bumps a word that won't fit to the next line. Deciding line by line is fast, but it yields uneven paragraphs, with one line tight and the next loose. TeX looks at the whole paragraph at once. For each candidate line it computes a badness, which grows the more the line's glue is over-stretched or over-shrunk, turns badness and break penalties into a per-line cost called demerits, and picks the set of break points with the smallest total for the paragraph. It does this with dynamic programming rather than by trying every combination, in the algorithm Knuth and Michael Plass devised. So if a line is about to get too tight, TeX loosens an earlier line slightly to even out the whole paragraph. This global optimization is the exact opposite of the uneven page Knuth couldn't stand in those proofs.
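
A sketch of the scoring, following the rules given in Knuth's The TeXbook. It is simplified: the badness formula is an approximation, a line that would have to shrink beyond its limit is treated as infinitely bad rather than scored, and the extra demerits TeX adds for cases such as consecutive hyphenated lines are left out. The symbols r, b, l, p, and d are shorthand chosen for this block.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}

% r: how far a line's glue is stretched or shrunk, as a fraction of what is available
% b: badness of the line,  l: \linepenalty,  p: penalty at the chosen break
% d: demerits of the line; TeX minimizes the sum of d over the paragraph
\[
  b \approx \min\bigl(100\,|r|^{3},\ 10000\bigr)
\]
\[
  d =
  \begin{cases}
    (l + b)^{2} + p^{2} & \text{if } 0 \le p < 10000,\\
    (l + b)^{2} - p^{2} & \text{if } -10000 < p < 0,\\
    (l + b)^{2}         & \text{if } p \le -10000.
  \end{cases}
\]

% Worked case: glue stretched by its full amount (r = 1), ordinary break (p = 0),
% default \linepenalty of 10.
\[
  r = 1,\quad p = 0,\quad l = 10
  \;\Longrightarrow\;
  b = 100,\quad d = (10 + 100)^{2} = 12100
\]

\end{document}
```

Because badness grows with the cube of r and demerits with the square of badness, one very loose line costs far more than several slightly loose ones, which is why the optimum spreads the slack across the paragraph.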

Set out the whole path from a .tex file to a PDF, and it looks like this.

.tex source       human-written high-level commands like \section{...}, $x^2$
   │
[macro expansion] load the LaTeX macros from the .fmt snapshot, expand to primitives
   │
[engine setting]  pdfTeX/XeTeX/LuaTeX — characters into boxes, whitespace into glue
   │
[line breaking]   look at the whole paragraph, pick breaks minimizing total badness
   │
[page breaking]   place the lines onto pages
   │
[backend output]  DVI or direct PDF, depending on the engine
   ▼
PDF

The top two stages are LaTeX's territory (your source and the macros it expands into), and the rest is the engine's. In the end, TeX is the typesetting engine that places characters beautifully, and LaTeX is one macro layer laid on top that lets you say "this is a heading, this is a section." pdflatex, .fmt, the several engines: all of them are names that arose in between these two layers.