Traveling Salesman¶

Given pairwise distances among n cities, find the shortest tour. NP-hard in theory, routinely solved to provable optimum at n≈10⁵ in practice. The canonical example of worst-case complexity telling you almost nothing about average-case reality.

🌱 seedling tended 2026-05-16 math algorithms complexity optimization

flowchart LR
  cities[n cities + distances] --> tour[permutation π]
  tour --> cost["Σ d(π_i, π_{i+1})"]
  cost --> min[minimise]

1. The Problem, Tightly¶

Input. A complete weighted graph K_n with edge weights d : E → ℝ_{≥0}.

Output. A Hamiltonian cycle C minimising Σ_{(i,j) ∈ C} d(i,j).

Variants by structure on d:

Name	Constraint on d	Hardness of approximation
General TSP	none	not approximable within any constant unless P=NP
Metric TSP	d(i,j) ≤ d(i,k) + d(k,j)	1.5 − 10⁻³⁶ (KKG 2020)
Euclidean TSP	cities in ℝ², d = L²	PTAS (Arora 1996, Mitchell 1996)
Asymmetric TSP	d(i,j) ≠ d(j,i) allowed	O(1)-approximable (Svensson–Tarnawski–Végh 2018)
(1,2)-TSP	d ∈	8/7-approximable

The metric assumption is the cliff edge: general TSP is inapproximable because you can encode Hamiltonian-cycle decision into it with infinite penalties; once triangle inequality is back, every shortcut is at most as long as the path it bypasses, and approximation becomes possible.

2. Why It's Hard (Theory)¶

Three independent sources of hardness, each blocking a natural attack:

Exponential search space. (n−1)!/2 distinct tours. Brute force is dead at n ≈ 15.

Held-Karp DP is exponential. Define OPT(S, j) = shortest path from 1 through exactly S ending at j. Recurrence is O(n² · 2ⁿ) time, O(n · 2ⁿ) space. Best known exact worst-case algorithm — and the exponent in n is believed essential under the Exponential Time Hypothesis.

The natural ILP has exponentially many constraints. Variables x_{ij} ∈ {0,1}, degree-2 at every vertex. To forbid subtours you need a constraint for every subset S ⊊ V:

Σ_{i ∈ S, j ∉ S} x_{ij} ≥ 2 for all ∅ ⊊ S ⊊ V

That's 2ⁿ − 2 constraints. The LP relaxation also has fractional vertices (half-integral solutions on the Petersen graph), so even rounding is nontrivial.

3. Why It's Easy (Practice)¶

Cutting planes turn 2ⁿ constraints into a few thousand. You don't enumerate subtour constraints — you find violated ones on demand using min-cut (Padberg–Rinaldi 1991). Concorde solves the LP relaxation, finds the most violated cut, adds it, repeats. Branch only when no cut tightens the LP. This is branch-and-cut and it is the reason exact TSP works.

Lin-Kernighan is sequential 2-opt with backtracking. A 2-opt move removes two edges and reconnects to form a new tour. LK chains these: remove an edge, add an edge, remove an edge, add an edge, ... — keeping a running gain and accepting the sequence only if the cumulative gain is positive. LKH (Helsgaun's variant) adds 5-opt and α-nearness candidate lists. On random Euclidean instances it produces tours within 0.05% of optimum at n = 10⁶.

Christofides–Serdyukov gives 1.5-approximation for metric TSP in polynomial time: 1. Build a minimum spanning tree T. 2. Find a minimum-weight perfect matching M on the odd-degree vertices of T (Edmonds' algorithm). 3. T ∪ M is Eulerian; shortcut the Euler tour to a Hamiltonian one.

The 1.5 bound stood from 1976 to 2020, when Karlin, Klein, and Oveis Gharan shaved it by 10⁻³⁶ using randomised half-integral relaxation rounding — the first improvement in 44 years. The actual constant doesn't matter; the proof of some improvement does, because it cracked a stuck question.

4. The Algorithm Map¶

flowchart TD
  P[TSP instance]
  P --> E[exact]
  P --> H[heuristic]
  P --> A[approximation]

  E --> HK[Held-Karp DP<br/>O(n² 2ⁿ), n ≲ 25]
  E --> BC[Branch-and-cut / Concorde<br/>n ≲ 10⁵ in practice]

  H --> NN[nearest neighbour<br/>~25% over optimum]
  H --> O2[2-opt / 3-opt<br/>~5% over optimum]
  H --> LK[Lin-Kernighan / LKH<br/>~0.05% over optimum]
  H --> EA[GA, ACO, SA<br/>variable, often worse than LKH]

  A --> CH[Christofides<br/>1.5 · OPT, metric only]
  A --> KKG[KKG 2020<br/>1.5 − 10⁻³⁶, metric]
  A --> PTAS[Arora PTAS<br/>(1+ε) · OPT, Euclidean]

5. Reductions That Matter¶

ATSP → STSP. Replace each vertex v with three vertices v_in, v_mid, v_out connected by a path with very negative weights; original edges attach to v_in and v_out. A symmetric tour in the expanded graph encodes a direction in the original. Cost: 3× the vertices.
TSP → Hamiltonian Cycle. Set d(i,j) = 1 for edges in G, d(i,j) = 2 otherwise. Tour of length n iff G is Hamiltonian.
Hamiltonian Cycle → TSP is what makes TSP NP-hard.
TSP path → TSP cycle. Add a dummy vertex with distance 0 to all others.

6. The Lesson¶

The reason TSP is tractable in practice is structure in the input. Real instances — chip routing, drilling, vehicle routing — have geographic clustering, low intrinsic dimension, and a Lipschitz-ish distance function. Under such inputs, the LP relaxation is nearly integral, cuts close the gap quickly, and 2-opt's landscape has few deep local minima.

The pathological inputs that force exponential behaviour (random high-dimensional distance matrices with no metric structure) do not occur in production.

This is the general shape of much of applied algorithmics: worst-case complexity is a property of the input distribution you assume, not of the problem itself. SAT, ILP, graph isomorphism, and TSP all have the same shape — undecidable or NP-hard in the lab, routinely solved at industrial scale. Linear programming had it too, until Khachiyan (1979) and Karmarkar (1984) closed the theory-practice gap from the theory side.

The open frontier for TSP is the same as for the rest: characterise the input distributions under which polynomial-time solvers provably succeed, rather than the worst case under which they provably fail. Smoothed analysis (Spielman–Teng 2001, originally for simplex) is the cleanest extant framework.

7. Numbers Worth Remembering¶

Instance	n	First solved to optimum	Solver	CPU time
dantzig42	42	1954	DFJ by hand	days (people)
gr120	120	1977	Grötschel	hours
pr2392	2,392	1987	Padberg–Rinaldi	hours
usa13509	13,509	1998	Concorde	weeks (cluster)
sw24978	24,978	2004	Concorde	months (cluster)
pla85900	85,900	2006	Concorde	136 CPU-years

The gap from 42 to 85,900 in 52 years is ~12% growth per year — close to Moore's Law on the hardware side, but with algorithmic improvements contributing at least as much. Concorde with 1986 hardware would still beat 1986 Concorde with 2006 hardware.