Week 5 · Part 2
Duality Theory of Semidefinite Programming and Its Applications
Symmetric matrices: $\mathbb{S}^n = \{ X \in \mathbb{R}^{n \times n} : X = X^\top \}$.
Trace inner product: $\langle A, B \rangle := \mathrm{Tr}(AB)$ for $A,B \in \mathbb{S}^n$.
Positive semidefinite cone: $\mathbb{S}^n_+ := \{ X \in \mathbb{S}^n : X \succeq 0 \}$.
The dual cone of $\mathbb{S}^n_+$ with respect to $\langle \cdot,\cdot \rangle$ is itself:
$$\left(\mathbb{S}^n_+\right)^* = \{ S \in \mathbb{S}^n : \langle S, X \rangle \ge 0 \ \forall X \succeq 0 \} = \mathbb{S}^n_+.$$This self-duality makes SDP duality particularly elegant and computationally tractable.
Keep this reference in mind throughout the derivation.
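The easy half of the self-duality claim, $\langle S, X \rangle \ge 0$ whenever $S, X \succeq 0$, can be spot-checked numerically. A minimal NumPy sketch (random PSD matrices, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n):
    """Random PSD matrix of the form G G^T."""
    G = rng.standard_normal((n, n))
    return G @ G.T

# <S, X> = Tr(SX) >= 0 for any pair of PSD matrices:
# one direction of the self-duality of the PSD cone.
for _ in range(100):
    S, X = random_psd(4), random_psd(4)
    assert np.trace(S @ X) >= -1e-9
```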
$X \in \mathbb{S}^n$ — primal matrix variable (symmetric).
$\mathcal{A}: \mathbb{S}^n \to \mathbb{R}^m$ with $\mathcal{A}(X) = \big(\langle A_1, X \rangle, \ldots, \langle A_m, X \rangle\big)$.
$\mathcal{A}^*: \mathbb{R}^m \to \mathbb{S}^n$ satisfies $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$.
$y \in \mathbb{R}^m$ for affine constraints, $S \in \mathbb{S}^n_+$ for conic constraint.
Reminder: The adjoint exists and is unique because $\mathcal{A}$ is linear and the trace inner product is non-degenerate.
Given $\mathcal{A}(X) = (\langle A_i, X \rangle)_{i=1}^m$ with $A_i \in \mathbb{S}^n$:
$$\mathcal{A}^*(y) = \sum_{i=1}^m y_i A_i.$$We will use $\mathcal{A}^*$ heavily when enforcing stationarity.
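The defining identity $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$ can be verified directly for this formula. A sketch with hypothetical random data $A_1, \ldots, A_m$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3

def sym(G):
    return (G + G.T) / 2

# Hypothetical constraint matrices A_1, ..., A_m in S^n.
A = [sym(rng.standard_normal((n, n))) for _ in range(m)]

def A_op(X):
    """A(X) = (<A_i, X>)_{i=1}^m."""
    return np.array([np.trace(Ai @ X) for Ai in A])

def A_adj(y):
    """A*(y) = sum_i y_i A_i."""
    return sum(yi * Ai for yi, Ai in zip(y, A))

X = sym(rng.standard_normal((n, n)))
y = rng.standard_normal(m)

# Adjoint identity: <A(X), y> = <X, A*(y)>.
assert np.isclose(A_op(X) @ y, np.trace(X @ A_adj(y)))
```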
We already have $(P)$ in the required template: minimize a linear functional over the intersection of an affine subspace and $\mathbb{S}^n_+$.
Heuristic: Equality constraints use unrestricted multipliers; cone constraints use elements of the dual cone.
Sign convention: the equality constraint $\mathcal{A}(X) = b$ contributes $+\,y^\top\big(b - \mathcal{A}(X)\big)$ with unrestricted $y$; the cone constraint $X \succeq 0$ contributes $-\langle S, X \rangle$ with $S \succeq 0$.
Use the adjoint to collect terms in $X$:
$$\mathcal{L}(X, y, S) = \langle C - \mathcal{A}^*(y) - S, X \rangle + \langle b, y \rangle.$$The dependence on $X$ is entirely through the coefficient $C - \mathcal{A}^*(y) - S$.
Eliminating $S$ gives the compact form $\max_y \{ \langle b, y \rangle : C - \mathcal{A}^*(y) \succeq 0 \}$.
For any primal feasible $X$ and dual feasible $(y, S)$,
$$\langle C, X \rangle \ge \langle b, y \rangle.$$Consequently, $\mathrm{OPT}_{(D)} \le \mathrm{OPT}_{(P)}$.
Start from feasibility: $\mathcal{A}(X) = b$ and $\mathcal{A}^*(y) + S = C$.
$$\begin{aligned} \langle C, X \rangle - \langle b, y \rangle &= \langle \mathcal{A}^*(y) + S, X \rangle - \langle b, y \rangle \\ &= \langle y, \mathcal{A}(X) \rangle - \langle b, y \rangle + \langle S, X \rangle \\ &= \langle S, X \rangle \ge 0, \end{aligned}$$ because $S, X \succeq 0 \Rightarrow \langle S, X \rangle \ge 0$. Interpretation: The dual objective always furnishes a lower bound on the primal value, a cornerstone of dual-based algorithms.
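The identity gap $= \langle S, X \rangle \ge 0$ can be checked by constructing a feasible primal-dual pair by hand. A NumPy sketch (the data $C$, $b$ are built backwards from a chosen feasible point, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3

def sym(G):
    return (G + G.T) / 2

def psd(n):
    G = rng.standard_normal((n, n))
    return G @ G.T

A = [sym(rng.standard_normal((n, n))) for _ in range(m)]
A_op  = lambda X: np.array([np.trace(Ai @ X) for Ai in A])
A_adj = lambda y: sum(yi * Ai for yi, Ai in zip(y, A))

# Build a feasible primal-dual pair by construction:
X = psd(n)            # primal feasible once we set b := A(X)
b = A_op(X)
y = rng.standard_normal(m)
S = psd(n)            # dual slack, PSD by construction
C = A_adj(y) + S      # then A*(y) + S = C holds exactly

gap = np.trace(C @ X) - b @ y
assert np.isclose(gap, np.trace(S @ X))  # gap equals <S, X>
assert gap >= -1e-9                      # weak duality
```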
If either of the following holds:
(i) $(P)$ is strictly feasible ($\exists\, X \succ 0$ with $\mathcal{A}(X) = b$) and $\mathrm{OPT}_{(P)} > -\infty$;
(ii) $(D)$ is strictly feasible ($\exists\, y$ with $C - \mathcal{A}^*(y) \succ 0$) and $\mathrm{OPT}_{(D)} < +\infty$;
then $\mathrm{OPT}_{(P)} = \mathrm{OPT}_{(D)}$, and when both hold, both optima are attained.
At optimality, $(X^\star, y^\star, S^\star)$ satisfies $\langle S^\star, X^\star \rangle = 0$, equivalently $S^\star X^\star = 0$ (their ranges are orthogonal).
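The equivalence $\langle S, X \rangle = 0 \iff SX = 0$ for PSD matrices can be illustrated with a rank-one pair whose ranges are orthogonal (a toy sketch, not a solver output):

```python
import numpy as np

n = 4
v = np.zeros(n); v[0] = 1.0
w = np.zeros(n); w[1] = 1.0   # w orthogonal to v
X = np.outer(v, v)            # PSD, range spanned by v
S = np.outer(w, w)            # PSD, range orthogonal to range(X)

assert np.isclose(np.trace(S @ X), 0)  # <S, X> = 0 ...
assert np.allclose(S @ X, 0)           # ... and indeed S X = 0
```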
Takeaway: Strict feasibility is a constraint qualification guaranteeing no duality gap.
$A_0 = I_2 \succ 0$ (the constraint matrix at $x = 0$) $\Rightarrow$ the primal Slater condition holds.
Constraint matrix: $\begin{pmatrix}1 - x_2 & x_1 \\ x_1 & 1 + x_2\end{pmatrix} \succeq 0$.
$x^\star = (0,0)$ yields PSD matrix $A_0$ and objective value $0$.
Exercise: Derive the dual and verify matching optimal value.
Operator norm $\|M\|_\infty$: the largest singular value of $M$; dual to the nuclear norm.
Nuclear norm $\|M\|_1$: the sum of the singular values; a convex surrogate for rank.
Both admit compact SDP representations derived via duality.
Let $T: H_1 \to H_2$ be bounded between Hilbert spaces with singular values $s_1(T) \ge s_2(T) \ge \cdots \ge 0$.
Finite-dimensional case: $\|T\|_p = \big(\mathrm{Tr}(|T|^p)\big)^{1/p}$ where $|T| := \sqrt{T^\ast T}$.
For $p, q, r \in [1,\infty]$ with $\tfrac{1}{p} + \tfrac{1}{q} = \tfrac{1}{r}$, Hölder's inequality for Schatten norms gives $\|ST\|_r \le \|S\|_p \, \|T\|_q$.
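The Hölder inequality $\|ST\|_r \le \|S\|_p \, \|T\|_q$ can be spot-checked for a few exponent triples. A NumPy sketch (random test matrices, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)

def schatten(M, p):
    """Schatten p-norm: the l_p norm of the singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sum(s**p) ** (1.0 / p)

# Check ||S T||_r <= ||S||_p ||T||_q whenever 1/p + 1/q = 1/r.
for p, q, r in [(2, 2, 1), (4, 4, 2), (3, 6, 2)]:
    for _ in range(50):
        S = rng.standard_normal((5, 5))
        T = rng.standard_normal((5, 5))
        assert schatten(S @ T, r) <= schatten(S, p) * schatten(T, q) + 1e-9
```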
These dual (support-function) characterizations lead directly to SDP formulations.
For symmetric $M \in \mathbb{S}^n$:
$$\|M\|_\infty = \max_{\|x\|_2 = 1} |x^\top M x| = \max\{|\lambda_{\max}(M)|, |\lambda_{\min}(M)|\}.$$Introduce scalar $t$ and block matrix $X(t) = \begin{pmatrix} tI_n & M \\ M & tI_n \end{pmatrix}$.
$$X(t) \succeq 0 \iff t I_n \succeq \pm M \iff t \ge \|M\|_\infty.$$Primal SDP:
$$(\mathrm{P}_\infty)\quad \min_{t \in \mathbb{R}} \ t \quad \text{s.t.} \quad \begin{pmatrix} tI_n & M \\ M & tI_n \end{pmatrix} \succeq 0.$$Schur complement captures the requirement that all eigenvalues of $M$ lie in $[-t, t]$.
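The equivalence $X(t) \succeq 0 \iff t \ge \|M\|_\infty$ follows because the eigenvalues of $X(t)$ are $t \pm \lambda_i(M)$, and it is easy to check numerically. A sketch with an assumed random symmetric $M$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
G = rng.standard_normal((n, n))
M = (G + G.T) / 2                               # symmetric test matrix
norm_inf = np.abs(np.linalg.eigvalsh(M)).max()  # ||M||_inf

def X(t):
    """The block matrix [[tI, M], [M, tI]]."""
    return np.block([[t * np.eye(n), M], [M, t * np.eye(n)]])

def is_psd(Z):
    return np.linalg.eigvalsh(Z).min() >= -1e-9

# X(t) is PSD exactly when t >= ||M||_inf.
assert is_psd(X(norm_inf))
assert is_psd(X(norm_inf + 0.1))
assert not is_psd(X(0.9 * norm_inf))
```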
Dual variable $Z = \begin{pmatrix} A & Y \\ Y^\top & B \end{pmatrix} \succeq 0$.
$$\mathcal{L}(t, Z) = t - \langle Z, X(t) \rangle.$$Hence $(\mathrm{D}_\infty)$:
$$\begin{aligned} \text{maximize}\quad & 2\,\langle M, Y \rangle \\ \text{subject to}\quad & \begin{pmatrix} A & Y \\ Y^\top & B \end{pmatrix} \succeq 0, \\ & \mathrm{Tr}A + \mathrm{Tr}B = 1. \end{aligned}$$(The factor $2$ arises because both off-diagonal blocks of $X(t)$ pair with $M$ in $\langle Z, X(t) \rangle$.) For $M \in \mathbb{R}^{p \times q}$ with singular values $s_1 \ge \dots \ge s_{\min(p,q)}$:
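A rank-one dual certificate built from a top eigenvector of $M$ attains the optimum; since both off-diagonal blocks of $Z$ pair with $M$, its value carries a factor of $2$. A NumPy sketch (random symmetric $M$, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
G = rng.standard_normal((n, n))
M = (G + G.T) / 2
lam, vecs = np.linalg.eigh(M)
k = np.argmax(np.abs(lam))        # eigenvalue of largest magnitude
v = vecs[:, [k]]                  # unit eigenvector, as a column
sgn = np.sign(lam[k])

# Dual certificate: A = B = vv^T/2, Y = sgn * vv^T/2.
A = B = (v @ v.T) / 2
Y = sgn * (v @ v.T) / 2
Z = np.block([[A, Y], [Y.T, B]])

assert np.linalg.eigvalsh(Z).min() >= -1e-9      # Z is PSD
assert np.isclose(np.trace(A) + np.trace(B), 1)  # trace normalization
# Both off-diagonal blocks pair with M, hence the factor 2:
assert np.isclose(2 * np.trace(M @ Y), np.abs(lam).max())
```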
$$\|M\|_1 = \sum_{k=1}^{\min(p, q)} s_k(M).$$Introduce $U \in \mathbb{S}^p_+$ and $V \in \mathbb{S}^q_+$:
$$X = \begin{pmatrix} U & M \\ M^\top & V \end{pmatrix} \succeq 0.$$ $$ (\mathrm{P}_1)\quad \min_{U,V} \frac{1}{2}(\mathrm{Tr}U + \mathrm{Tr}V) \quad \text{s.t.} \quad X \succeq 0.$$For every feasible $(U, V)$ the PSD constraint forces $\tfrac{1}{2}(\mathrm{Tr}U + \mathrm{Tr}V) \ge \sum_k s_k(M)$, with equality at $U = (MM^\top)^{1/2}$, $V = (M^\top M)^{1/2}$; minimizing the trace therefore recovers $\|M\|_1$.
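The optimal $(U, V)$ can be built directly from the SVD $M = W \,\mathrm{diag}(s)\, Q^\top$: taking $U = W \,\mathrm{diag}(s)\, W^\top$ and $V = Q \,\mathrm{diag}(s)\, Q^\top$ gives a feasible point whose objective equals $\|M\|_1$. A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(6)
p, q = 4, 3
M = rng.standard_normal((p, q))
W, s, Qt = np.linalg.svd(M, full_matrices=False)

# Candidate optimal point from the SVD.
U = W @ np.diag(s) @ W.T
V = Qt.T @ np.diag(s) @ Qt
Xblk = np.block([[U, M], [M.T, V]])

assert np.linalg.eigvalsh(Xblk).min() >= -1e-9                 # feasible
assert np.isclose(0.5 * (np.trace(U) + np.trace(V)), s.sum())  # = ||M||_1
```

Feasibility holds because $X$ factors as $P \,\mathrm{diag}(s)\, P^\top$ with $P = \binom{W}{Q}$, so the block matrix is PSD by construction.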
SDP: minimize $t$ s.t. $\begin{pmatrix} tI & M \\ M & tI \end{pmatrix} \succeq 0$.
SDP: minimize $\frac{1}{2}(\mathrm{Tr}U + \mathrm{Tr}V)$ s.t. $\begin{pmatrix} U & M \\ M^\top & V \end{pmatrix} \succeq 0$.