Slide 1 of 16 Overview

AIAA 3225 — Learning and Optimization for Artificial Intelligence

Week 5 · Part 2

Today's Focus

Duality Theory of Semidefinite Programming and Its Applications

Instructor
Prof. Xin Wang
Date
2025 · 10 · 09

Session Outcomes

  • Understand the geometry of the positive semidefinite cone.
  • Derive SDP duals systematically via the Lagrangian framework.
  • Apply duality insights to compute spectral and nuclear norms.
Topics: Semidefinite Programs · Dual Cones · Matrix Norms · AI Applications

Mathematical Foundations

Ambient Space and Inner Product

Symmetric matrices: $\mathbb{S}^n = \{ X \in \mathbb{R}^{n \times n} : X = X^\top \}$.

Trace inner product: $\langle A, B \rangle := \mathrm{Tr}(AB)$ for $A,B \in \mathbb{S}^n$.

Positive semidefinite cone: $\mathbb{S}^n_+ := \{ X \in \mathbb{S}^n : X \succeq 0 \}$.

Key Insight — Self-Dual Cone

The dual cone of $\mathbb{S}^n_+$ with respect to $\langle \cdot,\cdot \rangle$ is itself:

$$\left(\mathbb{S}^n_+\right)^* = \{ S \in \mathbb{S}^n : \langle S, X \rangle \ge 0 \ \forall X \succeq 0 \} = \mathbb{S}^n_+.$$

This self-duality makes SDP duality particularly elegant and computationally tractable.
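Self-duality can be sanity-checked numerically: for any PSD pair $S, X$ the trace inner product is nonnegative. A minimal sketch (not a proof) assuming NumPy is available; sizes and seeds are illustrative.

```python
# Sanity check of self-duality: <S, X> = Tr(SX) >= 0 for random PSD S, X.
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n, rng):
    """Return a random n x n symmetric PSD matrix of the form A A^T."""
    A = rng.standard_normal((n, n))
    return A @ A.T

for _ in range(100):
    S, X = random_psd(5, rng), random_psd(5, rng)
    assert np.trace(S @ X) >= -1e-10  # <S, X> >= 0 up to round-off
```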

Why the Trace Inner Product?

It is the restriction of the Frobenius inner product to $\mathbb{S}^n$, so it equips $\mathbb{S}^n$ with a Euclidean structure under which $\mathbb{S}^n_+$ is self-dual; nonnegativity of $\langle S, X \rangle$ for PSD pairs follows from $\mathrm{Tr}(SX) = \mathrm{Tr}(X^{1/2} S X^{1/2}) \ge 0$.

Notation at a Glance

Keep this reference in mind throughout the derivation.

Decision Variable

$X \in \mathbb{S}^n$ — primal matrix variable (symmetric).

Linear Map

$\mathcal{A}: \mathbb{S}^n \to \mathbb{R}^m$ with $\mathcal{A}(X) = \big(\langle A_1, X \rangle, \ldots, \langle A_m, X \rangle\big)$.

Adjoint

$\mathcal{A}^*: \mathbb{R}^m \to \mathbb{S}^n$ satisfies $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$.

Dual Variables

$y \in \mathbb{R}^m$ for affine constraints, $S \in \mathbb{S}^n_+$ for conic constraint.

Reminder: The adjoint exists and is unique because $\mathcal{A}$ is linear and the trace inner product is non-degenerate.

Primal SDP: Standard Conic Form

Primal Problem (P)

$$\begin{aligned} \text{minimize} & \quad \langle C, X \rangle \\ \text{subject to} & \quad \mathcal{A}(X) = b \\ & \quad X \succeq 0, \end{aligned}$$ where $C \in \mathbb{S}^n$ and $b \in \mathbb{R}^m$ are given data.

Why This Is Standard Form

  • Single linear map $\mathcal{A}$ aggregates all equality constraints.
  • The only cone constraint is $X \in \mathbb{S}^n_+$.
  • No additional inequalities — ideal for Lagrangian duality.

Adjoint Map

Given $\mathcal{A}(X) = (\langle A_i, X \rangle)_{i=1}^m$ with $A_i \in \mathbb{S}^n$:

$$\mathcal{A}^*(y) = \sum_{i=1}^m y_i A_i.$$

We will use $\mathcal{A}^*$ heavily when enforcing stationarity.
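The defining identity $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$ can be checked numerically. A sketch with randomly generated symmetric data $A_1, \ldots, A_m$ (NumPy assumed; dimensions illustrative).

```python
# Verify the adjoint identity <A(X), y> = <X, A*(y)> on random data.
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
sym = lambda B: (B + B.T) / 2                      # project onto S^n
A = [sym(rng.standard_normal((n, n))) for _ in range(m)]

def A_map(X):
    """A(X) = (<A_1, X>, ..., <A_m, X>) in R^m."""
    return np.array([np.trace(Ai @ X) for Ai in A])

def A_adj(y):
    """A*(y) = sum_i y_i A_i in S^n."""
    return sum(yi * Ai for yi, Ai in zip(y, A))

X = sym(rng.standard_normal((n, n)))
y = rng.standard_normal(m)
lhs = A_map(X) @ y                 # pairing in R^m
rhs = np.trace(X @ A_adj(y))       # trace pairing in S^n
assert abs(lhs - rhs) < 1e-10
```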

Deriving the Dual: Strategy

Step 1 — Verify Conic Standard Form

We already have $(P)$ in the required template: minimize a linear functional over the intersection of an affine subspace and $\mathbb{S}^n_+$.

Step 2 — Introduce Dual Variables

  • Lagrange multipliers for equality: $y \in \mathbb{R}^m$ (free).
  • Multiplier for cone constraint: $S \in \mathbb{S}^n_+$ (same cone due to self-duality).

Heuristic: Equality constraints use unrestricted multipliers; cone constraints use elements of the dual cone.

Deriving the Dual: Lagrangian

Step 3 — Form the Lagrangian

$$\mathcal{L}(X, y, S) = \langle C, X \rangle - \langle y, \mathcal{A}(X) - b \rangle - \langle S, X \rangle.$$

Sign convention: the equality constraint $\mathcal{A}(X) - b = 0$ may enter with either sign; we take $-\langle y, \mathcal{A}(X) - b \rangle$ so that the dual ends up maximizing $\langle b, y \rangle$. The cone constraint $X \succeq 0$ contributes $-\langle S, X \rangle$ with $S \succeq 0$.

Step 4 — Simplify the Lagrangian

Use the adjoint to collect terms in $X$:

$$\mathcal{L}(X, y, S) = \langle C - \mathcal{A}^*(y) - S, X \rangle + \langle b, y \rangle.$$

The dependence on $X$ is entirely through the coefficient $C - \mathcal{A}^*(y) - S$.

Dual Feasibility and Objective

Step 5 — Compute the Dual Function

The cone constraint is already dualized through $S$, so the infimum runs over all of $\mathbb{S}^n$; a linear function of $X$ is unbounded below unless its coefficient vanishes:

$$g(y,S) = \inf_{X \in \mathbb{S}^n} \mathcal{L}(X, y, S) = \begin{cases} \langle b, y \rangle, & \text{if } C - \mathcal{A}^*(y) - S = 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Step 6 — Assemble the Dual Program

Dual Problem (D)

$$\begin{aligned} \text{maximize}\quad & \langle b, y \rangle \\ \text{subject to}\quad & \mathcal{A}^*(y) + S = C \\ & S \succeq 0. \end{aligned}$$

Eliminating $S$ gives the compact form $\max_y \{ \langle b, y \rangle : C - \mathcal{A}^*(y) \succeq 0 \}$.

Weak Duality

Theorem (Weak Duality)

For any primal feasible $X$ and dual feasible $(y, S)$,

$$\langle C, X \rangle \ge \langle b, y \rangle.$$

Consequently, $\mathrm{OPT}_{(D)} \le \mathrm{OPT}_{(P)}$.

Proof Sketch

Start from feasibility: $\mathcal{A}(X) = b$ and $\mathcal{A}^*(y) + S = C$.

$$\begin{aligned} \langle C, X \rangle - \langle b, y \rangle &= \langle \mathcal{A}^*(y) + S, X \rangle - \langle b, y \rangle \\ &= \langle y, \mathcal{A}(X) \rangle - \langle b, y \rangle + \langle S, X \rangle \\ &= \langle S, X \rangle \ge 0, \end{aligned}$$ because $S, X \succeq 0 \Rightarrow \langle S, X \rangle \ge 0$.

Interpretation: The dual objective always furnishes a lower bound on the primal value, a cornerstone of dual-based algorithms.
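The identity in the proof sketch can be illustrated numerically by building feasible points by construction: pick $X \succeq 0$, $y$, and $S \succeq 0$ first, then define $b := \mathcal{A}(X)$ and $C := \mathcal{A}^*(y) + S$ so that both feasibility conditions hold. A sketch assuming NumPy; data is random and illustrative.

```python
# Numerical illustration of weak duality: <C, X> - <b, y> = <S, X> >= 0.
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
sym = lambda B: (B + B.T) / 2
A = [sym(rng.standard_normal((n, n))) for _ in range(m)]

G = rng.standard_normal((n, n)); X = G @ G.T           # primal feasible X >= 0
b = np.array([np.trace(Ai @ X) for Ai in A])           # force A(X) = b
y = rng.standard_normal(m)
H = rng.standard_normal((n, n)); S = H @ H.T           # dual slack S >= 0
C = sum(yi * Ai for yi, Ai in zip(y, A)) + S           # force A*(y) + S = C

gap = np.trace(C @ X) - b @ y
assert abs(gap - np.trace(S @ X)) < 1e-8               # gap equals <S, X>
assert gap >= -1e-10                                   # hence nonnegative
```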

Strong Duality (Slater's Condition)

Theorem (Slater)

If either of the following holds:

  1. There exists $\bar{X} \succ 0$ with $\mathcal{A}(\bar{X}) = b$ (strict primal feasibility), or
  2. There exists $(\bar{y}, \bar{S})$ with $\bar{S} \succ 0$ and $\mathcal{A}^*(\bar{y}) + \bar{S} = C$ (strict dual feasibility),

then $\mathrm{OPT}_{(P)} = \mathrm{OPT}_{(D)}$ and both optima are attained.

Complementary Slackness

At optimality, $(X^\star, y^\star, S^\star)$ satisfies $\langle S^\star, X^\star \rangle = 0$, equivalently $S^\star X^\star = 0$ (their ranges are orthogonal).

Takeaway: Strict feasibility is a constraint qualification guaranteeing no duality gap.

Worked Example: 2×2 Linear Matrix Inequality

Problem

$$(\mathrm{P}_{\mathrm{ex}})\quad \min_{x \in \mathbb{R}^2} \begin{pmatrix}1 & 2\end{pmatrix} x \quad \text{s.t.} \quad A_0 + x_1 A_1 + x_2 A_2 \succeq 0,$$ where $$A_0 = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix},\; A_1 = \begin{pmatrix}0 & 1\\1 & 0\end{pmatrix},\; A_2 = \begin{pmatrix}-1 & 0\\0 & 1\end{pmatrix}.$$

Feasibility Check

$A_0 \succ 0 \Rightarrow$ primal Slater condition holds.

Constraint matrix: $\begin{pmatrix}1 - x_2 & x_1 \\ x_1 & 1 + x_2\end{pmatrix} \succeq 0$.

Optimal Solution

For a $2 \times 2$ symmetric matrix, PSD holds iff both diagonal entries and the determinant are nonnegative: $1 - x_2 \ge 0$, $1 + x_2 \ge 0$, and $1 - x_1^2 - x_2^2 \ge 0$. The feasible set is therefore the unit disk $x_1^2 + x_2^2 \le 1$, and minimizing the linear objective $x_1 + 2x_2$ over it gives $x^\star = -(1, 2)/\sqrt{5}$ with optimal value $-\sqrt{5}$; the point $x = (0,0)$ is feasible (the matrix is $A_0 \succ 0$) but not optimal.

Exercise: Derive the dual and verify matching optimal value.
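As a quick check of the geometry, the feasible set $\{A_0 + x_1 A_1 + x_2 A_2 \succeq 0\}$ is the unit disk $x_1^2 + x_2^2 \le 1$, so the minimizer of $x_1 + 2x_2$ over it is $-(1,2)/\sqrt{5}$. A NumPy sketch verifying feasibility and the objective value:

```python
# Verify the worked 2x2 LMI example numerically.
import numpy as np

def constraint_matrix(x1, x2):
    """A_0 + x1 A_1 + x2 A_2 for the given data."""
    return np.array([[1 - x2, x1], [x1, 1 + x2]])

x_star = -np.array([1.0, 2.0]) / np.sqrt(5)
M = constraint_matrix(*x_star)
assert np.linalg.eigvalsh(M).min() >= -1e-10   # x_star is feasible (boundary)
value = np.array([1.0, 2.0]) @ x_star
assert abs(value + np.sqrt(5)) < 1e-10         # objective value is -sqrt(5)
```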

Applications: Matrix Norms via SDP

Why Matrix Norms Matter

  • Machine Learning: Regularization, low-rank recovery.
  • Control Theory: Robust stability via $H_\infty$ norms.
  • Signal Processing: Filter design, denoising.
  • Data Science: Matrix completion (Netflix Prize), PCA variants.

Duality Links Key Norm Pairs

Spectral Norm $\|M\|_\infty$

Largest singular value of $M$.

Dual to nuclear norm.

Nuclear Norm $\|M\|_1$

Sum of singular values.

Convex surrogate for rank.

Both admit compact SDP representations derived via duality.

Schatten Norms: Unified Perspective

Setup

Let $T: H_1 \to H_2$ be bounded between Hilbert spaces with singular values $s_1(T) \ge s_2(T) \ge \cdots \ge 0$.

Definition (Schatten $p$-Norm)

$$\|T\|_p := \begin{cases} \left( \sum_{k=1}^{\infty} s_k(T)^p \right)^{1/p}, & 1 \le p < \infty, \\ s_1(T), & p = \infty. \end{cases}$$

Finite-dimensional case: $\|T\|_p = \big(\mathrm{Tr}(|T|^p)\big)^{1/p}$ where $|T| := \sqrt{T^\ast T}$.

Special Cases

  • $p=1$: Nuclear (trace) norm $\sum s_k(T)$.
  • $p=2$: Frobenius norm $\sqrt{\sum s_k(T)^2}$.
  • $p=\infty$: Spectral norm $s_1(T)$.
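In the finite-dimensional case all three special cases come straight from the singular values. A small NumPy sketch (sizes and the diagonal test matrix are illustrative):

```python
# Schatten p-norms from singular values (finite-dimensional case).
import numpy as np

def schatten(M, p):
    """Schatten p-norm of M; p = np.inf gives the spectral norm."""
    s = np.linalg.svd(M, compute_uv=False)
    return s.max() if np.isinf(p) else (s ** p).sum() ** (1.0 / p)

M = np.array([[3.0, 0.0], [0.0, 4.0]])           # singular values 3, 4
assert schatten(M, 1) == 7.0                      # nuclear: 3 + 4
assert abs(schatten(M, 2) - 5.0) < 1e-12          # Frobenius: sqrt(9 + 16)
assert schatten(M, np.inf) == 4.0                 # spectral: max singular value
```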

Schatten Norm Properties

Fundamental Properties

For $p, q, r \in [1,\infty]$:

  • Unitary invariance: $\|UTV\|_p = \|T\|_p$ for all unitary $U, V$ and every $p$.
  • Hölder inequality: $\|ST\|_r \le \|S\|_p \|T\|_q$ whenever $\tfrac{1}{p} + \tfrac{1}{q} = \tfrac{1}{r}$.
  • Duality: $(\|\cdot\|_p)^* = \|\cdot\|_q$ with $\tfrac{1}{p} + \tfrac{1}{q} = 1$.

Dual Norm Characterizations

$$\|M\|_\infty = \max_{\|Y\|_1 \le 1} \langle M, Y \rangle,\qquad \|M\|_1 = \max_{\|Y\|_\infty \le 1} \langle M, Y \rangle.$$

These support functions lead directly to SDP formulations.
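The first characterization can be seen concretely: for general (not necessarily symmetric) matrices the pairing is $\langle M, Y \rangle = \mathrm{Tr}(M^\top Y)$, and the maximum over $\|Y\|_1 \le 1$ is attained at the rank-one matrix $u_1 v_1^\top$ built from the top singular pair. A NumPy sketch on random data:

```python
# <M, Y> <= ||M||_inf ||Y||_1, with equality at Y = u1 v1^T.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(M)

Y = np.outer(U[:, 0], Vt[0])                      # rank one, ||Y||_1 = 1
assert abs(np.trace(M.T @ Y) - s[0]) < 1e-10      # attains the spectral norm

for _ in range(20):                               # Hoelder-type bound holds
    W = rng.standard_normal((4, 4))
    nuc_W = np.linalg.svd(W, compute_uv=False).sum()
    assert np.trace(M.T @ W) <= s[0] * nuc_W + 1e-10
```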

Spectral Norm via SDP

Variational Form

For symmetric $M \in \mathbb{S}^n$:

$$\|M\|_\infty = \max_{\|x\|_2 = 1} |x^\top M x| = \max\{|\lambda_{\max}(M)|, |\lambda_{\min}(M)|\}.$$

SDP Formulation

Introduce scalar $t$ and block matrix $X(t) = \begin{pmatrix} tI_n & M \\ M & tI_n \end{pmatrix}$.

$$X(t) \succeq 0 \iff t I_n \succeq \pm M \iff t \ge \|M\|_\infty.$$

Primal SDP:

$$(\mathrm{P}_\infty)\quad \min_{t \in \mathbb{R}} \ t \quad \text{s.t.} \quad \begin{pmatrix} tI_n & M \\ M & tI_n \end{pmatrix} \succeq 0.$$

Schur complement captures the requirement that all eigenvalues of $M$ lie in $[-t, t]$.
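The equivalence $X(t) \succeq 0 \iff t \ge \|M\|_\infty$ is easy to probe numerically: the eigenvalues of the block matrix are $t \pm \lambda_i(M)$, so it is PSD just above the spectral norm and indefinite just below. A NumPy sketch on a random symmetric $M$:

```python
# X(t) = [[tI, M], [M, tI]] is PSD iff t >= ||M||_inf (M symmetric).
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
M = (B + B.T) / 2
t_star = np.abs(np.linalg.eigvalsh(M)).max()      # ||M||_inf for symmetric M

def X(t):
    I = np.eye(4)
    return np.block([[t * I, M], [M, t * I]])

eps = 1e-6
assert np.linalg.eigvalsh(X(t_star + eps)).min() >= -1e-9   # PSD above
assert np.linalg.eigvalsh(X(t_star - eps)).min() < 0        # indefinite below
```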

Spectral Norm Dual Derivation

Lagrangian Setup

Dual variable $Z = \begin{pmatrix} A & Y \\ Y^\top & B \end{pmatrix} \succeq 0$.

$$\mathcal{L}(t, Z) = t - \langle Z, X(t) \rangle.$$

Expanding the Inner Product

$$\langle Z, X(t) \rangle = t(\mathrm{Tr}A + \mathrm{Tr}B) + 2\langle Y, M \rangle.$$ $$\Rightarrow \mathcal{L}(t, Z) = t(1 - \mathrm{Tr}A - \mathrm{Tr}B) - 2\langle Y, M \rangle.$$

Dual Problem

$$g(Z) = \inf_{t \in \mathbb{R}} \mathcal{L}(t, Z) = \begin{cases} -2 \langle Y, M \rangle, & \mathrm{Tr}A + \mathrm{Tr}B = 1, \\ -\infty, & \text{otherwise}. \end{cases}$$

The dual maximizes $g(Z)$, i.e., $-2\langle Y, M \rangle$, over feasible $Z \succeq 0$; conjugating $Z$ by $\mathrm{diag}(I, -I)$ flips the sign of the off-diagonal block $Y$ while preserving feasibility, so this equals the maximum of $2\langle M, Y \rangle$. Hence $(\mathrm{D}_\infty)$:

$$\begin{aligned} \text{maximize}\quad & 2\langle M, Y \rangle \\ \text{subject to}\quad & \begin{pmatrix} A & Y \\ Y^\top & B \end{pmatrix} \succeq 0, \\ & \mathrm{Tr}A + \mathrm{Tr}B = 1. \end{aligned}$$

The optimal value is $\|M\|_\infty$: feasibility forces $\|2Y\|_1 \le 1$, so after rescaling $Y' = 2Y$ this is exactly the dual-norm characterization.

Nuclear Norm via SDP

Nuclear Norm

For $M \in \mathbb{R}^{p \times q}$ with singular values $s_1 \ge \dots \ge s_{\min(p,q)}$:

$$\|M\|_1 = \sum_{k=1}^{\min(p, q)} s_k(M).$$

SDP Formulation

Introduce $U \in \mathbb{S}^p_+$ and $V \in \mathbb{S}^q_+$:

$$X = \begin{pmatrix} U & M \\ M^\top & V \end{pmatrix} \succeq 0.$$ $$ (\mathrm{P}_1)\quad \min_{U,V} \frac{1}{2}(\mathrm{Tr}U + \mathrm{Tr}V) \quad \text{s.t.} \quad X \succeq 0.$$

The block PSD condition holds iff $M = U^{1/2} K V^{1/2}$ for some contraction $K$ (i.e., $\|K\|_\infty \le 1$), so $U$ and $V$ “sandwich” $M$; minimizing the trace then recovers $\|M\|_1$, with an optimum at $U = (MM^\top)^{1/2}$, $V = (M^\top M)^{1/2}$.
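An optimal witness can be built directly from the SVD $M = U_s \,\mathrm{diag}(s)\, V_s^\top$: taking $U = U_s \,\mathrm{diag}(s)\, U_s^\top$ and $V = V_s \,\mathrm{diag}(s)\, V_s^\top$, the block matrix factors as $WW^\top$ with $W = \begin{pmatrix} U_s \,\mathrm{diag}(\sqrt{s}) \\ V_s \,\mathrm{diag}(\sqrt{s}) \end{pmatrix}$, hence is PSD, and the objective equals $\sum_k s_k = \|M\|_1$. A NumPy sketch:

```python
# SVD-based feasible point achieving the nuclear-norm SDP value.
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 5))
Us, s, Vst = np.linalg.svd(M, full_matrices=False)
Vs = Vst.T

U = Us @ np.diag(s) @ Us.T                      # p x p, PSD
V = Vs @ np.diag(s) @ Vs.T                      # q x q, PSD
X = np.block([[U, M], [M.T, V]])

assert np.linalg.eigvalsh(X).min() >= -1e-9                      # feasible
assert abs(0.5 * (np.trace(U) + np.trace(V)) - s.sum()) < 1e-9   # = ||M||_1
```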

Summary: Duality Picture for Norms

Norm Duality via SDP

Spectral Norm

$$\|M\|_\infty = \max_{\|Y\|_1 \le 1} \langle M, Y \rangle.$$

SDP: minimize $t$ s.t. $\begin{pmatrix} tI & M \\ M & tI \end{pmatrix} \succeq 0$.

Nuclear Norm

$$\|M\|_1 = \max_{\|Y\|_\infty \le 1} \langle M, Y \rangle.$$

SDP: minimize $\frac{1}{2}(\mathrm{Tr}U + \mathrm{Tr}V)$ s.t. $\begin{pmatrix} U & M \\ M^\top & V \end{pmatrix} \succeq 0$.

What to Remember

Next Steps

  • Practice deriving duals for SDPs with inequality constraints.
  • Implement numerical experiments using CVX (MATLAB) or CVXPY (Python).
  • Explore more applications of SDP in computer science and AI.