Slide 1 of 16 Overview

AIAA 3225 — Learning and Optimization for Artificial Intelligence

Week 5 · Part 2

Today's Focus

Duality Theory of Semidefinite Programming and Its Applications

Instructor
Prof. Xin Wang
Date
2025 · 10 · 09

Session Outcomes

  • Understand the geometry of the positive semidefinite cone.
  • Derive SDP duals systematically via the Lagrangian framework.
  • Apply duality insights to compute spectral and nuclear norms.
Topics: Semidefinite Programs · Dual Cones · Matrix Norms · AI Applications

Mathematical Foundations

Ambient Space and Inner Product

Symmetric matrices: $\mathbb{S}^n = \{ X \in \mathbb{R}^{n \times n} : X = X^\top \}$.

Trace inner product: $\langle A, B \rangle := \mathrm{Tr}(AB)$ for $A,B \in \mathbb{S}^n$.

Positive semidefinite cone: $\mathbb{S}^n_+ := \{ X \in \mathbb{S}^n : X \succeq 0 \}$.

Key Insight — Self-Dual Cone

The dual cone of $\mathbb{S}^n_+$ with respect to $\langle \cdot,\cdot \rangle$ is itself:

$$\left(\mathbb{S}^n_+\right)^* = \{ S \in \mathbb{S}^n : \langle S, X \rangle \ge 0 \ \forall X \succeq 0 \} = \mathbb{S}^n_+.$$

This self-duality makes SDP duality particularly elegant and computationally tractable.
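Self-duality can be sanity-checked numerically: for any PSD pair $S, X$ the trace inner product is nonnegative. A minimal sketch (not a proof) assuming NumPy is available; sizes and seeds are illustrative.

```python
# Sanity check of self-duality: <S, X> = Tr(SX) >= 0 for random PSD S, X.
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n, rng):
    """Return a random n x n symmetric PSD matrix of the form A A^T."""
    A = rng.standard_normal((n, n))
    return A @ A.T

for _ in range(100):
    S, X = random_psd(5, rng), random_psd(5, rng)
    assert np.trace(S @ X) >= -1e-10  # <S, X> >= 0 up to round-off
```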

Why the Trace Inner Product?

It is the restriction of the Frobenius inner product to $\mathbb{S}^n$, so it equips $\mathbb{S}^n$ with a Euclidean structure under which $\mathbb{S}^n_+$ is self-dual; nonnegativity of $\langle S, X \rangle$ for PSD pairs follows from $\mathrm{Tr}(SX) = \mathrm{Tr}(X^{1/2} S X^{1/2}) \ge 0$.

Notation at a Glance

Keep this reference in mind throughout the derivation.

Decision Variable

$X \in \mathbb{S}^n$ — primal matrix variable (symmetric).

Linear Map

$\mathcal{A}: \mathbb{S}^n \to \mathbb{R}^m$ with $\mathcal{A}(X) = \big(\langle A_1, X \rangle, \ldots, \langle A_m, X \rangle\big)$.

Adjoint

$\mathcal{A}^*: \mathbb{R}^m \to \mathbb{S}^n$ satisfies $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$.

Dual Variables

$y \in \mathbb{R}^m$ for affine constraints, $S \in \mathbb{S}^n_+$ for conic constraint.

Reminder: The adjoint exists and is unique because $\mathcal{A}$ is linear and the trace inner product is non-degenerate.

Primal SDP: Standard Conic Form

Primal Problem (P)

$$\begin{aligned} \text{minimize} & \quad \langle C, X \rangle \\ \text{subject to} & \quad \mathcal{A}(X) = b \\ & \quad X \succeq 0, \end{aligned}$$ where $C \in \mathbb{S}^n$ and $b \in \mathbb{R}^m$ are given data.

Why This Is Standard Form

  • Single linear map $\mathcal{A}$ aggregates all equality constraints.
  • The only cone constraint is $X \in \mathbb{S}^n_+$.
  • No additional inequalities — ideal for Lagrangian duality.

Adjoint Map

Given $\mathcal{A}(X) = (\langle A_i, X \rangle)_{i=1}^m$ with $A_i \in \mathbb{S}^n$:

$$\mathcal{A}^*(y) = \sum_{i=1}^m y_i A_i.$$

We will use $\mathcal{A}^*$ heavily when enforcing stationarity.
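The defining identity $\langle \mathcal{A}(X), y \rangle = \langle X, \mathcal{A}^*(y) \rangle$ can be checked numerically. A sketch with randomly generated symmetric data $A_1, \ldots, A_m$ (NumPy assumed; dimensions illustrative).

```python
# Verify the adjoint identity <A(X), y> = <X, A*(y)> on random data.
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
sym = lambda B: (B + B.T) / 2                      # project onto S^n
A = [sym(rng.standard_normal((n, n))) for _ in range(m)]

def A_map(X):
    """A(X) = (<A_1, X>, ..., <A_m, X>) in R^m."""
    return np.array([np.trace(Ai @ X) for Ai in A])

def A_adj(y):
    """A*(y) = sum_i y_i A_i in S^n."""
    return sum(yi * Ai for yi, Ai in zip(y, A))

X = sym(rng.standard_normal((n, n)))
y = rng.standard_normal(m)
lhs = A_map(X) @ y                 # pairing in R^m
rhs = np.trace(X @ A_adj(y))       # trace pairing in S^n
assert abs(lhs - rhs) < 1e-10
```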

Deriving the Dual: Strategy

Step 1 — Verify Conic Standard Form

We already have $(P)$ in the required template: minimize a linear functional over the intersection of an affine subspace and $\mathbb{S}^n_+$.

Step 2 — Introduce Dual Variables

  • Lagrange multipliers for equality: $y \in \mathbb{R}^m$ (free).
  • Multiplier for cone constraint: $S \in \mathbb{S}^n_+$ (same cone due to self-duality).

Heuristic: Equality constraints use unrestricted multipliers; cone constraints use elements of the dual cone.

Deriving the Dual: Lagrangian

Step 3 — Form the Lagrangian

$$\mathcal{L}(X, y, S) = \langle C, X \rangle - \langle y, \mathcal{A}(X) - b \rangle - \langle S, X \rangle.$$

Sign convention: the equality constraint $\mathcal{A}(X) - b = 0$ may enter with either sign; we take $-\langle y, \mathcal{A}(X) - b \rangle$ so that the dual ends up maximizing $\langle b, y \rangle$. The cone constraint $X \succeq 0$ contributes $-\langle S, X \rangle$ with $S \succeq 0$.

Step 4 — Simplify the Lagrangian

Use the adjoint to collect terms in $X$:

$$\mathcal{L}(X, y, S) = \langle C - \mathcal{A}^*(y) - S, X \rangle + \langle b, y \rangle.$$

The dependence on $X$ is entirely through the coefficient $C - \mathcal{A}^*(y) - S$.

Dual Feasibility and Objective

Step 5 — Compute the Dual Function

The cone constraint is already dualized through $S$, so the infimum runs over all of $\mathbb{S}^n$; a linear function of $X$ is unbounded below unless its coefficient vanishes:

$$g(y,S) = \inf_{X \in \mathbb{S}^n} \mathcal{L}(X, y, S) = \begin{cases} \langle b, y \rangle, & \text{if } C - \mathcal{A}^*(y) - S = 0, \\ -\infty, & \text{otherwise}. \end{cases}$$

Step 6 — Assemble the Dual Program

Dual Problem (D)

$$\begin{aligned} \text{maximize}\quad & \langle b, y \rangle \\ \text{subject to}\quad & \mathcal{A}^*(y) + S = C \\ & S \succeq 0. \end{aligned}$$

Eliminating $S$ gives the compact form $\max_y \{ \langle b, y \rangle : C - \mathcal{A}^*(y) \succeq 0 \}$.

Weak Duality

Theorem (Weak Duality)

For any primal feasible $X$ and dual feasible $(y, S)$,

$$\langle C, X \rangle \ge \langle b, y \rangle.$$

Consequently, $\mathrm{OPT}_{(D)} \le \mathrm{OPT}_{(P)}$.

Proof Sketch

Start from feasibility: $\mathcal{A}(X) = b$ and $\mathcal{A}^*(y) + S = C$.

$$\begin{aligned} \langle C, X \rangle - \langle b, y \rangle &= \langle \mathcal{A}^*(y) + S, X \rangle - \langle b, y \rangle \\ &= \langle y, \mathcal{A}(X) \rangle - \langle b, y \rangle + \langle S, X \rangle \\ &= \langle S, X \rangle \ge 0, \end{aligned}$$ because $S, X \succeq 0 \Rightarrow \langle S, X \rangle \ge 0$.

Interpretation: The dual objective always furnishes a lower bound on the primal value, a cornerstone of dual-based algorithms.
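The identity in the proof sketch can be illustrated numerically by building feasible points by construction: pick $X \succeq 0$, $y$, and $S \succeq 0$ first, then define $b := \mathcal{A}(X)$ and $C := \mathcal{A}^*(y) + S$ so that both feasibility conditions hold. A sketch assuming NumPy; data is random and illustrative.

```python
# Numerical illustration of weak duality: <C, X> - <b, y> = <S, X> >= 0.
import numpy as np

rng = np.random.default_rng(2)
n, m = 4, 3
sym = lambda B: (B + B.T) / 2
A = [sym(rng.standard_normal((n, n))) for _ in range(m)]

G = rng.standard_normal((n, n)); X = G @ G.T           # primal feasible X >= 0
b = np.array([np.trace(Ai @ X) for Ai in A])           # force A(X) = b
y = rng.standard_normal(m)
H = rng.standard_normal((n, n)); S = H @ H.T           # dual slack S >= 0
C = sum(yi * Ai for yi, Ai in zip(y, A)) + S           # force A*(y) + S = C

gap = np.trace(C @ X) - b @ y
assert abs(gap - np.trace(S @ X)) < 1e-8               # gap equals <S, X>
assert gap >= -1e-10                                   # hence nonnegative
```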

Strong Duality (Slater's Condition)

Theorem (Slater)

If either of the following holds:

  1. There exists $\bar{X} \succ 0$ with $\mathcal{A}(\bar{X}) = b$ (strict primal feasibility), or
  2. There exists $(\bar{y}, \bar{S})$ with $\bar{S} \succ 0$ and $\mathcal{A}^*(\bar{y}) + \bar{S} = C$ (strict dual feasibility),

then $\mathrm{OPT}_{(P)} = \mathrm{OPT}_{(D)}$ and both optima are attained.

Complementary Slackness

At optimality, $(X^\star, y^\star, S^\star)$ satisfies $\langle S^\star, X^\star \rangle = 0$, equivalently $S^\star X^\star = 0$ (their ranges are orthogonal).

Takeaway: Strict feasibility is a constraint qualification guaranteeing no duality gap.

Worked Example: 2×2 Linear Matrix Inequality

Problem

$$(\mathrm{P}_{\mathrm{ex}})\quad \min_{x \in \mathbb{R}^2} \begin{pmatrix}1 & 2\end{pmatrix} x \quad \text{s.t.} \quad A_0 + x_1 A_1 + x_2 A_2 \succeq 0,$$ where $$A_0 = \begin{pmatrix}1 & 0\\0 & 1\end{pmatrix},\; A_1 = \begin{pmatrix}0 & 1\\1 & 0\end{pmatrix},\; A_2 = \begin{pmatrix}-1 & 0\\0 & 1\end{pmatrix}.$$

Feasibility Check

$A_0 \succ 0 \Rightarrow$ primal Slater condition holds.

Constraint matrix: $\begin{pmatrix}1 - x_2 & x_1 \\ x_1 & 1 + x_2\end{pmatrix} \succeq 0$.

Optimal Solution

For a $2 \times 2$ symmetric matrix, PSD holds iff both diagonal entries and the determinant are nonnegative: $1 - x_2 \ge 0$, $1 + x_2 \ge 0$, and $1 - x_1^2 - x_2^2 \ge 0$. The feasible set is therefore the unit disk $x_1^2 + x_2^2 \le 1$, and minimizing the linear objective $x_1 + 2x_2$ over it gives $x^\star = -(1, 2)/\sqrt{5}$ with optimal value $-\sqrt{5}$; the point $x = (0,0)$ is feasible (the matrix is $A_0 \succ 0$) but not optimal.

Exercise: Derive the dual and verify matching optimal value.
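As a quick check of the geometry, the feasible set $\{A_0 + x_1 A_1 + x_2 A_2 \succeq 0\}$ is the unit disk $x_1^2 + x_2^2 \le 1$, so the minimizer of $x_1 + 2x_2$ over it is $-(1,2)/\sqrt{5}$. A NumPy sketch verifying feasibility and the objective value:

```python
# Verify the worked 2x2 LMI example numerically.
import numpy as np

def constraint_matrix(x1, x2):
    """A_0 + x1 A_1 + x2 A_2 for the given data."""
    return np.array([[1 - x2, x1], [x1, 1 + x2]])

x_star = -np.array([1.0, 2.0]) / np.sqrt(5)
M = constraint_matrix(*x_star)
assert np.linalg.eigvalsh(M).min() >= -1e-10   # x_star is feasible (boundary)
value = np.array([1.0, 2.0]) @ x_star
assert abs(value + np.sqrt(5)) < 1e-10         # objective value is -sqrt(5)
```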

Applications: Matrix Norms via SDP

Why Matrix Norms Matter

  • Machine Learning: Regularization, low-rank recovery.
  • Control Theory: Robust stability via $H_\infty$ norms.
  • Signal Processing: Filter design, denoising.
  • Data Science: Matrix completion (Netflix Prize), PCA variants.

Duality Links Key Norm Pairs

Spectral Norm $\|M\|_\infty$

Largest singular value of $M$.

Dual to nuclear norm.

Nuclear Norm $\|M\|_1$

Sum of singular values.

Convex surrogate for rank.

Both admit compact SDP representations derived via duality.

Schatten Norms: Unified Perspective

Setup

Let $T: H_1 \to H_2$ be bounded between Hilbert spaces with singular values $s_1(T) \ge s_2(T) \ge \cdots \ge 0$.

Definition (Schatten $p$-Norm)

$$\|T\|_p := \begin{cases} \left( \sum_{k=1}^{\infty} s_k(T)^p \right)^{1/p}, & 1 \le p < \infty, \\ s_1(T), & p = \infty. \end{cases}$$

Finite-dimensional case: $\|T\|_p = \big(\mathrm{Tr}(|T|^p)\big)^{1/p}$ where $|T| := \sqrt{T^\ast T}$.

Special Cases

  • $p=1$: Nuclear (trace) norm $\sum s_k(T)$.
  • $p=2$: Frobenius norm $\sqrt{\sum s_k(T)^2}$.
  • $p=\infty$: Spectral norm $s_1(T)$.
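In the finite-dimensional case all three special cases come straight from the singular values. A small NumPy sketch (sizes and the diagonal test matrix are illustrative):

```python
# Schatten p-norms from singular values (finite-dimensional case).
import numpy as np

def schatten(M, p):
    """Schatten p-norm of M; p = np.inf gives the spectral norm."""
    s = np.linalg.svd(M, compute_uv=False)
    return s.max() if np.isinf(p) else (s ** p).sum() ** (1.0 / p)

M = np.array([[3.0, 0.0], [0.0, 4.0]])           # singular values 3, 4
assert schatten(M, 1) == 7.0                      # nuclear: 3 + 4
assert abs(schatten(M, 2) - 5.0) < 1e-12          # Frobenius: sqrt(9 + 16)
assert schatten(M, np.inf) == 4.0                 # spectral: max singular value
```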

Schatten Norm Properties

Fundamental Properties

For $p, q, r \in [1,\infty]$:

  • Unitary invariance: $\|UTV\|_p = \|T\|_p$ for all unitary $U, V$ and every $p$.
  • Hölder inequality: $\|ST\|_r \le \|S\|_p \|T\|_q$ whenever $\tfrac{1}{p} + \tfrac{1}{q} = \tfrac{1}{r}$.
  • Duality: $(\|\cdot\|_p)^* = \|\cdot\|_q$ with $\tfrac{1}{p} + \tfrac{1}{q} = 1$.

Dual Norm Characterizations

$$\|M\|_\infty = \max_{\|Y\|_1 \le 1} \langle M, Y \rangle,\qquad \|M\|_1 = \max_{\|Y\|_\infty \le 1} \langle M, Y \rangle.$$

These support functions lead directly to SDP formulations.
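The first characterization can be seen concretely: for general (not necessarily symmetric) matrices the pairing is $\langle M, Y \rangle = \mathrm{Tr}(M^\top Y)$, and the maximum over $\|Y\|_1 \le 1$ is attained at the rank-one matrix $u_1 v_1^\top$ built from the top singular pair. A NumPy sketch on random data:

```python
# <M, Y> <= ||M||_inf ||Y||_1, with equality at Y = u1 v1^T.
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
U, s, Vt = np.linalg.svd(M)

Y = np.outer(U[:, 0], Vt[0])                      # rank one, ||Y||_1 = 1
assert abs(np.trace(M.T @ Y) - s[0]) < 1e-10      # attains the spectral norm

for _ in range(20):                               # Hoelder-type bound holds
    W = rng.standard_normal((4, 4))
    nuc_W = np.linalg.svd(W, compute_uv=False).sum()
    assert np.trace(M.T @ W) <= s[0] * nuc_W + 1e-10
```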

Spectral Norm via SDP

Variational Form

For symmetric $M \in \mathbb{S}^n$:

$$\|M\|_\infty = \max_{\|x\|_2 = 1} |x^\top M x| = \max\{|\lambda_{\max}(M)|, |\lambda_{\min}(M)|\}.$$

SDP Formulation

Introduce scalar $t$ and block matrix $X(t) = \begin{pmatrix} tI_n & M \\ M & tI_n \end{pmatrix}$.

$$X(t) \succeq 0 \iff t I_n \succeq \pm M \iff t \ge \|M\|_\infty.$$

Primal SDP:

$$(\mathrm{P}_\infty)\quad \min_{t \in \mathbb{R}} \ t \quad \text{s.t.} \quad \begin{pmatrix} tI_n & M \\ M & tI_n \end{pmatrix} \succeq 0.$$

Schur complement captures the requirement that all eigenvalues of $M$ lie in $[-t, t]$.
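The equivalence $X(t) \succeq 0 \iff t \ge \|M\|_\infty$ is easy to probe numerically: the eigenvalues of the block matrix are $t \pm \lambda_i(M)$, so it is PSD just above the spectral norm and indefinite just below. A NumPy sketch on a random symmetric $M$:

```python
# X(t) = [[tI, M], [M, tI]] is PSD iff t >= ||M||_inf (M symmetric).
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
M = (B + B.T) / 2
t_star = np.abs(np.linalg.eigvalsh(M)).max()      # ||M||_inf for symmetric M

def X(t):
    I = np.eye(4)
    return np.block([[t * I, M], [M, t * I]])

eps = 1e-6
assert np.linalg.eigvalsh(X(t_star + eps)).min() >= -1e-9   # PSD above
assert np.linalg.eigvalsh(X(t_star - eps)).min() < 0        # indefinite below
```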

Spectral Norm Dual Derivation

Lagrangian Setup

Dual variable $Z = \begin{pmatrix} A & Y \\ Y^\top & B \end{pmatrix} \succeq 0$.

$$\mathcal{L}(t, Z) = t - \langle Z, X(t) \rangle.$$

Expanding the Inner Product

$$\langle Z, X(t) \rangle = t(\mathrm{Tr}A + \mathrm{Tr}B) + 2\langle Y, M \rangle.$$ $$\Rightarrow \mathcal{L}(t, Z) = t(1 - \mathrm{Tr}A - \mathrm{Tr}B) - 2\langle Y, M \rangle.$$

Dual Problem

$$g(Z) = \inf_{t \in \mathbb{R}} \mathcal{L}(t, Z) = \begin{cases} -2 \langle Y, M \rangle, & \mathrm{Tr}A + \mathrm{Tr}B = 1, \\ -\infty, & \text{otherwise}. \end{cases}$$

The dual maximizes $g(Z)$, i.e., $-2\langle Y, M \rangle$, over feasible $Z \succeq 0$; conjugating $Z$ by $\mathrm{diag}(I, -I)$ flips the sign of the off-diagonal block $Y$ while preserving feasibility, so this equals the maximum of $2\langle M, Y \rangle$. Hence $(\mathrm{D}_\infty)$:

$$\begin{aligned} \text{maximize}\quad & 2\langle M, Y \rangle \\ \text{subject to}\quad & \begin{pmatrix} A & Y \\ Y^\top & B \end{pmatrix} \succeq 0, \\ & \mathrm{Tr}A + \mathrm{Tr}B = 1. \end{aligned}$$

The optimal value is $\|M\|_\infty$: feasibility forces $\|2Y\|_1 \le 1$, so after rescaling $Y' = 2Y$ this is exactly the dual-norm characterization.

Nuclear Norm via SDP

Nuclear Norm

For $M \in \mathbb{R}^{p \times q}$ with singular values $s_1 \ge \dots \ge s_{\min(p,q)}$:

$$\|M\|_1 = \sum_{k=1}^{\min(p, q)} s_k(M).$$

SDP Formulation

Introduce $U \in \mathbb{S}^p_+$ and $V \in \mathbb{S}^q_+$:

$$X = \begin{pmatrix} U & M \\ M^\top & V \end{pmatrix} \succeq 0.$$ $$ (\mathrm{P}_1)\quad \min_{U,V} \frac{1}{2}(\mathrm{Tr}U + \mathrm{Tr}V) \quad \text{s.t.} \quad X \succeq 0.$$

The block PSD condition holds iff $M = U^{1/2} K V^{1/2}$ for some contraction $K$ (i.e., $\|K\|_\infty \le 1$), so $U$ and $V$ “sandwich” $M$; minimizing the trace then recovers $\|M\|_1$, with an optimum at $U = (MM^\top)^{1/2}$, $V = (M^\top M)^{1/2}$.
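An optimal witness can be built directly from the SVD $M = U_s \,\mathrm{diag}(s)\, V_s^\top$: taking $U = U_s \,\mathrm{diag}(s)\, U_s^\top$ and $V = V_s \,\mathrm{diag}(s)\, V_s^\top$, the block matrix factors as $WW^\top$ with $W = \begin{pmatrix} U_s \,\mathrm{diag}(\sqrt{s}) \\ V_s \,\mathrm{diag}(\sqrt{s}) \end{pmatrix}$, hence is PSD, and the objective equals $\sum_k s_k = \|M\|_1$. A NumPy sketch:

```python
# SVD-based feasible point achieving the nuclear-norm SDP value.
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 5))
Us, s, Vst = np.linalg.svd(M, full_matrices=False)
Vs = Vst.T

U = Us @ np.diag(s) @ Us.T                      # p x p, PSD
V = Vs @ np.diag(s) @ Vs.T                      # q x q, PSD
X = np.block([[U, M], [M.T, V]])

assert np.linalg.eigvalsh(X).min() >= -1e-9                      # feasible
assert abs(0.5 * (np.trace(U) + np.trace(V)) - s.sum()) < 1e-9   # = ||M||_1
```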

Summary: Duality Picture for Norms

Norm Duality via SDP

Spectral Norm

$$\|M\|_\infty = \max_{\|Y\|_1 \le 1} \langle M, Y \rangle.$$

SDP: minimize $t$ s.t. $\begin{pmatrix} tI & M \\ M & tI \end{pmatrix} \succeq 0$.

Nuclear Norm

$$\|M\|_1 = \max_{\|Y\|_\infty \le 1} \langle M, Y \rangle.$$

SDP: minimize $\frac{1}{2}(\mathrm{Tr}U + \mathrm{Tr}V)$ s.t. $\begin{pmatrix} U & M \\ M^\top & V \end{pmatrix} \succeq 0$.

What to Remember

Next Steps

  • Practice deriving duals for SDPs with inequality constraints.
  • Implement numerical experiments using CVX (MATLAB) or CVXPY (Python).
  • Explore more applications of SDP in computer science and AI.