AIAA 3225 · Learning and Optimization for Artificial Intelligence

Week 5 · Part 1

Topic: From Conic Programming to Semidefinite Programming

Instructor: Prof. Xin Wang

Date: 9 October 2025

Session Goals

  • Understand the geometry of the positive semidefinite (PSD) cone.
  • Trace the emergence of SDPs from LMIs and conic programming.
  • State and interpret standard SDP formulations.
  • Recognize how SDP relaxations model matrix problems arising in modern AI applications.

Why Study SDPs?

Everyday Optimization Needs

SDPs appear whenever we manipulate covariance matrices, similarity matrices, or Gram matrices while insisting they remain positive semidefinite.

Concrete Scenarios

  • Safer AI systems: Certify that control policies keep robots balanced by searching for PSD Lyapunov functions.
  • Fair data compression: Produce low-rank embeddings that preserve pairwise distances without creating negative variances.
  • Robust recommendations: Tune correlation matrices in recommender systems so uncertainty estimates stay valid.
  • Timetables & logistics: Relax hard combinatorial schedules into SDPs, get high-quality approximations fast, then round back to feasible plans.

In each case the SDP constraint “matrix ⪰ 0” captures “all quadratic combinations behave nicely,” turning messy nonlinear requirements into convex optimization problems.

Development of SDP (Part I)

1980s — Linear Matrix Inequalities (LMIs)

Control theory (Lyapunov stability, $\mathcal{H}_{\infty}$ design) motivates solving LMIs efficiently.

Early 1990s — Conic Programming

Optimization unified as

\[ \min_{x \in \mathbb{R}^n} \left\{ \langle c, x \rangle : Ax = b,\; x \in K \right\}, \]

with $K$ a closed convex cone (LP, SOCP, SDP all fit inside).

1994 — Nesterov & Nemirovskii

Introduce self-concordant barriers for $\mathbb{S}_+^n$, establishing polynomial-time interior-point methods for SDPs.

Development of SDP (Part II)

1995 — Goemans & Williamson

MAX-CUT approximation ratio $0.878$ via SDP relaxation and random hyperplane rounding — seminal result in theoretical CS.

Late 1990s–2000s — Algorithms & Software

  • Solvers: SeDuMi, SDPT3, MOSEK, CVX.
  • Applications: signal processing, structural engineering, robust finance.

2010s–Present — Scalability

First-order and low-rank methods (Burer–Monteiro factorization, ADMM, randomized sketches) tackle large SDPs.

The Cone $\mathbb{S}_+^{n}$

Definitions

$\mathbb{S}^n$: real symmetric $n \times n$ matrices. Spectral theorem ensures orthogonal diagonalization with eigenvalues $\lambda_1(A), \dots, \lambda_n(A)$.

\[ \mathbb{S}_+^{n} = \{ A \in \mathbb{S}^n : \lambda_i(A) \ge 0 \ \forall i \}, \quad \mathbb{S}_{++}^{n} = \{ A \in \mathbb{S}^n : \lambda_i(A) > 0 \ \forall i \}. \]

Loewner Order

\[ A \succeq B \iff A - B \in \mathbb{S}_+^{n},\qquad A \succeq 0 \iff A \in \mathbb{S}_+^{n},\qquad A \succ 0 \iff A \in \mathbb{S}_{++}^{n}. \]

Geometry

Convex, pointed, with interior $\mathbb{S}_{++}^n$. The boundary consists of PSD matrices with at least one zero eigenvalue (i.e., singular PSD matrices).

Quadratic Forms

$A \succeq 0$ iff $x^\top A x \ge 0$ for all $x$ — the defining property used in applications.

Self-Duality

Under $\langle X, Y \rangle = \operatorname{tr}(XY)$, we have $(\mathbb{S}_+^n)^\star = \mathbb{S}_+^n$.

Trace Inner Product on $\mathbb{S}^n$

Definition

\[ \langle A, B \rangle := \operatorname{tr}(AB) = \sum_{i,j=1}^{n} A_{ij} B_{ij}, \qquad A, B \in \mathbb{S}^n. \]

Induces the Frobenius norm $\|A\|_F = \sqrt{\langle A, A \rangle}$.

Diagonal Matrices

If $A = \operatorname{diag}(a_1, \ldots, a_n)$ and $B = \operatorname{diag}(b_1, \ldots, b_n)$ then

\[ \langle A, B \rangle = \sum_{i=1}^{n} a_i b_i,\qquad \|A\|_F = \sqrt{\sum_{i=1}^{n} a_i^2}. \]

Thus the trace pairing generalizes the Euclidean dot product.
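A quick NumPy sanity check (an illustrative sketch, not part of the lecture's required tooling) confirms that the two expressions for the pairing agree and that it induces the Frobenius norm:

```python
import numpy as np

# Two symmetric matrices.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[1.0, 0.5], [0.5, 2.0]])

# Trace inner product two ways: tr(AB) versus the entrywise sum of A_ij B_ij.
ip_trace = np.trace(A @ B)
ip_entry = np.sum(A * B)
assert np.isclose(ip_trace, ip_entry)

# The pairing induces the Frobenius norm.
assert np.isclose(np.sqrt(np.trace(A @ A)), np.linalg.norm(A, "fro"))
```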

Equivalent Tests for PSD Matrices

Theorem

For $A \in \mathbb{S}^n$, the following are equivalent:

  1. $A \in \mathbb{S}_+^n$.
  2. $\lambda_i(A) \ge 0$ for $i = 1, \ldots, n$.
  3. $x^\top A x \ge 0$ for every $x \in \mathbb{R}^n$.
  4. $A = B^\top B$ for some $B \in \mathbb{R}^{n \times n}$ (a Gram factorization; $B$ may be taken upper triangular, giving the Cholesky factorization).
  5. All principal minors satisfy $\det A[S,S] \ge 0$.

The analogous statements with strict inequalities characterize $\mathbb{S}_{++}^n$; in item 4 the factor $B$ must be invertible, and in item 5 positivity of the leading principal minors alone suffices (Sylvester's criterion).
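The equivalences can be exercised numerically. The sketch below (NumPy, illustrative only) builds a PSD matrix via the Gram construction of item 4 and checks items 2, 3, and 5:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B.T @ B  # item 4: a Gram matrix, hence PSD by construction

# Item 2: all eigenvalues nonnegative (eigvalsh is for symmetric matrices).
assert np.all(np.linalg.eigvalsh(A) >= -1e-10)

# Item 3: quadratic form nonnegative on random directions.
for _ in range(100):
    x = rng.standard_normal(4)
    assert x @ A @ x >= -1e-10

# Item 5: every principal minor det A[S, S] is nonnegative.
for k in range(1, 5):
    for S in combinations(range(4), k):
        assert np.linalg.det(A[np.ix_(S, S)]) >= -1e-8
```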

Proof Sketch of PSD Equivalences

Idea Flow

  • (1) ⇔ (2): PSD defined via non-negative eigenvalues.
  • (2) ⇒ (3): Spectral decomposition $A = U \Lambda U^\top$ yields $x^\top A x = \sum_i \lambda_i (u_i^\top x)^2 \ge 0$.
  • (3) ⇒ (2): If $\lambda_k < 0$ with eigenvector $u_k$, then $u_k^\top A u_k = \lambda_k < 0$ — contradiction.
  • (2) ⇒ (4): Let $B = \Lambda^{1/2} U^\top$; then $A = B^\top B$.
  • (4) ⇒ (3): $x^\top A x = \|Bx\|_2^2 \ge 0$.
  • (2) ⇔ (5): Each principal minor is the product of the eigenvalues of the corresponding principal submatrix, and Cauchy interlacing relates those eigenvalues to the eigenvalues of $A$.

Each viewpoint (spectral, algebraic, quadratic) offers different modeling advantages.

Structural Properties of $\mathbb{S}_+^n$

Geometry

$\mathbb{S}_+^n$ is a closed, convex, pointed cone. Its interior is $\mathbb{S}_{++}^{n}$ (strictly positive definite matrices).

Self-Duality

With respect to the trace inner product,

\[ (\mathbb{S}_+^n)^\star = \{ B \in \mathbb{S}^n : \langle A, B \rangle \ge 0 \ \forall A \succeq 0 \} = \mathbb{S}_+^n. \]

Consequences

  • Primal–Dual Symmetry: Enables clean complementary slackness $X S = 0$ with $X, S \succeq 0$.
  • Algorithm Design: Primal–dual interior-point methods treat primal and dual symmetrically.
  • Duality Theory: Facilitates strong duality under mild constraint qualifications.
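One half of self-duality is easy to probe numerically: the trace pairing of two PSD matrices is always nonnegative, since $\operatorname{tr}(G_a^\top G_a \, G_b^\top G_b) = \|G_a G_b^\top\|_F^2$. A NumPy sketch (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)

# <A, B> = tr(AB) >= 0 whenever A and B are PSD.
for _ in range(200):
    Ga, Gb = rng.standard_normal((2, 3, 3))
    A = Ga.T @ Ga   # PSD Gram matrices
    B = Gb.T @ Gb
    assert np.trace(A @ B) >= -1e-10   # tr(AB) = ||Ga @ Gb.T||_F^2
```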

Proper Cones and Duality

Definition

A cone $K \subseteq \mathbb{R}^n$ is proper if it is closed, convex, pointed ($K \cap (-K) = \{0\}$), and has nonempty interior.

Examples

  • Nonnegative orthant $\mathbb{R}_+^n$.
  • Lorentz (second-order) cone $\{(t, x) : t \ge \|x\|_2\}$.
  • Positive semidefinite cone $\mathbb{S}_+^n$.

Dual Cones

The dual $K^\star = \{ y : \langle y, x \rangle \ge 0 \ \forall x \in K \}$ of a proper cone is also proper. In particular, $(\mathbb{S}_+^n)^\star = \mathbb{S}_+^n$.

Conic Programming Framework

Primal Conic Program (CP)

\[ \begin{aligned} \text{minimize} \quad & \langle c, x \rangle \\ \text{subject to} \quad & Ax = b, \\ & x \in K, \end{aligned} \qquad x \in \mathbb{R}^n, \]

where $K$ is a proper cone, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $c \in \mathbb{R}^n$.

Feasible Set Geometry

$\{x : Ax = b\}$ is an affine subspace, so the feasible region $K \cap \{x : Ax = b\}$ is closed and convex.

Unifying View

  • LP: $K = \mathbb{R}_+^n$ (coordinate-wise nonnegativity).
  • SOCP: $K$ is a product of Lorentz cones.
  • SDP: $K = \mathbb{S}_+^n$ with matrix variables.

Linear Programming via Conic Programming

Standard Linear Program

\[ \begin{aligned} \text{minimize} \quad & \langle c, x \rangle \\ \text{subject to} \quad & Ax = b, \\ & x \ge 0, \end{aligned} \qquad x \in \mathbb{R}^n. \]

Why LP Matters

  • First convex class with polynomial-time algorithms (ellipsoid, interior-point).
  • Many practical problems reduce to LP or admit LP relaxations.
  • Embedding trick: $x \ge 0 \iff \operatorname{diag}(x) \succeq 0$, so LP feasible sets sit inside $\mathbb{S}_+^n$.

SDP generalizes LP by allowing full PSD matrices instead of diagonal matrices only.
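The embedding trick is directly checkable (a NumPy sketch, illustrative only): a diagonal matrix is PSD exactly when its diagonal is componentwise nonnegative.

```python
import numpy as np

def diag_is_psd(x):
    """x >= 0 componentwise  <=>  diag(x) is in the PSD cone."""
    return bool(np.all(np.linalg.eigvalsh(np.diag(x)) >= 0))

assert diag_is_psd(np.array([1.0, 0.0, 2.5]))        # x >= 0: PSD
assert not diag_is_psd(np.array([1.0, -0.1, 2.5]))   # a negative entry: not PSD
```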

Standard-Form Semidefinite Program

Primal SDP

\[ \begin{aligned} \text{minimize} \quad & \langle C, X \rangle \\ \text{subject to} \quad & \mathcal{A}(X) = b, \\ & X \succeq 0, \end{aligned} \qquad X \in \mathbb{S}^n, \]

where $\mathcal{A}: \mathbb{S}^n \rightarrow \mathbb{R}^m$ is a linear map, $b \in \mathbb{R}^m$, and $C \in \mathbb{S}^n$.

Structure

  • By symmetry, $X$ has $\frac{n(n+1)}{2}$ independent entries.
  • Objective is linear via trace pairing $\langle C, X \rangle = \operatorname{tr}(CX)$.
  • Constraints: affine equations plus PSD constraint.
  • Feasible set: spectrahedron (intersection of $\mathbb{S}_+^n$ with an affine subspace).

Computational Aspects

  • Interior-point methods give high accuracy but scale poorly with $n$.
  • Large-scale SDPs use low-rank factorizations or first-order methods.

Example: Correlation-Type SDP

Problem

\[ \begin{aligned} \text{minimize} \quad & X_{12} + X_{13} \\ \text{subject to} \quad & X \succeq 0, \\ & \operatorname{diag}(X) = (1, 1, 1), \end{aligned} \qquad X \in \mathbb{S}^3. \]

Seeks a PSD matrix with unit diagonal (a correlation matrix) minimizing selected off-diagonal entries.

Standard Form

\[ C = \frac{1}{2} \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{pmatrix},\quad \mathcal{A}(X) = \begin{pmatrix} X_{11} \\ X_{22} \\ X_{33} \end{pmatrix},\quad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}. \]

Because $X$ is symmetric, $\langle C, X \rangle = X_{12} + X_{13}$.
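The standard-form data can be verified numerically (a NumPy sketch, illustrative only), using an arbitrary symmetric $X$ with unit diagonal:

```python
import numpy as np

C = 0.5 * np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]], dtype=float)

# Any symmetric matrix with unit diagonal serves as a test point.
X = np.array([[ 1.0, -0.3, 0.2],
              [-0.3,  1.0, 0.1],
              [ 0.2,  0.1, 1.0]])

# <C, X> = tr(CX) picks out X_12 + X_13, as claimed.
assert np.isclose(np.trace(C @ X), X[0, 1] + X[0, 2])

# The constraint A(X) = b reads off the diagonal.
assert np.allclose(np.diag(X), np.ones(3))
```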

Applications

  • Correlation matrix estimation under structural constraints.
  • Relaxations of combinatorial problems (e.g., MAX-CUT, community detection).
  • Control design with prescribed diagonal elements (Lyapunov approaches).

Linear Matrix Inequality (LMI) Form

LMI Representation

\[ \begin{aligned} \text{minimize} \quad & c^\top x \\ \text{subject to} \quad & A_0 + x_1 A_1 + \cdots + x_k A_k \succeq 0, \end{aligned} \qquad x \in \mathbb{R}^k, \]

where $A_0, A_1, \ldots, A_k \in \mathbb{S}^n$ and $c \in \mathbb{R}^k$.

Why LMIs?

  • Compactness: Single PSD constraint encodes many inequalities.
  • Natural Modeling: Control and robust optimization problems often arise as LMIs.
  • Equivalence: LMI form ⇔ standard SDP form via introducing slack variables or collapsing constraints.
  • Solver Compatibility: Many SDP solvers accept LMI input directly.

Converting Between Forms

  • LMI → Standard: Introduce $X = A_0 + \sum x_i A_i$ and enforce $\mathcal{A}(X) = b$.
  • Standard → LMI: Encode equality constraints via block LMIs.

Example: Quadratic Constraint as LMI

Optimization Problem

\[ \begin{aligned} \text{minimize} \quad & x_1 + 2x_2 \\ \text{subject to} \quad & \begin{pmatrix} 1 - x_2 & x_1 \\ x_1 & 1 + x_2 \end{pmatrix} \succeq 0, \end{aligned} \qquad (x_1, x_2) \in \mathbb{R}^2. \]

LMI Decomposition

Let

\[ c = \begin{pmatrix} 1 \\ 2 \end{pmatrix},\quad A_0 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},\quad A_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad A_2 = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}. \]

Then $A_0 + x_1 A_1 + x_2 A_2$ equals the PSD constraint matrix.
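A short NumPy check (illustrative only) confirms the decomposition at an arbitrary test point:

```python
import numpy as np

A0 = np.eye(2)
A1 = np.array([[0.0, 1.0], [1.0, 0.0]])
A2 = np.array([[-1.0, 0.0], [0.0, 1.0]])

x1, x2 = 0.3, -0.2  # an arbitrary test point
M = A0 + x1 * A1 + x2 * A2
target = np.array([[1 - x2, x1], [x1, 1 + x2]])
assert np.allclose(M, target)  # matches the PSD constraint matrix
```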

Geometric View

Feasible points $(x_1, x_2)$ form a convex set in $\mathbb{R}^2$ defined by non-negativity of a symmetric $2 \times 2$ matrix.

Spectrahedra

Definition

A spectrahedron is a set of the form

\[ \mathcal{C} = \{ x \in \mathbb{R}^k : A_0 + x_1 A_1 + \cdots + x_k A_k \succeq 0 \}. \]

Properties

  • Always convex: intersection of $\mathbb{S}_+^n$ with an affine subspace.
  • Generalizes polyhedra: diagonal $A_i$ yield polyhedral sets.
  • Not necessarily polyhedral: can have curved boundaries.
  • Feasible region of any SDP is a spectrahedron.

Polyhedra $\subset$ Spectrahedra $\subset$ Convex Sets.

Spectrahedra capture rich convex geometry while remaining computationally tractable.
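The claim that diagonal $A_i$ yield polyhedral sets can be verified on a small instance (a NumPy sketch, illustrative only): a diagonal LMI decouples into scalar linear inequalities.

```python
import numpy as np

A0 = np.diag([1.0, 2.0])
A1 = np.diag([1.0, -1.0])

def in_spectrahedron(x1):
    # LMI membership via an eigenvalue test.
    return bool(np.all(np.linalg.eigvalsh(A0 + x1 * A1) >= 0))

def in_polyhedron(x1):
    # The diagonal LMI decouples into 1 + x1 >= 0 and 2 - x1 >= 0.
    return 1 + x1 >= 0 and 2 - x1 >= 0

for x1 in [-3.0, -1.5, -0.5, 0.0, 1.0, 1.9, 2.1, 3.0]:
    assert in_spectrahedron(x1) == in_polyhedron(x1)
```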

Example: Unit Disk as a Spectrahedron

LMI Representation

\[ D = \{ (x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1 \} = \left\{ (x, y) : \begin{pmatrix} 1 - y & x \\ x & 1 + y \end{pmatrix} \succeq 0 \right\}. \]

Verification

  • Diagonal entries require $1 \pm y \ge 0$.
  • Determinant condition: $(1 - y)(1 + y) - x^2 = 1 - y^2 - x^2 \ge 0 \iff x^2 + y^2 \le 1$.

Diagonal constraints are redundant once $x^2 + y^2 \le 1$ holds.
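Membership in the disk and PSD-ness of the $2 \times 2$ matrix can be compared on random points (a NumPy sketch, illustrative only):

```python
import numpy as np

def in_disk_lmi(x, y):
    # PSD test for the LMI representation of the unit disk.
    M = np.array([[1 - y, x], [x, 1 + y]])
    return bool(np.all(np.linalg.eigvalsh(M) >= -1e-10))

rng = np.random.default_rng(2)
for _ in range(500):
    x, y = rng.uniform(-1.5, 1.5, size=2)
    assert in_disk_lmi(x, y) == (x**2 + y**2 <= 1)
```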

Takeaway

Even strictly convex sets (no flat facets) can admit LMI representations — spectrahedra go beyond polyhedra.

Lift-and-Relax Strategy (Step 1)

Target Constraint

We wish to represent

\[ x^4 + y^4 \le 1 \]

as a family of LMIs. The feasible set is non-polyhedral but convex.

Quadratic Lifting

Introduce auxiliary variables $u, v \ge 0$ and observe the equivalence:

\[ x^4 + y^4 \le 1 \iff \exists (u, v) \text{ such that } \begin{cases} x^2 \le u, \\ y^2 \le v, \\ u^2 + v^2 \le 1. \end{cases} \]
  • Forward: Let $u = x^2$, $v = y^2$.
  • Backward: If the system holds, then $x^4 + y^4 = (x^2)^2 + (y^2)^2 \le u^2 + v^2 \le 1$.
  • Goal: Replace quartic constraint with quadratic ones in a higher-dimensional space.
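The equivalence is easy to test numerically using the natural lift $u = x^2$, $v = y^2$ (a NumPy sketch, illustrative only):

```python
import numpy as np

def lifted_feasible(x, y):
    """Forward direction of the lift: take u = x^2, v = y^2."""
    u, v = x**2, y**2
    return x**2 <= u and y**2 <= v and u**2 + v**2 <= 1

rng = np.random.default_rng(3)
for _ in range(300):
    x, y = rng.uniform(-1.2, 1.2, size=2)
    # Feasible in the lift  <=>  x^4 + y^4 <= 1.
    assert lifted_feasible(x, y) == ((x**2) ** 2 + (y**2) ** 2 <= 1)
```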

Lift-and-Relax Strategy (Step 2)

Quadratic Constraints as LMIs

1. Constraint $x^2 \le u$

\[ x^2 \le u \iff \begin{pmatrix} u & x \\ x & 1 \end{pmatrix} \succeq 0. \]

A symmetric $2 \times 2$ matrix is PSD iff both diagonal entries and the determinant are nonnegative; here that gives $u \ge 0$ and $u - x^2 \ge 0$ (the entry $1$ is automatically nonnegative).

2. Constraint $y^2 \le v$

\[ y^2 \le v \iff \begin{pmatrix} v & y \\ y & 1 \end{pmatrix} \succeq 0. \]

3. Constraint $u^2 + v^2 \le 1$

\[ u^2 + v^2 \le 1 \iff \begin{pmatrix} 1 - v & u \\ u & 1 + v \end{pmatrix} \succeq 0. \]

Determinant condition: $(1 - v)(1 + v) - u^2 = 1 - v^2 - u^2 \ge 0$.

Resulting SDP Formulation

Lifted Semidefinite Program

\[ \begin{aligned} \text{minimize} \quad & a x + b y \\ \text{subject to} \quad & \begin{pmatrix} u & x \\ x & 1 \end{pmatrix} \succeq 0, \\ & \begin{pmatrix} v & y \\ y & 1 \end{pmatrix} \succeq 0, \\ & \begin{pmatrix} 1 - v & u \\ u & 1 + v \end{pmatrix} \succeq 0, \end{aligned} \qquad (x, y, u, v) \in \mathbb{R}^4. \]

Key Points

  • Linear Objective: Original cost is unchanged (here $a, b$ are given).
  • PSD Constraints Only: Nonlinearities encoded via LMIs.
  • Exact Reformulation: Same feasible $(x, y)$ pairs as $x^4 + y^4 \le 1$.
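The three LMI blocks can be checked directly with an eigenvalue test (a NumPy sketch, illustrative only). With the natural lift $u = x^2$, $v = y^2$, the blocks hold precisely when $x^4 + y^4 \le 1$:

```python
import numpy as np

def lmis_hold(x, y, u, v, tol=1e-10):
    """Check the three 2x2 LMI blocks of the lifted formulation."""
    blocks = [
        np.array([[u, x], [x, 1.0]]),
        np.array([[v, y], [y, 1.0]]),
        np.array([[1 - v, u], [u, 1 + v]]),
    ]
    return all(np.linalg.eigvalsh(M)[0] >= -tol for M in blocks)

# Feasible point: x^4 + y^4 <= 1, with the natural lift u = x^2, v = y^2.
assert lmis_hold(0.9, 0.5, 0.9**2, 0.5**2)
# Infeasible point: x^4 + y^4 > 1, so even the natural lift fails.
assert not lmis_hold(1.1, 0.0, 1.1**2, 0.0)
```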

Why It Matters

  • Demonstrates how higher-order polynomial constraints become SDPs via lifting.
  • Motivates sum-of-squares programming and semidefinite relaxations in polynomial optimization.
  • Shows the expressive power of spectrahedra beyond polyhedral geometry.

Next Part: Duality theory of SDPs — deriving the dual problem, complementary slackness, and interpreting dual certificates.