Euclidean vs Non-Euclidean Geometry: Understanding Distance in Different Spaces
核心问题 | Key Question: 什么是"距离"?What is "distance"?
在不同的空间中,"距离"和"最短路径"的定义是不同的。理解这一点对于设计优化算法至关重要。
In different spaces, the definitions of "distance" and "shortest path" differ fundamentally. Understanding this is crucial for designing optimization algorithms.
优化中的启示 | Insight for Optimization:
优化算法本质上是在参数空间中寻找"最短路径"到达最优点。选择与空间几何结构相匹配的度量,能够获得更优的收敛性质和算法性能。
Optimization algorithms fundamentally seek the "shortest path" to the optimum in parameter space. Choosing a metric that matches the geometric structure of the space yields better convergence properties and algorithm performance.
Mirror Descent 的动机: 当优化约束域具有非欧几何结构(如概率单纯形 \(\Delta^n = \{x \in \mathbb{R}^n_+ : \sum_i x_i = 1\}\)),使用 Bregman 散度而非欧式距离,算法能够自然地保持约束并获得更好的收敛率。
Motivation for Mirror Descent: When the constraint set has non-Euclidean structure (e.g., probability simplex), using Bregman divergence instead of Euclidean distance allows the algorithm to naturally preserve constraints and achieve better convergence rates.
度量结构 | Metric Structure: \((\mathbb{R}^n, \|\cdot\|_2)\) 配备欧式范数 / Equipped with Euclidean norm
距离公式 | Distance Formula: \(d(x,y) = \|x - y\|_2 = \sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}\)
测地线 | Geodesics: 两点间的直线段(唯一最短路径)/ Straight line segments (unique shortest path)
几何性质 | Geometric Properties:
• 平行公设:通过直线外一点有唯一平行线
Parallel postulate: Unique parallel through external point
• 三角形内角和 = \(\pi\)
Triangle angle sum = \(\pi\)
• 曲率 \(K = 0\)(平坦空间)
Curvature \(K = 0\) (flat space)
度量结构 | Metric Structure: \((S^2, g)\) Riemannian 流形,其中 \(g\) 是球面度量 / Riemannian manifold with spherical metric
距离公式 | Distance Formula (intrinsic):
\(d(x,y) = R\cdot\arccos(\langle x,y\rangle)\) 其中 \(\langle \cdot,\cdot\rangle\) 是内积
where \(\langle \cdot,\cdot\rangle\) is the inner product in \(\mathbb{R}^3\)
测地线 | Geodesics: 大圆弧(球面上的"直线")/ Great circle arcs (the "straight lines" on sphere)
几何性质 | Geometric Properties:
• 不存在平行线:所有大圆最终相交
No parallel lines: All great circles intersect
• 三角形内角和 \(> \pi\)(多余角 = \(K \cdot\)面积)
Triangle angle sum \(> \pi\) (excess = \(K\cdot\)area)
• 正曲率 \(K = 1/R^2\)(弯曲空间)
Positive curvature \(K = 1/R^2\) (curved space)
🛫 直观例子:飞机航线 | Intuitive Example: Flight Routes
从北京(39.9°N, 116.4°E)飞往纽约(40.7°N, -74.0°W):
Flying from Beijing (39.9°N, 116.4°E) to New York (40.7°N, -74.0°W):
• 在平面地图(Mercator投影)上:看似弯曲的路线
On flat map (Mercator projection): Appears as curved route
• 在地球表面(S²)上:沿大圆的测地线,实际最短!
On Earth's surface (S²): Great circle geodesic, actually shortest!
• 关键:内蕴几何(intrinsic geometry)决定最短路径
Key: The intrinsic geometry determines the shortest path
如果飞行员按照平面地图的"直线"飞行,实际上走的是球面上的弦(chord),但弦不在地球表面上,因此不是可行路径。
If pilots fly along the "straight line" on a flat map, they follow a chord through the sphere—but this is not on Earth's surface, hence not a feasible path.
📊 数学类比:优化中的几何结构 | Mathematical Analogy: Geometry in Optimization
问题设定 | Problem Setting: \(\min_{x\in\mathcal{X}} f(x)\),其中 \(\mathcal{X} \subset \mathbb{R}^n\) 是约束集
where \(\mathcal{X} \subset \mathbb{R}^n\) is the constraint set
1. 欧式梯度下降 (Euclidean GD):
\[x_{t+1} = \Pi_{\mathcal{X}}[x_t - \eta\nabla f(x_t)]\]
• 度量:欧式距离 \(D(x,y) = \frac{1}{2}\|x-y\|_2^2\)
Metric: Euclidean distance
• 适用:无约束或简单凸约束(如 \(\ell_2\) 球)
Suitable for: Unconstrained or simple convex constraints
• 需要投影算子 \(\Pi_{\mathcal{X}}\) 保持约束
Requires projection operator \(\Pi_{\mathcal{X}}\) to maintain constraints
2. Mirror Descent (Bregman GD):
\[x_{t+1} = \nabla\psi^*[\nabla\psi(x_t) - \eta\nabla f(x_t)]\]
• 度量:Bregman 散度 \(D_{\psi}(x,y) = \psi(x) - \psi(y) - \langle\nabla\psi(y), x-y\rangle\)
Metric: Bregman divergence
• \(\psi\) 是镜像映射(mirror map),强凸且可微
\(\psi\) is the mirror map, strongly convex and differentiable
• 适用:具有特殊几何结构的约束集
Suitable for: Constraint sets with special geometric structure
🎯 关键例子:概率单纯形 | Key Example: Probability Simplex
约束集: \(\Delta^n = \{x \in \mathbb{R}^n_+ : \sum_i x_i = 1\}\)(所有离散概率分布的空间)
Constraint set: Space of all discrete probability distributions
为什么需要 Mirror Descent?| Why Mirror Descent?
欧式几何的问题:
• 欧式投影 \(\Pi_{\Delta^n}\) 计算复杂(需求解 QP)
Euclidean projection \(\Pi_{\Delta^n}\) is computationally expensive (requires solving QP)
• 不尊重概率分布的自然几何结构
Does not respect the natural geometry of probability distributions
信息几何的优势:
• 选择 \(\psi(x) = \sum_i x_i \log x_i\)(负熵)
Choose \(\psi(x) = \sum_i x_i \log x_i\) (negative entropy)
• Bregman 散度 = KL 散度:\(D_{\text{KL}}(x\|y) = \sum_i x_i \log\frac{x_i}{y_i}\)
Bregman divergence = KL divergence
• 更新规则自动保持非负性和归一化!
Update rule automatically preserves non-negativity and normalization!
\[x_{t+1,i} \propto x_{t,i} \exp\left(-\eta\cdot\frac{\partial f}{\partial x_i}\right)\] 乘性更新 | Multiplicative update
📈 理论优势 | Theoretical Advantages
收敛率 (Convex):
• GD: \(O(1/\sqrt{T})\) 当使用最优步长
• MD: \(O(1/\sqrt{T})\) 但常数因子更优,特别是当初始点距离最优解较远时
MD: \(O(1/\sqrt{T})\) but with better constants, especially when starting far from optimum
适应性 (Adaptivity):
• GD: 对所有维度一视同仁
GD: Treats all dimensions equally
• MD: 根据 \(\psi\) 的 Hessian 自适应地缩放不同维度
MD: Adaptively scales different dimensions via Hessian of \(\psi\)
稀疏性 (Sparsity):
• 负熵镜像映射鼓励稀疏解(许多分量趋向 0)
Negative entropy mirror map encourages sparse solutions
🔗 统一框架 | Unified Framework
Mirror Descent 是一个统一的优化框架:
Mirror Descent provides a unified optimization framework:
• 选择 \(\psi(x) = \frac{1}{2}\|x\|_2^2\) → MD 退化为标准 GD
Choose \(\psi(x) = \frac{1}{2}\|x\|_2^2\) → MD reduces to standard GD
• 选择 \(\psi(x) = \sum_i x_i \log x_i\) → 得到 Exponentiated Gradient
Choose \(\psi(x) = \sum_i x_i \log x_i\) → Obtain Exponentiated Gradient
• 选择 \(\psi(X) = -\log \det(X)\) → 得到正定矩阵空间上的优化
Choose \(\psi(X) = -\log \det(X)\) → Optimization on positive definite matrices
核心思想:让算法的几何与问题的几何相匹配!
Core Idea: Match the algorithm's geometry to the problem's geometry!