Let $\mathbf{x}=[x_1,\ldots,x_K]$. I have the following optimization problem:

\begin{array}{rl}

\min \limits_{\mathbf{x}} & \| \mathbf{Ax}-\mathbf{b} \|^2 \\

\mbox{s.t.} & x_k\ge 0, \forall k

\end{array}

I would appreciate your help in solving this problem.

Another thing: my main problem was

\begin{array}{rl}

\min \limits_{\mathbf{x}} & \| \mathbf{A}'\mathbf{x}-\mathbf{b}' \|^2 \\

\mbox{s.t.} & x_k\ge 0, \ \forall k \\ & \mathbf{x}^T \mathbf{1}=1

\end{array}

Then I transformed it into the first problem by including the equality constraint in the objective function. Is it fine to do so?

The problem as written has no analytic solution for general $A$, $b$. (Yes, indeed, there are exceptions. If the solution of the same problem *with the inequalities removed* happens to produce a positive solution, then you’ve solved the original problem. But in the far more likely event that the inequality-free solution has negative entries, there is no analytic solution.)
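For illustration, here is a minimal sketch of that check in NumPy (the data below are placeholders, just to show the logic):

```
import numpy as np

# Placeholder data; in practice A and b come from your problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

# Solve the problem with the inequalities removed.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

if np.all(x_ls >= 0):
    # The inequality-free solution happens to be nonnegative,
    # so it also solves the constrained problem.
    x_opt = x_ls
else:
    # Otherwise there is no analytic solution and a solver is needed.
    x_opt = None
```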

In order to find the solution, you need either a quadratic programming engine or a second-order cone programming engine.

But before you run such a solver, you have to convert it to the standard form the solver expects. For instance, the QP formulation of your problem is

\begin{array}{ll}

\text{minimize}_{x} & x^T A^TA x - 2 b^T A x + b^T b \\

\text{subject to} & \vec{1}^T x = 1 \\

& x \geq 0

\end{array}
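
If you do not have a dedicated QP engine handy, a general constrained solver can also digest this form. Here is a sketch using SciPy's SLSQP method (my own example; SLSQP is an SQP solver rather than a true QP engine, and the data are placeholders):

```
import numpy as np
from scipy.optimize import minimize

# Placeholder data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
n = A.shape[1]

# ||Ax - b||^2 equals the quadratic objective x^T A^T A x - 2 b^T A x + b^T b.
def objective(x):
    r = A @ x - b
    return r @ r

res = minimize(objective,
               x0=np.full(n, 1.0 / n),          # start at the center of the simplex
               bounds=[(0, None)] * n,          # x >= 0
               constraints=[{"type": "eq",
                             "fun": lambda x: np.sum(x) - 1.0}],  # 1^T x = 1
               method="SLSQP")
x_opt = res.x
```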

The second-order cone version is a bit more complex:

\begin{array}{ll}

\text{minimize}_{t,x} & t \\

\text{subject to} & \|A x - b \|_2 \leq t \\

& \vec{1}^T x = 1 \\

& x \geq 0

\end{array}

A modeling framework can help here, by eliminating the need to do this transformation yourself. For instance, the MATLAB package CVX can express your problem as follows:

```
cvx_begin
    variable x(n)
    minimize( sum_square( A * x - b ) )
    subject to
        sum( x ) == 1
        x >= 0
cvx_end
```

Disclaimer: I wrote CVX. But you certainly don't have to use it. In MATLAB alone there is a wealth of other choices, including YALMIP and the QP solver in MATLAB's own Optimization Toolbox.
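
Outside MATLAB, similar modeling frameworks exist; for example, a rough CVXPY (Python) equivalent of the model above would look like this (my sketch, not part of CVX itself):

```
import cvxpy as cp
import numpy as np

# Placeholder data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
n = A.shape[1]

x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)),
                  [cp.sum(x) == 1, x >= 0])
prob.solve()
x_opt = x.value
```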

Your problem can be conveniently re-written as

\begin{eqnarray}

\underset{x \in \mathbb{R}^K}{\text{min }}f(x) + g(x),

\end{eqnarray}

where $f: x \mapsto \frac{1}{2}\|Ax-b\|^2$ and $g = i_{\mathbb{R}^K_+}$, the indicator function (in the convex-analytic sense) of the nonnegative orthant of $\mathbb{R}^K$. $f$ is smooth with Lipschitz gradient ($\|A\|^2$ is a possible Lipschitz constant), while $g$ has a simple proximal operator, $prox_g(x) = (x)_+$ (the orthogonal projector onto the aforementioned orthant). So proximal methods like FISTA are your friend.

In your “main problem”, the aforementioned orthant is simply replaced with the standard simplex. The projector onto this simplex, though not available in closed form, can be computed very cheaply using (for example) the simple algorithm presented in section 3 of the paper http://www.magicbroom.info/Papers/DuchiShSiCh08.pdf.

The projection can be implemented in just a few lines of Python:

```
import numpy as np

def proj_simplex(v, z=1.):
    """Projects v onto the simplex {x >= 0, x_0 + x_1 + ... + x_n = z}.
    The method is John Duchi's O(n log n) Algorithm 1.
    """
    u = np.sort(v)[::-1]  # sort v in decreasing order
    aux = (np.cumsum(u) - z) / np.arange(1., len(v) + 1.)
    return np.maximum(v - aux[np.nonzero(u > aux)[0][-1]], 0.)
```
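
A quick sanity check of my own (the result is nonnegative and sums to $z = 1$):

```
>>> import numpy as np
>>> proj_simplex(np.array([0.3, 1.2, -0.5]))
array([0.05, 0.95, 0.  ])
```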

**BTW, what is the proximal operator of an “arbitrary” convex function $g$?**

Formally,

\begin{eqnarray}

prox_g(x) := \underset{p \in \mathbb{R}^K}{\text{argmin }}\frac{1}{2}\|p-x\|^2 + g(p).

\end{eqnarray}

“Proximable” functions (i.e., functions for which the argmin problem in the definition above is easy to solve, for any point $x$) play just as important a role as differentiable functions. The proximal operator lets you make “implicit gradient steps”. Indeed, one has the characterization

\begin{eqnarray}
p = prox_g(x) \text{ iff } x - p \in \partial g(p),
\end{eqnarray}

where

\begin{eqnarray}
\partial g(p) := \{u \in \mathbb{R}^K \mid g(q) \ge g(p) + \langle u, q - p\rangle \ \forall q \in \mathbb{R}^K\}
\end{eqnarray}

is the subdifferential of $g$ at $p$ (this reduces to the singleton $\{\nabla g(p)\}$ if $g$ is differentiable at $p$). In your problem(s) above, the proximal operator happens to be a projection operator. In fact, for any closed convex subset $C \subseteq \mathbb{R}^K$, a little algebra reveals that

\begin{eqnarray}

prox_{i_C}(x) := \underset{p \in \mathbb{R}^K}{\text{argmin }}\frac{1}{2}\|p-x\|^2 + i_C(p) = \underset{p \in C}{\text{argmin }}\|p-x\|^2 =: proj_C(x),

\end{eqnarray}

where $i_C$ is the indicator function of $C$ defined by $i_C(x) := 0$ if $x \in C$; $+\infty$ otherwise.

A less trivial example is the $\ell_1$-norm $\|.\|_1$, whose proximal operator (with parameter $\gamma > 0$) is the so-called *soft-thresholding* operator $prox_{\gamma\|.\|_1}(x) = soft_\gamma(x) = (v_k)_{1\le k \le K}$, where

\begin{eqnarray}

v_k := \left(1- \dfrac{\gamma}{|x_k|}\right)_+x_k.

\end{eqnarray}
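
In code, this is a one-liner (my own sketch, using the equivalent $\mathrm{sign}(x_k)\,(|x_k|-\gamma)_+$ form):

```
import numpy as np

def soft_threshold(x, gamma):
    """Proximal operator of gamma * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
```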

Proximal operators are a handy tool in modern convex analysis. They find great use in problems arising in signal processing, game theory, machine learning, etc. Here is a nice place to start learning about proximal operators and similar objects: http://arxiv.org/pdf/0912.3522.pdf.

Most importantly, “under mild conditions” one can show (see the previous reference) that a point $x^*$ minimizes $f + g$ iff

\begin{eqnarray}

x^* = prox_{\gamma g}(x^* - \gamma \nabla f(x^*)), \quad \forall \gamma > 0.

\end{eqnarray}

Thus the minimizers of $f + g$ coincide with the fixed points of the operators $prox_{\gamma g}\circ(Id - \gamma \nabla f)$, $\gamma > 0$. This suggests the following algorithm, known as the *forward-backward* algorithm (Moreau; Lions and Mercier; P. L. Combettes et al.):

\begin{eqnarray}
x^{(n+1)} = \underbrace{prox_{\gamma_n g}}_{\text{backward / prox step}}\underbrace{(x^{(n)} - \gamma_n \nabla f(x^{(n)}))}_{\text{forward / gradient step}},
\end{eqnarray}

for an appropriately chosen sequence of step-sizes $(\gamma_n)_{n \in \mathbb{N}}$.
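
For your problem, a minimal sketch of these iterations (my own code, reusing `proj_simplex` from above and a constant step size $\gamma_n = 1/\|A\|^2$) looks like this:

```
import numpy as np

def forward_backward(A, b, n_iter=500):
    """Forward-backward (projected gradient) iterations for
    min 0.5 * ||Ax - b||^2  subject to  x in the standard simplex."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n)                      # start at the center of the simplex
    gamma = 1.0 / np.linalg.norm(A, 2) ** 2      # step size 1/L with L = ||A||^2
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                 # forward / gradient step
        x = proj_simplex(x - gamma * grad)       # backward / prox (projection) step
    return x
```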

If $g$ is constant, so that it suffices to minimize $f$ alone, then the above iterates become

\begin{eqnarray}

x^{(n+1)} = x^{(n)} - \gamma_n \nabla f(x^{(n)}),

\end{eqnarray}

and we recognize our old friend, the *gradient descent* algorithm, taught in high school.

N.B.: $(x)_+$ denotes the componentwise maximum of $x$ and $0$.

First, the Lagrangian function is

\begin{align}

\mathbf{L}(x,\alpha)=\Vert Ax-b \Vert^2 - \alpha^\mathsf{T} x

\end{align}

Then the gradient of the Lagrangian vanishes at the optimal point:

\begin{align}

\frac{\partial \mathbf{L}}{\partial x} = 2 A^\mathsf{T}(Ax-b)-\alpha=0\\

2A^\mathsf{T}A x= \alpha+2A^\mathsf{T} b\\

\end{align}

If $A^\mathsf{T}A$ is not invertible, we can just add a small regularization term, i.e. replace $2A^\mathsf{T}A$ by $(2A^\mathsf{T}A + \mu I)$, or use the Moore-Penrose pseudoinverse:

\begin{align}

x=(2A^\mathsf{T}A)^\dagger \big(\alpha+2A^\mathsf{T} b\big)+x_0\\

\end{align}

where $x_0$ is a vector in the null space of $A^\mathsf{T}A$ (it can be taken to be zero).

The dual problem is

\begin{align}

\max_{\alpha\geq 0 } \min_{x} \mathbf{L(x,\alpha)}

\end{align}

You can obtain the dual function easily by substituting $x$ back into the Lagrangian. It is a simple quadratic function of $\alpha$ with the constraint $\alpha \geq 0$, which can be solved using ordinary quadratic solvers.
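
A minimal numerical sketch of this recipe (my own; it assumes $A^\mathsf{T}A$ is invertible, or regularized as above, and uses a generic bound-constrained solver for the dual):

```
import numpy as np
from scipy.optimize import minimize

# Placeholder data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
H = 2.0 * A.T @ A                                # assumed invertible (or add mu * I)

def x_of_alpha(alpha):
    # Stationarity condition: 2 A^T A x = alpha + 2 A^T b.
    return np.linalg.solve(H, alpha + 2.0 * A.T @ b)

def neg_dual(alpha):
    # Dual function evaluated at alpha (negated, since we maximize it).
    x = x_of_alpha(alpha)
    return -(np.sum((A @ x - b) ** 2) - alpha @ x)

res = minimize(neg_dual, x0=np.zeros(A.shape[1]),
               bounds=[(0, None)] * A.shape[1],  # alpha >= 0
               method="L-BFGS-B")
x_opt = x_of_alpha(res.x)                        # recover the primal solution
```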
