In the mathematical optimization literature it is common to distinguish problems according to whether or not they are convex. The reason seems to be that convex problems are guaranteed to have globally optimal solutions, so one can use methods such as gradient (steepest) descent to find such solutions, but I am not convinced.

For example, the function $|x|^{0.001}$ is **not** convex (see the yellow-shaded area in the picture below) but **it has a single global minimizer**.
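This is easy to check numerically. The sketch below (my own illustration, not from the original post) shows that $f(x)=|x|^{0.001}$ violates the midpoint convexity inequality $f\big(\frac{x+y}{2}\big)\le\frac{f(x)+f(y)}{2}$ at some pair of points, while still satisfying the quasiconvexity inequality $f\big(\frac{x+y}{2}\big)\le\max(f(x),f(y))$:

```python
# f(x) = |x|**0.001: non-convex but quasiconvex.
# The test points (1, 4) are an arbitrary choice of mine.

def f(x):
    return abs(x) ** 0.001

x, y = 1.0, 4.0
mid = f((x + y) / 2)
avg = (f(x) + f(y)) / 2

print(mid > avg)                 # True: midpoint convexity fails here
print(mid <= max(f(x), f(y)))    # True: quasiconvexity inequality holds
```

Because the function is concave on $(0,\infty)$, any two distinct positive points witness the failure of convexity.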


From the comments and answers I learned/recalled that the function above is **quasiconvex**. If my understanding is correct, and **quasiconvex** functions such as the one above have a single minimizer:

**Why is convexity more important than quasiconvexity in optimization?** Why do all optimization texts focus on convexity and not on quasiconvexity?


There are many reasons why convexity is more important than quasi-convexity in optimization theory. I’d like to mention one that the other answers so far haven’t covered in detail. It is related to Rahul Narain’s comment that the class of quasi-convex functions is not closed under addition.

Duality theory makes heavy use of optimizing functions of the form $f+L$ over all linear functions $L$. If a function $f$ is convex, then for any linear $L$ the function $f+L$ is convex, and hence quasi-convex. The converse also holds, and I recommend proving it as an exercise: $f+L$ is quasi-convex for all linear functions $L$ if and only if $f$ is convex.

Thus, for every quasi-convex but non-convex function $f$ there is a linear function $L$ such that $f+L$ is not quasi-convex. I encourage you to construct an example of a quasi-convex function $f$ and a linear function $L$ such that $f+L$ has local minima which are not global minima.
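One concrete instance of this exercise (my own construction, not taken from the answer): take $f(x)=\sqrt{|x|}$, which is quasi-convex, and the linear function $L(x)=-x/2$. Then $g=f+L$ has a local minimum at $x=0$ that is not global, since $g\to-\infty$ as $x\to\infty$:

```python
import math

# g(x) = sqrt(|x|) - x/2: quasiconvex f plus linear L,
# with a non-global local minimum at x = 0.

def g(x):
    return math.sqrt(abs(x)) - x / 2

eps = 1e-2
print(g(0.0) < g(eps) and g(0.0) < g(-eps))  # x = 0 is a local minimum
print(g(9.0) < g(0.0))                       # ...but not a global one
```

The sublevel set $\{x : g(x)\le 0\}$ is $\{0\}\cup[4,\infty)$, which is not an interval, so $g$ is indeed not quasi-convex.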

Thus, in some sense convex functions are the class of functions for which the techniques used in duality theory apply.

Sophisticated optimization methods are developed to deal with problems where it is not obvious what the solution is. In general, we may not know much about the behavior of the cost function, so we cannot decide on an a priori basis whether there is a unique minimum or not. We need optimization methods to find the minimum (or a minimum, if there are several).

For convex problems, the global behavior is determined by purely local properties. If a function is everywhere locally convex, it is globally convex too. So for such problems, local solutions are sufficient, which makes them a lot easier to deal with.

For your first question, the answer is basically yes.

Gradient methods are usually descent methods, so the cost function decreases at every non-optimal point. Many such methods also have the property that any accumulation point of the generated sequence cannot have a non-zero gradient (call these ‘guaranteed’ descent methods). (This awkward wording allows the cost function to be non-differentiable at a minimum.)

Here is a sufficient condition that applies to the problem above: Let the initial point be $x_0$, and let $L_{x_0} = \{x \mid f(x) \leq f(x_0) \}$. Suppose $L_{x_0}$ is compact and the gradient is non-zero at all points other than the minimizer. Then any ‘guaranteed’ descent method will converge to the minimizer. (I’m assuming that the cost function $f$ is $C^1$ at non-optimal points.) The problem you have above satisfies these conditions.
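As a sketch of a ‘guaranteed’ descent method, here is gradient descent with Armijo backtracking applied to $f(x)=\log(1+x^2)$: a quasiconvex, non-convex $C^1$ function of my own choosing with compact sublevel sets and a single critical point at $x=0$, so the sufficient condition above applies. The step-size constants are arbitrary choices.

```python
import math

def f(x):
    return math.log(1 + x * x)

def grad(x):
    return 2 * x / (1 + x * x)

x = 5.0
for _ in range(200):
    g = grad(x)
    t = 1.0
    # Backtrack until the Armijo sufficient-decrease test passes.
    while f(x - t * g) > f(x) - 0.1 * t * g * g:
        t *= 0.5
    x -= t * g

print(abs(x) < 1e-3)  # converged to the global minimizer at 0
```

The backtracking line search is what makes the descent ‘guaranteed’: every accepted step decreases $f$ by a definite fraction of the squared gradient, so the iterates cannot accumulate at a point with non-zero gradient.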

The second question is more of a discussion topic. Some points to consider:

Duality is another big plus for convex problems. Dual problems are often more amenable to solution.

Problems can often be transformed into convex problems; geometric programs involving posynomials are a good example.
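The standard trick in geometric programming is the change of variables $x = e^u$, under which a posynomial becomes a sum of exponentials of affine functions, which is convex. A toy one-dimensional example of my own: $f(x)=x^{1/2}+1/x$ is not convex on $x>0$, but $g(u)=e^{u/2}+e^{-u}$ is.

```python
import math

def fpos(x):          # posynomial-style objective, non-convex on x > 0
    return x ** 0.5 + 1 / x

def gexp(u):          # after substituting x = exp(u): convex in u
    return math.exp(0.5 * u) + math.exp(-u)

a, b = 16.0, 36.0
print(fpos((a + b) / 2) > (fpos(a) + fpos(b)) / 2)  # convexity fails for f

ua, ub = math.log(a), math.log(b)
print(gexp((ua + ub) / 2) <= (gexp(ua) + gexp(ub)) / 2)  # holds for g
```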

Often convexity of the level sets is the important criterion. Various generalizations (eg, pseudo-, and quasi-convexity) can extend the usefulness of convexity to broader settings.

Like someone else said, convex problems often have a dual, which lets an algorithm solve the problem from two sides and declare the solution when the duality gap is sufficiently close to 0 (when the lower and upper bound on the solution converge). There are also fairly efficient methods and algorithms that can solve canonical convex problems which do not extend easily to general quasiconvex problems.
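A minimal example of the two-sided view (my own, not from the answer): minimize $x^2+1$ subject to $x\ge 1$. The Lagrangian is $x^2+1+\lambda(1-x)$, and minimizing over $x$ gives the dual function $q(\lambda)=1+\lambda-\lambda^2/4$ for $\lambda\ge 0$. Every dual value is a lower bound on the primal optimum, and for this convex problem the gap closes exactly:

```python
def primal(x):
    return x * x + 1

def dual(lam):
    return 1 + lam - lam ** 2 / 4

p_star = primal(1.0)   # primal optimum, attained at the boundary x = 1
d_star = dual(2.0)     # dual optimum, attained at lam = 2

print(p_star - d_star)        # 0.0: zero duality gap
print(dual(1.0) <= p_star)    # True: weak duality at a non-optimal lam
```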

For strictly quasiconvex problems, a gradient descent method should be sufficient, but it’s a question of algorithmic efficiency. The algorithms for canonical convex problems are just much more efficient.

At least, that’s my impression of why there’s so much focus on convexity. However, one could certainly argue that quasiconvexity is often overlooked when it would be useful…

I’m not an expert, but just from eyeballing your function you can see that gradient descent would not converge to the global minimizer. Gradient descent looks for places where the gradient (which is just the derivative for 1-dimensional functions like the one you provide) is zero. In a convex function, as you approach the minimum, the gradient approaches zero (and vice versa). But you can see that, as you approach the minimum in your function, the derivative is getting *further and further* from zero.

If you try to do gradient descent on that function, you’ll actually move further and further away from the minimum, since the magnitude of the gradient shrinks toward zero as $x$ approaches positive or negative infinity.
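The claim about the gradient is easy to check numerically. For $f(x)=|x|^{0.001}$, ordinary calculus gives $|f'(x)| = 0.001\,|x|^{-0.999}$, which blows up as $x\to 0$ and vanishes as $|x|\to\infty$; the sample points below are arbitrary choices of mine:

```python
# |f'(x)| for f(x) = |x|**0.001 (x != 0).

def grad_mag(x):
    return 0.001 * abs(x) ** (-0.999)

print(grad_mag(0.01) > grad_mag(0.1) > grad_mag(1.0))    # grows toward the minimum
print(grad_mag(100.0) < grad_mag(10.0) < grad_mag(1.0))  # shrinks toward infinity
```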

So at least one reason convexity is so important in optimization is that the global minimum is also the unique critical point (place where the gradient is zero), which allows you to search for one by searching for the other. In some sense, you pick a point, look around a little bit, and move in the direction that makes your gradient smaller. Repeat this enough and you’ll get to a really small gradient, which corresponds to being close to the minimum.
