Why is convexity more important than quasi-convexity in optimization?

In the mathematical optimization literature it is common to distinguish problems according to whether or not they are convex. The reason seems to be that convex problems are guaranteed to have globally optimal solutions, so one can use methods such as gradient (steepest) descent to find such solutions, but I am not convinced.

For example, the function $|x|^{0.001}$ is not convex (see the yellow-shaded area in the picture below) but it has a single global minimizer.

                      enter image description here


From the comments and answers I learned/recalled that the function above is quasiconvex. If my understanding is correct, and quasiconvex functions such as the one above have a single minimizer:

  • Why is convexity more important than quasiconvexity in optimization? Why do all optimization texts focus on convexity and not on quasiconvexity?

Solutions Collecting From Web of "Why is convexity more important than quasi-convexity in optimization?"

There are many reasons why convexity is more important than quasi-convexity in optimization theory. I’d like to mention one that the other answers so far haven’t covered in detail. It is related to Rahul Narain’s comment that the class of quasi-convex functions is not closed under addition.

Duality theory makes heavy use of optimizing functions of the form $f+L$ over all linear functions $L$. If a function $f$ is convex, then for any linear $L$ the function $f+L$ is convex, and hence quasi-convex. I recommend proving the converse as an exercise: $f+L$ is quasi-convex for all linear functions $L$ if and only if $f$ is convex.

Thus, for every quasi-convex but non-convex function $f$ there is a linear function $L$ such that $f+L$ is not quasi-convex. I encourage you to construct an example of a quasi-convex function $f$ and a linear function $L$ such that $f+L$ has local minima which are not global minima.

Thus, in some sense convex functions are the class of functions for which the techniques used in duality theory apply.

Sophisticated optimization methods are developed to deal with problems where it is not obvious what the solution is. In general, we may not know much about the behavior of the cost function, so we cannot decide on an a-priori base whether there is a unique minimum or not. We need optimization methods to find the or a minimum.

For convex problems, the global behavior is determined by purely local properties. If a function is everywhere locally convex, it is globally convex too. So for such problems, local solutions are sufficient, which make them a lot easier to deal with.

For your first question, the answer is basically yes.

Gradient methods are usually descent methods, so the cost function decreases at points that are non-optimal points. Many such methods also have the property that any accumulation point of the sequence generated cannot have a non-zero gradient (call these ‘guaranteed’ descent methods). (This awkward wording allows the cost function to be non-differentiable at a minimum.)

Here is a sufficient condition that applies to the problem above: Let the initial point be $x_0$, and let $L_{x_0} = \{x | f(x) \leq f(x_0) \}$. Suppose $L_{x_0}$ is compact and the gradient is non-zero at all points other than the minimizer. Then any ‘guaranteed’ descent methods will converge to the minimizer. (I’m assuming that the cost function $f$ is $C^1$ at non-optimal points.) The problem you have above satifies these conditions.

The second question is more of a discussion topic. Some points to consider:

Duality is another big plus for convex problems. Dual problems are often more amenable to solutions.

Problems can often be transformed into convex problems, posynomials being a good example.

Often convexity of the level sets is the important criterion. Various generalizations (eg, pseudo-, and quasi-convexity) can extend the usefulness of convexity to broader settings.

Like someone else said, convex problems often have a dual, which lets an algorithm solve the problem from two sides and declare the solution when the duality gap is sufficiently close to 0 (when the lower and upper bound on the solution converge). There are also fairly efficient methods and algorithms that can solve canonical convex problems which do not extend easily to general quasiconvex problems.

For strict quasiconvex problems, a gradient descent method should be sufficient, but it’s a question of algorithmic efficiency. The algorithms for canonical convex problems are just much more efficient.

At least, that’s my impression of why there’s so much focus on convexity. However, one could certainly argue that quasiconvexity is often overlooked when it would be useful…

I’m not an expert, but just from eyeballing your function you can see that gradient descent would not converge to the global minimizer. Gradient descent looks for places where the gradient (which is just the derivative for 1-dimensional functions like the one you provide) is zero. In a convex function, as you approach the minimum, the gradient approaches zero (and vice versa). But you can see that, as you approach the minimum in your function, the derivative is getting further and further from zero.

If you try to do gradient descent on that function, you’ll actually move further and further away from the minimum, since the gradient is minimized as you approach (positive or negative) infinity.

So at least one reason convexity is so important in optimization is that the global minimum is also the unique critical point (place where the gradient is zero), which allows you to search for one by searching for the other. In some sense, you pick a point, look around a little bit, and move in the direction that makes your gradient smaller. Repeat this enough and you’ll get to a really small gradient, which corresponds to being close to the minimum.