I have been studying some of the interior point methods recently, and there is one in particular that I find most intuitive: the path-following method. In this method, we put the inequality constraints into the objective function and vary $\mu$ from $\infty$ to $0$. This variation allows us to follow what is called the “central path”.
But from my point of view, any variation of $\mu$ towards $0$ will allow us to reach the optimum eventually. So why is it so important that we follow the special “central path” instead of just following some other route to the optimum?
EDIT: it may be that I am misunderstanding some of the properties of the central path. Some of these properties are covered here on slide 21: https://web.stanford.edu/class/msande310/lecture11.pdf. If someone understands them better than I do, then maybe I can see why following the central path is so important here.
SECOND EDIT: I am talking about $\mu$ in the sense of a modified objective function (usually called a barrier objective function) like this: $b^T y + \mu \sum_j \log(x_j)$ s.t. $x_j \ge 0$, just as it shows in the link I posted.
Related question that might benefit from this: primal-dual interior-point method picking value of $\mu$
Let us look at a small example:
$$
\begin{aligned}
\min_x & \quad c^T x \\
\text{s.t.} &\quad e^T x \leq 1 \\
&\quad x_i \geq 0
\end{aligned}
$$
where $e = (1, \dots, 1)^T$ and $c$ is some arbitrary vector in $\mathbb{R}^n$.
Now, let us look at the very simple case of a primal interior point method. The reason I am simplifying everything is that I want to demonstrate a point.
For a given $\mu$, the objective is to minimize
$$
f(x) = c^T x - \frac{1}{\mu} \log(1 - e^T x) - \frac{1}{\mu}\sum_{i=1}^n \log(x_i)
$$
Let us see how this function is minimized with Newton’s method. The gradient is
$$\nabla f(x) = \frac{1}{\mu} \underbrace{\left[ \mu c + \frac{1}{1 - e^T x}\, e - \left(\frac{1}{x_1}, \dots, \frac{1}{x_n}\right)^T \right]}_{g(x)}$$
The Hessian is
$$
\nabla^2 f(x) = \frac{1}{\mu} \underbrace{\left[\frac{1}{(1 - e^T x)^2}\, e e^T + \operatorname{diag}\left(\frac{1}{x_1^2}, \dots, \frac{1}{x_n^2}\right)
\right]}_{H(x)}
$$
In each iteration of Newton’s method, we find the descent direction $d$ by solving the system
$$
\nabla^2 f(x) d = -\nabla f(x)
$$
or, equivalently,
$$
H(x) d = -g(x)
$$
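As a sketch, one iteration of this scheme is a few lines of NumPy. The function name, step size, and numerical values below are illustrative choices, not taken from the post; line search and the $\mu$-update are omitted:

```python
import numpy as np

def newton_direction(x, c, mu):
    """Newton direction for f(x) = c'x - (1/mu) log(1 - e'x) - (1/mu) sum(log x_i).

    Assumes x is strictly feasible: x > 0 componentwise and sum(x) < 1.
    Returns d solving H(x) d = -g(x), with grad f = g/mu and hess f = H/mu.
    """
    s = 1.0 - x.sum()                                 # slack of e'x <= 1
    e = np.ones_like(x)
    g = mu * c + e / s - 1.0 / x                      # scaled gradient g(x)
    H = np.outer(e, e) / s**2 + np.diag(1.0 / x**2)   # scaled Hessian H(x)
    return np.linalg.solve(H, -g)

# Usage: one damped step from a strictly interior point (illustrative values).
c = np.array([1.0, -2.0, 0.5])
x = np.full(3, 0.2)              # strictly feasible: sum(x) = 0.6 < 1
d = newton_direction(x, c, mu=1.0)
x_new = x + 0.5 * d              # fixed damping factor, for illustration only
```

The damping factor is only there to keep the iterate strictly feasible; a real implementation would pick the step by backtracking.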
Let us analyze some phenomena.
Case 1: $x$ is close to the boundary of the feasible set, and $\mu$ is moderate. In that case, some of the values $\{1 - e^T x, x_1, \dots, x_n\}$ are close to zero, while others are not. Thus, the matrix $H(x)$ becomes very ill-conditioned, and, in addition, some of the entries of $g(x)$ might be very small while others are very large. Both phenomena destroy any hope of accurately solving the Newton system.
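Case 1 is easy to reproduce numerically. In the following sketch (the dimensions and values are illustrative, not from the post), one coordinate of $x$ is pushed toward the boundary, and the condition number of $H(x)$ blows up roughly like $1/x_1^2$:

```python
import numpy as np

# Case 1 illustration: one coordinate approaches the boundary x_1 = 0,
# while the other barrier terms stay moderate.
n = 4
e = np.ones(n)
conds = []
for t in [1e-1, 1e-3, 1e-6]:
    x = np.full(n, 0.15)
    x[0] = t                    # push x_1 toward the boundary
    s = 1.0 - x.sum()           # slack of e'x <= 1 stays moderate here
    H = np.outer(e, e) / s**2 + np.diag(1.0 / x**2)
    conds.append(np.linalg.cond(H))
print(conds)                    # grows roughly like 1/t^2
```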
Case 2: $x$ is far from the boundary of the feasible set, and $\mu$ is large. In that case, all of the values in the set $\{1 - e^T x, x_1, \dots, x_n\}$ are far away from zero, but the entries of $\mu c$ are very large. Thus, when computing $g(x)$ we add numbers of different orders of magnitude, causing serious round-off errors. The same phenomenon occurs when computing $H(x)$. Thus, both $g(x)$ and $H(x)$ are inaccurate, and therefore our Newton direction $d$ is inaccurate.
Case 3: $x$ is far from the boundary, and $\mu$ is moderate. In that case, by the same reasoning as above, both $g(x)$ and $H(x)$ are computed quite accurately, without entries of substantially different orders of magnitude. In addition, $H(x)$ is not ill-conditioned, and we can find $d$ quite accurately.
Case 4: $x$ is close to the boundary, and $\mu$ is very large. The entries of $\mu c$ become very large, and some of the values in $\{1 - e^T x, x_1, \dots, x_n\}$ are close to zero while others are not. Thus, it seems that we are, again, adding numbers of different orders of magnitude. However, in this case it works in our favor: near the boundary we should follow a direction that is similar to $c$, so the fact that the remaining terms in $g(x)$ become negligible does us no harm. The matrix $H(x)$ can be ill-conditioned and inaccurately computed. However, looking closer we observe that $g(x)$ is a sum of $\mu c$ and terms aligned with the dominant eigenvectors of $H(x)$, and thus the Newton system is solved quite accurately.
Following the central path ensures that we always fall into cases 3 and 4.