
I am interested in deriving the unique solution to the following Quasi-Newton minimization problem

$$

\min_{H}|| H-H_k||\quad\text{subject to}\quad H=H^\top,\quad Hy_k =s_k

$$

The norm is the weighted Frobenius norm

$$

||A||_W \equiv ||W^{1/2}A W^{1/2}||_F

$$

where the weight matrix $W$ is any positive definite matrix satisfying $Ws_k=y_k$, and $||\cdot||_F$ is defined by $||C||^2_F= \sum_{i,j}^n c^2_{ij}$. The quantity $H$ is the inverse Hessian, which is symmetric, positive definite, and satisfies the secant equation above, $Hy_k=s_k$. We can assume that $W=G^{-1}_k$, where the average Hessian $G_k$ is given by

$$

G_k= \int_0^1 \nabla^2 f(x_k+\tau \alpha_k p_k)d\tau

$$

The unique solution is given by

$$

H_{k+1} = (I-\rho_k s_k y^\top_k)H_k (I-\rho_k y_k s^\top_k)+ \rho_k s_k s^\top_k

$$

where $\rho_k = (y^\top_k s_k)^{-1}$. Note that this is an iterative scheme: $k$ denotes the current iteration and $H_{k+1}$ is the updated approximation to the inverse Hessian.
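Before deriving it, the claimed update can be checked numerically: for any symmetric $H_k$ and any pair $(s_k, y_k)$ with $y_k^\top s_k > 0$, the formula above produces a symmetric matrix satisfying the secant equation. A minimal sanity check with hypothetical random data (NumPy, my own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# hypothetical data: symmetric H_k and a curvature pair with y^T s > 0
Hk = rng.standard_normal((n, n))
Hk = Hk @ Hk.T
s = rng.standard_normal(n)
y = rng.standard_normal(n)
if y @ s < 0:
    y = -y
rho = 1.0 / (y @ s)

I = np.eye(n)
H_next = (I - rho * np.outer(s, y)) @ Hk @ (I - rho * np.outer(y, s)) \
         + rho * np.outer(s, s)

assert np.allclose(H_next, H_next.T)  # H_{k+1} is symmetric
assert np.allclose(H_next @ y, s)     # secant equation H_{k+1} y_k = s_k
```

The secant equation holds structurally: $(I-\rho_k y_k s_k^\top)y_k = 0$, so the first term annihilates $y_k$ and the rank-one term contributes $\rho_k s_k (s_k^\top y_k) = s_k$.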

The notation I am using is directly from the book Nocedal & Wright, *Numerical Optimization*. I am not able to find a full derivation of this anywhere; everything written above is all that Nocedal & Wright have on this topic. For reference, this is in Chapter 6, Quasi-Newton Methods, of their newest (2nd) edition.

Googling and searching other books has also turned up no full derivation. I have not been able to find anything more thorough than Nocedal & Wright, and they do not include a derivation. Thanks


I'll outline the major steps; detailed explanations of each step are given below.

- Introduce some notation

$$

\hat H=W^{1/2}HW^{1/2},\quad \hat H_k=W^{1/2}H_kW^{1/2},\quad \hat s_k=W^{1/2}s_k,\quad \hat y_k=W^{-1/2}y_k.\tag{1}

$$

Then the problem becomes

$$

\min\|\hat H_k-\hat H\|_F\quad\text{subject to }\hat H\hat y_k=\hat y_k\ (=\hat s_k).

$$

- To use the fact that $\hat y_k$ is an eigenvector of $\hat H$ (with eigenvalue $1$), let us introduce the new orthonormal basis

$$

U=[u\ |\ u_\bot]

$$

where $u$ is the normalized $\hat y_k$, i.e.

$$u=\frac{\hat y_k}{\|\hat y_k\|}\tag{2},$$ and $u_\bot$ is any orthonormal complement of $u$. Since the Frobenius norm is unitarily invariant (it is the sum of squares of the singular values), we have

$$

\|\hat H_k-\hat H\|_F=\|U^T\hat H_kU-U^T\hat HU\|_F=

\left\|\begin{bmatrix}\color{blue}* & \color{blue}*\\\color{blue}* & \color{red}*\end{bmatrix}-\begin{bmatrix}\color{blue}1 & \color{blue}0\\\color{blue}0 & \color{red}*\end{bmatrix}\right\|_F.

$$

The blue part cannot be affected by the optimization, so to minimize the Frobenius norm we should make the red part of the difference zero; that is, the optimal solution satisfies

$$

\color{red}{u_\bot^T\hat Hu_\bot}=\color{red}{u_\bot^T\hat H_ku_\bot}.

$$

This gives the optimal solution

$$

\hat H=U\begin{bmatrix}\color{blue}1 & \color{blue}0\\\color{blue}0 & \color{red}{u_\bot^T\hat H_ku_\bot}\end{bmatrix}U^T.

$$

- To write it more explicitly,

$$

\hat H=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}1 & 0\\0 & u_\bot^T\hat H_ku_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\hat H_ku_\bot u_\bot^T=

uu^T+(I-uu^T)\hat H_k(I-uu^T)

$$

where we used the following representation of the projection operator to the complement of $u$

$$

u_\bot u_\bot^T=I-uu^T.

$$

- Changing variables back to the original ones (1), (2) is straightforward.
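The transformed solution $\hat H=uu^T+(I-uu^T)\hat H_k(I-uu^T)$ can also be verified numerically: it is symmetric and keeps $u$ as an eigenvector with eigenvalue $1$. A minimal sketch with hypothetical random data (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# hypothetical data: a symmetric \hat H_k and a random unit vector u
Hk_hat = rng.standard_normal((n, n))
Hk_hat = (Hk_hat + Hk_hat.T) / 2
u = rng.standard_normal(n)
u /= np.linalg.norm(u)

P = np.eye(n) - np.outer(u, u)            # projector onto the complement of u
H_hat = np.outer(u, u) + P @ Hk_hat @ P   # candidate optimal \hat H

assert np.allclose(H_hat @ u, u)          # constraint \hat H u = u holds
assert np.allclose(H_hat, H_hat.T)        # symmetry is preserved
```

Since $Pu=0$, the second term annihilates $u$ and the rank-one term $uu^T$ returns $u$, so the constraint holds by construction.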

**Step 1**. The convenience for $\hat H$ and $\hat H_k$ comes directly from the problem

$$

\min\|\underbrace{W^{1/2}H_kW^{1/2}}_{\hat H_k}-\underbrace{W^{1/2}HW^{1/2}}_{\hat H}\|_F.

$$

Then we have to rewrite the data too

\begin{align}

Hy_k=s_k\quad&\Leftrightarrow\quad \underbrace{\color{blue}{W^{1/2}}H\color{red}{W^{1/2}}}_{\hat H}\underbrace{\color{red}{W^{-1/2}}y_k}_{\hat y_k}=\underbrace{\color{blue}{W^{1/2}}s_k}_{\hat s_k},\\

Ws_k=y_k\quad&\Leftrightarrow\quad \underbrace{\color{red}{W^{-1/2}}Ws_k}_{\hat s_k}=\underbrace{\color{red}{W^{-1/2}}y_k}_{\hat y_k}.

\end{align}

Thus $\hat H\hat y_k=\hat y_k$; since $\hat s_k=\hat y_k$, this right-hand side also equals $\hat s_k$.

**Step 2**. Since $\hat Hu=u$ we know that $u^T\hat Hu=u^Tu=1$ and $u_\bot^T\hat Hu=u_\bot^Tu=0$, so we can represent the optimizing variable as

$$

U^T\hat HU=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H\begin{bmatrix}u & u_\bot\end{bmatrix}=

\begin{bmatrix}u^T\hat Hu & u^T\hat Hu_\bot\\u_\bot^T\hat Hu & u_\bot^T\hat Hu_\bot\end{bmatrix}=

\begin{bmatrix}1 & 0\\0 & u_\bot^T\hat Hu_\bot\end{bmatrix}.

$$

It gives the following

\begin{align}

U^T\hat H_kU-U^T\hat HU&=\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H_k\begin{bmatrix}u & u_\bot\end{bmatrix}-\begin{bmatrix}u^T\\ u_\bot^T\end{bmatrix}\hat H\begin{bmatrix}u & u_\bot\end{bmatrix}=\\

&=\begin{bmatrix}\color{blue}{u^T\hat H_ku} & \color{blue}{u^T\hat H_ku_\bot}\\\color{blue}{u_\bot^T\hat H_ku} & \color{red}{u_\bot^T\hat H_ku_\bot}\end{bmatrix}-\begin{bmatrix}\color{blue}{1} & \color{blue}{0}\\\color{blue}{0} & \color{red}{u_\bot^T\hat Hu_\bot}\end{bmatrix}.

\end{align}

This particular structure of the optimizing variable was the whole point of switching to the new basis. Because $\hat H$ has no freedom to vary in the blue part, we cannot change the corresponding blue part of $\hat H_k$; it is fixed for all feasible $\hat H$. The red part, however, can be chosen freely, and the smallest Frobenius norm

\begin{align}

\|U^T(\hat H_k-\hat H)U\|_F^2&=

\left\|\begin{bmatrix}\color{blue}{u^T\hat H_ku-1} & \color{blue}{u^T\hat H_ku_\bot}\\\color{blue}{u_\bot^T\hat H_ku} & \color{red}{u_\bot^T\hat H_ku_\bot-u_\bot^T\hat Hu_\bot}\end{bmatrix}\right\|_F^2=\\

&=\color{blue}{(u^T\hat H_ku-1)^2+\|u^T\hat H_ku_\bot\|_F^2+\|u_\bot^T\hat H_ku\|_F^2}+\color{red}{\|u_\bot^T\hat H_ku_\bot-u_\bot^T\hat Hu_\bot\|_F^2}

\end{align}

is obtained when the red part is zero.
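The two facts used here, unitary invariance of the Frobenius norm and the blockwise decomposition of $\|\cdot\|_F^2$, are easy to confirm numerically. A quick check with hypothetical random data (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))

# random orthogonal U via QR factorization
U, _ = np.linalg.qr(rng.standard_normal((n, n)))

# unitary invariance: ||A||_F = ||U^T A U||_F
assert np.isclose(np.linalg.norm(A, 'fro'),
                  np.linalg.norm(U.T @ A @ U, 'fro'))

# blockwise decomposition: ||M||_F^2 is the sum of squared norms of its blocks
M = U.T @ A @ U
total = (M[0, 0]**2
         + np.linalg.norm(M[0, 1:])**2
         + np.linalg.norm(M[1:, 0])**2
         + np.linalg.norm(M[1:, 1:], 'fro')**2)
assert np.isclose(total, np.linalg.norm(M, 'fro')**2)
```

The $1\times 1$ block plays the role of the $u$-direction, and the $(n-1)\times(n-1)$ block plays the role of the red $u_\bot$ block in the argument above.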

**Step 3**. The matrix $U$ is orthogonal, hence,

$$

I=UU^T=\begin{bmatrix}u & u_\bot\end{bmatrix}\begin{bmatrix}u^T \\ u_\bot^T\end{bmatrix}=uu^T+u_\bot u_\bot^T\quad\Leftrightarrow\quad u_\bot u_\bot^T=I-uu^T.

$$
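For completeness, the "straightforward" change of variables back can be spelled out (this is my own elaboration of the last bullet, using $W^{-1}y_k=s_k$ and $\rho_k=(y_k^Ts_k)^{-1}$). First,

$$
\|\hat y_k\|^2=y_k^TW^{-1}y_k=y_k^Ts_k=\rho_k^{-1},
$$

so that, with $uu^T=\rho_k\,\hat y_k\hat y_k^T=\rho_k\,W^{-1/2}y_ky_k^TW^{-1/2}$,

$$
W^{-1/2}uu^TW^{-1/2}=\rho_k\,W^{-1}y_ky_k^TW^{-1}=\rho_k s_ks_k^T,
\qquad
W^{-1/2}(I-uu^T)W^{1/2}=I-\rho_k\,W^{-1}y_ky_k^T=I-\rho_k s_ky_k^T.
$$

Hence

$$
H_{k+1}=W^{-1/2}\hat HW^{-1/2}=(I-\rho_k s_ky_k^T)H_k(I-\rho_k y_ks_k^T)+\rho_k s_ks_k^T,
$$

which is exactly the claimed unique solution.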
