Why is this weighted least squares cost function a function of weights?

Here is a picture from my book regarding weighted least squares:

[Image: excerpt from Convex Optimization by Boyd & Vandenberghe defining the weighted least squares value $g(w)=\inf_x \sum_i w_i(a_i^Tx-b_i)^2$]

I'm totally lost here, so I have extracted the main issues confusing me:

  • First question: I know that in any LSE we want to minimize the cost function, which is a function of the variable $x$. In WLSE, I thought (perhaps wrongly?) that the weights were set beforehand and don't change. So my first question is: why has he made the cost function $g(w)$ here a function of the weights $w$? Shouldn't it purely be a function of $x$? That's the value we are minimizing over, after all.

  • Second question: He says that we allow negative weights (again, I thought they were set beforehand), and that in doing so, we allow "the objective function to be unbounded from below". What does this mean? If it means that $g(w)$ can take on the value $-\infty$, why is that a bad thing?

  • Third question: Probably related to the second one, but why is it important that the domain of $g$ be restricted to the $w$'s for which $g(w) > -\infty$?

Answers

I think you need to broaden your definition of a “function” a bit.

When you fix a weight vector $w$ and solve the resulting weighted least squares problem, you will obtain a particular value of the cost function $\inf_x \sum_i w_i(a_i^Tx-b_i)^2$. Let’s call that value $g$. Sure, you’ll also get a vector $x$ out of it, but just ignore it for now. Now, if you choose a different value of $w$, you will get a different value of $g$. A third value of $w$, a third value of $g$. And so on, and so forth.

If you want to, you could consider assembling an uncountably infinite list of all possible weight vectors $w$ and their associated values $g$. What is a function but a compact mathematical representation of such a list? Let us define the function $g:\mathbb{R}^n\rightarrow\mathbb{R}$ in this way: to compute $g(w)$, fix the value of $w$, solve the weighted least squares problem, and return the resulting value of the cost.

That’s exactly what is happening here. The particular value of $x$ you may get for any such $w$ is now irrelevant; the focus is on the weight vector $w$ and the cost $g(w)$ that it obtains.
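A short numerical sketch of this "list" view may help (our own toy example, not from the book: the matrix $A$, the vector $b$, and the helper name `g` are all assumptions here). For $w > 0$ and $A$ of full column rank, the infimum is attained at the weighted normal-equations solution, so each weight vector maps to a single number:

```python
import numpy as np

# Toy data (ours, not the book's): A is m x n, b is in R^m.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))
b = rng.standard_normal(5)

def g(w, A, b):
    """Evaluate g(w) = inf_x sum_i w_i (a_i^T x - b_i)^2 for positive weights.

    For w > 0 (and A of full column rank) the infimum is attained at the
    weighted normal-equations solution x* = (A^T W A)^{-1} A^T W b, W = diag(w).
    """
    W = np.diag(w)
    x_star = np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
    r = A @ x_star - b
    return np.sum(w * r**2)

# Each choice of w yields one number g(w): a function of w, not of x.
print(g(np.ones(5), A, b))                           # ordinary least squares value
print(g(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), A, b))  # different w, different g(w)
```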

Second question: When they say that they "allow the objective function to be unbounded from below," they simply mean that, for some values of $w$ with negative entries, $\inf_x \sum_i w_i(a_i^Tx-b_i)^2=-\infty$. Of course, you lose the "least squares" interpretation when you have negative weights, but as a purely mathematical construct, there's nothing wrong with doing that. There is nothing inherently bad about allowing a function to take the value $-\infty$, as long as you define things properly. And this book (Convex Optimization by Boyd & Vandenberghe) introduces the notion of extended-valued functions earlier.
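To see this concretely, consider a toy instance (ours, not the book's) with a single term, $a_1 = 1$, $b_1 = 0$, and the negative weight $w_1 = -1$:
$$g(-1) = \inf_x \,(-1)(x - 0)^2 = \inf_x\,(-x^2) = -\infty,$$
since $-x^2$ decreases without bound as $|x| \to \infty$.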

Third question: it is by definition that the domain of an extended-valued function is the set of points for which the function is finite. Why is that important? Remember, infinity is not an actual number: it’s a concept, a construct. In this context, it is best thought of as a placeholder: $g(w)=-\infty$ effectively means “the function is not well defined at this point.”

$g\left(\omega\right)$ is not the cost function here. Rather, $g\left(\omega\right)$ is the infimal value, over $x$, of the cost function $\sum_{i}\omega_{i}\left(a_{i}^{T}x-b_{i}\right)^{2}$, which depends on $\omega$.

This defines a mapping $g:\mathbb{R}^{n}\rightarrow\mathbb{R}\cup\left\{-\infty\right\}$.

The set of all points where $g$ is finite is called the domain of $g$: $\operatorname{dom}\left(g\right)=\left\{\omega\in\mathbb{R}^{n}\mid g\left(\omega\right)\in\mathbb{R}\right\}$.
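As a sketch of how one might test membership in $\operatorname{dom}\left(g\right)$ numerically (the function name `in_dom_g` and the tolerance are ours; the boundedness criterion for quadratics is a standard fact, not something taken from this excerpt):

```python
import numpy as np

def in_dom_g(w, A, b, tol=1e-10):
    """Test whether g(w) > -infinity, i.e. whether w is in dom(g).

    Writing the objective as x^T P x - 2 q^T x + const, with
    P = A^T diag(w) A and q = A^T diag(w) b, a quadratic is bounded
    below iff P is positive semidefinite and q lies in range(P).
    """
    W = np.diag(w)
    P = A.T @ W @ A
    q = A.T @ W @ b
    if np.linalg.eigvalsh(P).min() < -tol:
        return False  # P has a negative eigenvalue: unbounded below along it
    # q in range(P)  <=>  P P^+ q = q  (P P^+ projects onto range(P))
    return np.allclose(P @ np.linalg.pinv(P) @ q, q, atol=tol)
```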

I hope this helps. What it's actually useful for is hard to say; it depends on what the book is aiming at. Most probably they will later choose a "proper" cost function $\sum_{i}\omega_{i}\left(a_{i}^{T}x-b_{i}\right)^{2}$ with $\omega\in \operatorname{dom}\left(g\right)$ and minimize that one…