Why is the expected value of the error 0 when doing regression by OLS?

What I know to begin with is that the sum of residuals will be 0 if there is a y-intercept $b_0$. Why is that? My book doesn't say, and I can't figure it out. I also know that an important assumption for the OLS estimators to be BLUE is that $x$ and the errors can't be correlated; otherwise the estimators would be biased. This correlation assumption may entail problems, so we take one step further and assert that, by combining $E(u)=0$ (an assumption whose origin the book doesn't explain) with the assumption that the average value of $u$ does not depend on the value of $x$, we get $E(u\mid x)=E(u)=0$. So this assumption is somehow related to the sum of residuals being 0. But how do I mathematically prove that?

Sorry if I have been a bit unclear, but I am quite confused; this whole topic seems redundant to me, as if the assumptions that justify the method were inferred from the method itself.

Solution collected from the web:

I have answered your other question about why the OLS residuals sum to zero when an intercept is included in the regression.

The stochastic assumption on the error term (not on the residuals), $E(u) = 0$ or $E(u\mid X) = 0$ (depending on whether you treat the regressors as deterministic or stochastic), is in fact justified by the same device that guarantees that the OLS residuals will sum to zero: including a constant term ("intercept") in the regression.
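For completeness, here is the algebra behind the residuals fact (a standard derivation, not specific to this post): the OLS estimates minimize the sum of squared residuals, and the first-order condition with respect to the intercept forces the residuals to sum to zero.

$$\min_{\hat b_0,\,\hat b_1}\sum_{i=1}^n \left(y_i-\hat b_0-\hat b_1 x_i\right)^2
\;\Longrightarrow\;
\frac{\partial}{\partial \hat b_0}:\;\; -2\sum_{i=1}^n \left(y_i-\hat b_0-\hat b_1 x_i\right)=0
\;\Longrightarrow\;
\sum_{i=1}^n \hat u_i = 0$$

Note that this condition only exists because $\hat b_0$ is a free parameter; drop the intercept and nothing forces the residuals to sum to zero.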

Consider the specification

$$y_i = b_0 + b_1x_i + u_i\,,\; E(u) = 0$$

Consider the alternative specification

$$y_i = b_1x_i + \varepsilon_i\,,\; E(\varepsilon_i) = b_0 $$

The expected value of $y$ in these two different specifications is the same, and it is the expected value of $y$ that we are essentially estimating in an OLS regression (see this post).
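Explicitly, taking expectations term by term in the two specifications (treating $x_i$ as deterministic) gives the same mean for $y$:

$$E(y_i) = b_0 + b_1x_i + E(u_i) = b_0 + b_1x_i
\qquad\text{and}\qquad
E(y_i) = b_1x_i + E(\varepsilon_i) = b_1x_i + b_0$$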

In other words, the constant term absorbs (together with any other constants coming from the theory behind the model) the non-zero mean of the error term, and permits us to assume safely that the "remaining" error term has zero mean (this is why it is strongly advised, from every corner, to always include a constant term when running a regression).

And then it also gives us that the sum of the OLS residuals will be zero. What more could one ask from a constant?
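Both points can be checked numerically. The sketch below (all names and numbers are illustrative, not from the post) simulates data whose error term has a deliberately non-zero mean, then fits OLS with and without an intercept: with an intercept, the residuals sum to zero and the estimated intercept absorbs the error mean; without one, the residual mean stays far from zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
b0, b1, eps_mean = 2.0, 0.5, 3.0            # true intercept, slope, and error mean

x = rng.normal(size=n)
eps = eps_mean + rng.normal(size=n)          # E(eps) = 3, NOT zero
y = b0 + b1 * x + eps

# OLS with an intercept: regress y on [1, x]
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

print(coef)            # intercept ≈ b0 + eps_mean = 5.0, slope ≈ 0.5
print(resid.sum())     # ≈ 0: the first-order condition forces the residuals to sum to zero

# OLS without an intercept: nothing absorbs the error mean
coef_ni, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
resid_ni = y - x * coef_ni[0]
print(resid_ni.mean())  # ≈ 5, nowhere near zero
```

The estimated intercept is not $b_0 = 2$ but $b_0 + E(\varepsilon) = 5$: exactly the "absorption" described above.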