# Bayesian posterior with truncated normal prior

Suppose we observe one draw from a random variable $X\sim\mathcal{N}(\mu,\sigma^2)$. The variance $\sigma^2$ is known; $\mu$ is not. We want to estimate $\mu$.

Suppose further that the prior distribution is a truncated normal distribution $\mathcal{N}(\mu_0,\sigma^2_0,t)$, i.e., it has density $f(\mu)=\frac{c}{\sigma_0} \phi((\mu-\mu_0)/\sigma_0)$ if $\mu<t$, and $f(\mu)=0$ otherwise, where $t>\mu$ and $c$ is a normalizing constant. (Interpretation: we get a noisy signal about $\mu$, known to be normally distributed with known variance; this is the draw of $X$. But we have prior knowledge that values $\mu\ge t$ are not possible.)

In this setup, is the resulting posterior a truncated normal distribution (truncated at $t$ like the prior)? I tried to adapt the standard derivation of the posterior for the well-known conjugate normal pair, and it seems to work. Do you see any mistake in this derivation?

The likelihood function is given by
$$f(x|\mu)=\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2} \right\}$$
The prior density is (where $\Phi(\cdot)$ is the cdf of the standard normal distribution)
$$f(\mu)=\begin{cases} \frac{1}{\sigma_0\sqrt{2\pi}\,\Phi((t-\mu_0)/\sigma_0)} \exp\left\{-\frac{(\mu-\mu_0)^2}{2\sigma_0^2} \right\} &\text{ if } \mu< t \\ 0 & \text{else}. \end{cases}$$
The prior density can be rewritten as
$$f(\mu)=c \phi((\mu-\mu_0)/\sigma_0)\mathbf{1}\{\mu<t\},$$
where $c$ is the normalizing constant (independent of $\mu$, but dependent on $t$). Now, by Bayes’ rule,

$$\begin{aligned}
f(\mu|x)&\propto f(x|\mu) f(\mu)\propto\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2} \right\} \exp\left\{-\frac{(\mu-\mu_0)^2}{2\sigma_0^2} \right\}\mathbf{1}\{\mu<t\} \\
&=\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2} -\frac{(\mu-\mu_0)^2}{2\sigma_0^2} \right\}\mathbf{1}\{\mu<t\}\\
&\propto \exp\left\{-\frac{1}{2\sigma^2\sigma_0^2/(\sigma^2+\sigma_0^2)} \left(\mu-\frac{\sigma^2\mu_0+\sigma_0^2 x}{\sigma^2+\sigma_0^2}\right)^2 \right\}\mathbf{1}\{\mu<t\}.
\end{aligned}$$
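
The last proportionality comes from completing the square in $\mu$. Spelled out, with every term that does not depend on $\mu$ collected into "const" (and hence absorbed by $\propto$):

$$\begin{aligned}
-\frac{(x-\mu)^2}{2\sigma^2}-\frac{(\mu-\mu_0)^2}{2\sigma_0^2}
&=-\frac{1}{2}\left[\left(\frac{1}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2-2\left(\frac{x}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)\mu\right]+\text{const}\\
&=-\frac{\sigma^2+\sigma_0^2}{2\sigma^2\sigma_0^2}\left(\mu-\frac{\sigma^2\mu_0+\sigma_0^2 x}{\sigma^2+\sigma_0^2}\right)^2+\text{const}'.
\end{aligned}$$

The indicator $\mathbf{1}\{\mu<t\}$ is untouched by this step, since it multiplies the exponential and plays no role in the algebra inside the exponent.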

This is the kernel of a normal distribution with the usual posterior mean and variance (as if we had done the derivation for an untruncated prior), but with all mass at $t$ and above removed. In other words, ignoring the truncation in the prior, applying the usual updating rule for the conjugate normal pair, and then truncating the result at $t$ gives the same posterior as the derivation above (assuming it is correct). Is it correct? All I do is carry the indicator function along and adapt the normalizing constant. Does that introduce problems somewhere?
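
One way to sanity-check the claim numerically is to normalize likelihood times truncated prior by brute force on a grid and compare it with the closed-form truncated normal. The following is only a sketch under assumed illustrative values (`x = 1.5`, `sigma = 1`, `mu0 = 0`, `sigma0 = 2`, `t = 1`, and a lower integration limit of `-20`); none of these numbers come from the question, and the helper names are hypothetical.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def posterior_params(x, sigma, mu0, sigma0):
    """Mean and standard deviation of the usual (untruncated) conjugate update."""
    var = sigma**2 * sigma0**2 / (sigma**2 + sigma0**2)
    mean = (sigma**2 * mu0 + sigma0**2 * x) / (sigma**2 + sigma0**2)
    return mean, math.sqrt(var)

def truncated_normal_pdf(mu, mean, sd, t):
    """Density of N(mean, sd^2) truncated to mu < t."""
    if mu >= t:
        return 0.0
    return phi((mu - mean) / sd) / (sd * Phi((t - mean) / sd))

def brute_force_posterior(x, sigma, mu0, sigma0, t, lo=-20.0, n=100000):
    """Normalize likelihood * prior kernel on a midpoint grid over (lo, t)."""
    h = (t - lo) / n
    grid = [lo + (i + 0.5) * h for i in range(n)]
    # Constants of the prior and likelihood cancel after normalization.
    unnorm = [phi((x - m) / sigma) * phi((m - mu0) / sigma0) for m in grid]
    total = sum(unnorm) * h
    return grid, [u / total for u in unnorm]

# Assumed illustrative values (not from the question).
x, sigma, mu0, sigma0, t = 1.5, 1.0, 0.0, 2.0, 1.0
mean, sd = posterior_params(x, sigma, mu0, sigma0)
grid, dens = brute_force_posterior(x, sigma, mu0, sigma0, t)
max_abs_diff = max(
    abs(d - truncated_normal_pdf(m, mean, sd, t)) for m, d in zip(grid, dens)
)
print(f"posterior location = {mean}, scale = {sd}")
print(f"largest pointwise gap between the two densities: {max_abs_diff:.2e}")
```

If the derivation is right, the reported gap should be at the level of numerical integration error. Note that the location here lies above $t$, so the truncation removes more than half of the untruncated posterior mass, which makes it a reasonably demanding test case.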

#### Answers

$$f(\mu|x)\propto f(x|\mu) f(\mu)$$
Now suppose we set $f(\mu)$ to zero on some region and rescale it by a constant $c$ so that it integrates to one again. At points where the prior was not set to zero, the right-hand side of the above relation is unchanged except that $f(\mu)$ becomes $c f(\mu)$. The left-hand side is therefore also just rescaled by a constant there (fixed by requiring the posterior to integrate to one), and it retains the exact shape of a normal density. At points where $f(\mu)$ is zero, the left-hand side is zero as well. So we end up with a rescaled normal density on $\mu<t$ and zero elsewhere, which is a truncated normal.
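The new density is again a normal truncated at $t$, with new location and variance parameters $\frac{\sigma^2\mu_0+\sigma_0^2 x}{\sigma^2+\sigma_0^2}$ and $\sigma^2\sigma_0^2/(\sigma^2+\sigma_0^2)$. The untruncated normal density with these parameters is divided by the new normalising constant, in which the argument is standardised by the standard deviation (the square root of the variance): $$\Phi\left( \frac{t-\frac{\sigma^2\mu_0+\sigma_0^2 x}{\sigma^2+\sigma_0^2}}{\sqrt{\sigma^2\sigma_0^2/(\sigma^2+\sigma_0^2)}}\right )$$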
It would be nice to study these posteriors for a fixed $t$ and different values of the prior parameters.
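
A small sketch of such a study, holding $t$ fixed and varying the prior parameters, might look as follows. The observation `x`, the noise level `sigma`, the truncation point `t`, and the grids for `mu0` and `sigma0` are all assumed values chosen only for illustration. The posterior mean uses the standard formula for a normal truncated from above, $m - s\,\phi(\beta)/\Phi(\beta)$ with $\beta=(t-m)/s$, which is a textbook result and not something stated in the posts above.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf, written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def truncated_posterior(x, sigma, mu0, sigma0, t):
    """Summarize the posterior N(m, s^2) truncated to mu < t.

    Returns (m, s, mass, post_mean): the location and scale of the
    untruncated update, the normalizing constant Phi((t - m) / s),
    and the mean of the truncated posterior.
    """
    s2 = sigma**2 * sigma0**2 / (sigma**2 + sigma0**2)
    m = (sigma**2 * mu0 + sigma0**2 * x) / (sigma**2 + sigma0**2)
    s = math.sqrt(s2)
    beta = (t - m) / s
    mass = Phi(beta)
    # Mean of a normal truncated from above at t.
    post_mean = m - s * phi(beta) / mass
    return m, s, mass, post_mean

# Assumed illustrative setup: fixed observation, noise level and cutoff.
x, sigma, t = 1.5, 1.0, 1.0
results = {}
for mu0 in (-1.0, 0.0, 0.5):
    for sigma0 in (0.5, 1.0, 2.0):
        results[(mu0, sigma0)] = truncated_posterior(x, sigma, mu0, sigma0, t)
        m, s, mass, post_mean = results[(mu0, sigma0)]
        print(
            f"mu0={mu0:5.2f} sigma0={sigma0:4.2f} | "
            f"location={m:6.3f} scale={s:5.3f} "
            f"norm.const={mass:5.3f} post.mean={post_mean:6.3f}"
        )
```

In this setup the observation lies above the cutoff, so a more diffuse prior (larger `sigma0`) pulls the untruncated location toward `x` and past `t`, while the truncated posterior mean necessarily stays below `t`.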