Can someone explain probability measure in simple words? This term has been haunting me all my life.

Today I came across the **Kullback-Leibler divergence**. The KL divergence between **probability measures** $P$ and $Q$ is defined by

$$KL(P,Q)= \begin{cases}
\int \log\left(\frac{dP}{dQ}\right)dP & \text{if } P\ll Q, \\
\infty & \text{otherwise}.
\end{cases}$$
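For intuition, here is a small sketch (my own illustration, not from the original post) of the discrete special case: there, $P\ll Q$ simply means that $Q$ assigns positive probability wherever $P$ does, and the integral becomes a sum.

```python
import math

def kl_divergence(p, q):
    """Discrete KL divergence: sum of p * log(p/q) over all outcomes.

    Returns infinity when P is not absolutely continuous w.r.t. Q,
    i.e. when some outcome has p > 0 but q = 0."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # the term 0 * log(0/q) is taken to be 0
        if qi == 0:
            return math.inf  # P is not << Q
        total += pi * math.log(pi / qi)
    return total

# A fair coin compared against a heavily biased coin
print(kl_divergence([0.5, 0.5], [0.9, 0.1]))
```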


I have no idea what I just read. I looked up *probability measure*; it refers to *probability space*. I looked that up; it refers to $\sigma$-algebra. I told myself I had to stop.

So, is a probability measure just a probability density under a broader, fancier name? Am I overlooking a simple concept, or is this topic just that hard? Thanks in advance!


A probability space consists of:

- A sample space $X$, which is the set of all possible outcomes of an experiment
- A collection of events $\Sigma$, which are subsets of $X$
- A function $\mu$, called a probability measure, that assigns to each event in $\Sigma$ a nonnegative real number

Let’s consider the simple example of flipping a coin. In that case, we have $X=\{H,T\}$ for heads and tails respectively, $\Sigma=\{\varnothing,\{H\},\{T\},X\}$, and $\mu(\varnothing)=0$, $\mu(\{H\})=\mu(\{T\})=\frac{1}{2},$ and $\mu(X)=1$. All of this is a fancy way of saying that when I flip a coin, I have a $0$ percent chance of flipping *nothing*, a $50$ percent chance of flipping heads, a $50$ percent chance of flipping tails, and a $100$ percent chance of flipping *something*, heads or tails. This is all very intuitive.
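The coin-flip triple $(X,\Sigma,\mu)$ is small enough to write out in full; here is a sketch in Python (my own illustration, not part of the original answer):

```python
from itertools import chain, combinations

# Sample space X and the sigma-algebra Sigma (here: the full power set of X)
X = frozenset({"H", "T"})
events = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(X), r) for r in range(len(X) + 1))]

# The probability measure of a fair coin: each outcome carries weight 1/2
def mu(event):
    return sum({"H": 0.5, "T": 0.5}[outcome] for outcome in event)

assert mu(frozenset()) == 0.0       # flipping *nothing* has probability 0
assert mu(frozenset({"H"})) == mu(frozenset({"T"})) == 0.5
assert mu(X) == 1.0                 # flipping *something* has probability 1
# Additivity on the disjoint events {H} and {T}
assert mu(frozenset({"H"})) + mu(frozenset({"T"})) == mu(X)
```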

Now, getting back to the abstract definition, there are certain natural requirements that $\Sigma$ and $\mu$ must satisfy. For example, it is natural to require that $\varnothing$ and $X$ are elements of $\Sigma$, and that $\mu(\varnothing)=0$ and $\mu(X)=1$. This is just saying that when performing an experiment, the probability that *no* outcome occurs is $0$, while the probability that *some* outcome occurs is $1$.

Similarly, it is natural to require that $\Sigma$ is closed under complements, and if $E\in\Sigma$ is an event, then $\mu(E^c)+\mu(E)=1$. This is just saying that when performing an experiment, the probability that event $E$ occurs or doesn’t occur must be $1$.

There are other requirements of $\Sigma$ which make it a $\sigma$-algebra, and other requirements of $\mu$ which make it a (finite) measure, and to rigorously study probability, one must eventually become familiar with these notions.

A probability measure is something like a cumulative distribution function, generalized from intervals to arbitrary (measurable) sets.

It gives, for any *set* of values, the probability of the random variable being in that set. And of course, it has to be defined in a way that makes sense: if $A \cap B = \emptyset$, then $\mu(A \cup B) = \mu(A) + \mu(B)$, and the probability of the entire range is one, and no set has a negative probability.

Agreed that Wikipedia does a poor job of getting the basic ideas across; it seems to be written by experts for experts, and is very jargon-heavy in many cases.

Pictorially: imagine you have many items, and a probability measure is a scale that tells you the weight of any subset. The total weight of everything you have is always one. If you weigh a few items separately, one by one, the sum of their weights will be the same as if you had weighed them all together at once.

A funny thing happens with grains of sand: each has individual weight zero, but when you collect a jar of them (think *uncountably* many; that's important!), they can have a total weight bigger than zero.

Think of grains of sand here as being uncountably many in total, like real numbers in an interval. The above is *not* true if there are only countably many! But for real numbers, for instance, each number in the interval has probability measure zero, but the whole interval has some positive measure.

Perhaps I can help clarify things a bit without getting super technical.

A sample space is simply the collection of all possible outcomes. So, if you are flipping a coin, the sample space is $\Omega = \{H, T\}$, since you can only flip heads or tails. The $\sigma$-algebra that was mentioned is also conceptually simple – it groups events (subsets of outcomes) of your sample space into a new set (of course, a $\sigma$-algebra must satisfy certain properties, but for the sake of simplicity I am skipping those). For example, one $\sigma$-algebra $F$ on $\Omega$ is the power set of $\Omega$ (the set of all subsets): $F = \{\emptyset , \{H\}, \{T\}, \{H,T\} \}$.

The reason the $\sigma$-algebra is important is that it is the set of events to which a probability measure assigns weights. A probability space $(\Omega, F, P)$ is therefore a sample space, combined with a $\sigma$-algebra on that space, and a probability measure $P$ on the $\sigma$-algebra.

So, a probability measure simply gives weights (probabilities) to each set within the $\sigma$-algebra, where the weight of the whole space must be $1$, together with a few other properties (countable additivity, for example).

To describe a random variable $X$, we specify what the probability is that the outcome of $X$ is some value $x$. For example with a fair die and $X$ standing for “the score of one roll of the die”, we’d say $$P(X=1)=P(X=2)=P(X=3)=P(X=4)=P(X=5)=P(X=6)=\frac16$$ and that’s it.

Our $X$ takes values only from the finite set $\Omega=\{1,2,3,4,5,6\}$.
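As a sketch of this bookkeeping (in Python; my own illustration, not part of the answer):

```python
# Fair die: X takes each value in {1, ..., 6} with probability 1/6
pmf = {x: 1 / 6 for x in range(1, 7)}

total = sum(pmf.values())                      # should be (numerically) 1
expected = sum(x * p for x, p in pmf.items())  # E(X) = 21/6 = 3.5
print(total, expected)
```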

There are also random variables with (countably) infinitely many possible outcomes. For example, if $Y$ stands for “the number of throws of a fair coin until head appears the first time”, then

$$P(Y=1)=\frac12, P(Y=2)=\frac14, P(Y=3)=\frac18,\ldots $$

The set $\Omega$ of possible outcomes is now $\Omega=\mathbb N$.

And finally there are random variables with uncountably many possible outcomes (e.g. let $Z$ stand for “select a random point uniformly on the unit interval $\Omega:=[0,1]$”). In these cases, *usually* for any individual value $x\in\Omega$, the probability $P(Z=x)$ is simply zero. Instead, we have positive probability only if we ask for certain infinite subsets of the space $\Omega$ of possible outcomes. For example, we can rightly say $P(\frac12< Z<\frac23)=\frac16$.
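A quick simulation sketch (mine, not from the answer) makes the contrast concrete: any single point is essentially never hit, but the interval $(\frac12,\frac23)$ is hit about $\frac16$ of the time.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible
n = 100_000
samples = [random.random() for _ in range(n)]  # Z: uniform on [0, 1]

# Any individual point such as 1/2 has probability 0 of being hit exactly,
# but the interval (1/2, 2/3) has measure 2/3 - 1/2 = 1/6.
hits = sum(1 for z in samples if 1/2 < z < 2/3)
print(hits / n)  # close to 1/6 ≈ 0.1667
```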

It would be nice if one could assign a probability value to *any* subset $S\subseteq \Omega$. However, it usually turns out that this is not possible in a consistent or well-defined manner (for $Z$, for instance, $[0,1]$ contains non-measurable sets, such as Vitali sets, to which no probability can consistently be assigned).

One will still strive to make the collection of sets $S$ for which $P(X\in S)$ is defined/definable as large as possible.

For our example $Z$, we can certainly say $P(Z\in S)=b-a$ if $S$ is an interval $[a,b]$ or $]a,b[$ or $]a,b]$ or $[a,b[$ with $0\le a\le b\le 1$. In particular, $P(Z\in\emptyset)=0$ and $P(Z\in\Omega)=1$.

Also, if $A,B$ are disjoint and $P(X\in A)$ and $P(X\in B)$ make sense, then so does $P(X\in A\cup B)$, namely with the value $P(X\in A\cup B)=P(X\in A)+P(X\in B)$. In fact, if we have pairwise disjoint sets $A_1,A_2,\ldots$ and know $P(X\in A_n)$ for each $n$, then it turns out to be advisable to have

$$P\left(X\in\bigcup_{n=1}^\infty A_n\right)=\sum_{n=1}^\infty P(X\in A_n).$$

This is almost the concept of a $\sigma$-algebra: it is a collection of subsets of a given set $\Omega$. If we are lucky, such as in the finite case or the countable case (at least as it occurred with the random variable $Y$ we defined), this collection is the full power set of $\Omega$, but it may be smaller.

At any rate, it is large enough to be closed under certain operations, among which is the *countable* union of sets.

And this property is precisely what allows us to formulate the essential properties we want probabilities to have for a random variable landing in a subset of $\Omega$. Any function that assigns to each element of a given $\sigma$-algebra (i.e. to each sufficiently nice subset of $\Omega$) a value between $0$ and $1$ inclusive, such that the basic rules spelled out above hold for countable unions, complements, and the whole space, is then called a *probability measure*.

One important measure is the Lebesgue measure $\lambda$ on $[0,1]$ (which describes the random variable $Z$ above).

You may know it from integration theory, where it allows us to generalize (extend) Riemann integration.

You may know, for example, that the expected value of a finite random variable is simply given by

$$\tag1E(X) = \sum_{x\in\Omega}x\cdot P(X=x) $$

or, more generally, the expected value of a function of $X$:

$$\tag2E(f(X))=\sum_{x\in\Omega}f(x)\cdot P(X=x).$$

These are just finite sums (hence they always work) if $X$ is a finite random variable. If $\Omega$ is countable, we can use the same formulas, but with *series* instead of sums, and it may happen that the series does not converge.

For example $E(Y)=2$, but $E((-2)^Y)$ does not converge.
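To see this numerically, here is a sketch (mine, not from the answer) comparing partial sums: for $E(Y)$ they settle down to $2$, while for $E((-2)^Y)=\sum_k(-1)^k$ they oscillate between $0$ and $-1$ forever.

```python
# Partial sums of E(Y) = sum_k k * 2^{-k} converge to 2, while partial
# sums of E((-2)^Y) = sum_k (-2)^k * 2^{-k} = sum_k (-1)^k never settle.
def partial_expectation(f, terms):
    return sum(f(k) * 2.0 ** -k for k in range(1, terms + 1))

print(partial_expectation(lambda k: k, 50))          # approaches 2
print(partial_expectation(lambda k: (-2) ** k, 50))  # 0 here ...
print(partial_expectation(lambda k: (-2) ** k, 51))  # ... but -1 one term later
```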

It becomes even worse when $P(X=x)=0$ for all $x\in\Omega$, as then the sums/series above simply result in $0$. In that case the sums/series are replaced with corresponding integrals

$$E(Z)=\int_0^1 x\,\mathrm dx =\frac12, \qquad E(f(Z))=\int_0^1 f(x)\,\mathrm dx.$$

Again, the second integral does not make sense for *every* possible $f$; the function $f$ must be integrable.
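A midpoint Riemann sum (my own sketch, not part of the answer) approximates these integrals and shows how expectations still work when every single point has probability zero:

```python
# Midpoint-rule approximation of E(f(Z)) = int_0^1 f(x) dx
# for Z uniform on [0, 1] (Lebesgue measure).
def expectation(f, steps=100_000):
    h = 1.0 / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

print(expectation(lambda x: x))      # E(Z): close to 1/2
print(expectation(lambda x: x * x))  # E(Z^2): close to 1/3
```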

The step from sum to (first series and then) integral may look arbitrary, but it is indeed well-founded in measure theory – often enough one adjusts in the other direction and also writes series and sums as integrals (with respect to specific measures).

All this may still not be enough to grasp the formula you posted, but it should help you get started with the introductory texts you already tried to read.
