The following probability question appeared in an earlier thread:
I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?
The claim was that it is not really a mathematical problem, only a language problem.
If one wanted to restate this problem formally, the obvious way would be as follows:
Definition: Sex is defined as an element of the set $\\{\text{boy},\text{girl}\\}$.
Definition: Birthday is defined as an element of the set $\\{\text{Monday},\text{Tuesday},\text{Wednesday},\text{Thursday},\text{Friday},\text{Saturday},\text{Sunday}\\}$.
Definition: A child is defined to be an ordered pair (sex, birthday), i.e. an element of Sex $\times$ Birthday.
Let $(x,y)$ be a pair of children, with each child's sex and birthday independent and uniformly distributed.
Define an auxiliary predicate $H(s,b) :\\!\\!\iff s = \text{boy} \text{ and } b = \text{Tuesday}$.
Calculate $P(x \text{ is a boy and } y \text{ is a boy}|H(x) \text{ or } H(y))$
I don’t see any other sensible way to formalize this question.
Actually solving the problem now requires no thought (in fact, it is thinking that leads us to guess incorrect answers); we just compute
$$
\begin{align*}
& P(x \text{ is a boy and } y \text{ is a boy}|H(x) \text{ or } H(y)) \\\\
=& \frac{P(x\text{ is a boy and }y\text{ is a boy and }(H(x)\text{ or }H(y)))}
{P(H(x)\text{ or }H(y))} \\\\
=& \frac{P((x\text{ is a boy and }y\text{ is a boy and }H(x))\text{ or }(x\text{ is a boy and }y\text{ is a boy and }H(y)))}
{P(H(x)) + P(H(y)) - P(H(x))P(H(y))} \\\\
=& \frac{\begin{aligned} &P(x\text{ is a boy and }y\text{ is a boy and }x\text{ born on Tuesday}) \\\\
+ &P(x\text{ is a boy and }y\text{ is a boy and }y\text{ born on Tuesday}) \\\\
- &P(x\text{ is a boy and }y\text{ is a boy and }x\text{ born on Tuesday and }y\text{ born on Tuesday})
\end{aligned}}
{P(H(x)) + P(H(y)) - P(H(x))P(H(y))} \\\\
=& \frac{1/2 \cdot 1/2 \cdot 1/7 + 1/2 \cdot 1/2 \cdot 1/7 - 1/2 \cdot 1/2 \cdot 1/7 \cdot 1/7}
{1/2 \cdot 1/7 + 1/2 \cdot 1/7 - 1/2 \cdot 1/7 \cdot 1/2 \cdot 1/7} \\\\
=& 13/27
\end{align*}
$$
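The computation above can be cross-checked by brute force. The sketch below is a minimal Python version, assuming (as the calculation does) that all 14 (sex, birthday) combinations are equally likely for each child and that the two children are independent; it simply counts the 14 * 14 = 196 equally likely families. The function names are illustrative only.

```python
from fractions import Fraction
from itertools import product

SEXES = ("boy", "girl")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

# A child is a (sex, birthday) pair; all 14 combinations are assumed equally likely.
CHILDREN = list(product(SEXES, DAYS))


def H(child):
    """The auxiliary predicate: the child is a boy born on Tuesday."""
    sex, day = child
    return sex == "boy" and day == "Tuesday"


def p_two_boys_given_tuesday_boy():
    """P(x and y are boys | H(x) or H(y)), by counting equally likely families."""
    condition = [(x, y) for x, y in product(CHILDREN, repeat=2) if H(x) or H(y)]
    both_boys = [(x, y) for x, y in condition if x[0] == "boy" and y[0] == "boy"]
    return Fraction(len(both_boys), len(condition))


print(p_two_boys_given_tuesday_boy())  # 13/27
```

Of the 196 families, 27 satisfy $H(x)$ or $H(y)$, and 13 of those are two boys.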
Now what I am wondering is: does this refute the claim that the puzzle is just a language problem, or support it? Was there a lot of room for misinterpreting the question that I simply missed?
There are even trickier aspects to this question. For example, what is the strategy of the guy telling you about his family? If he always mentions a son rather than a daughter when he has both, we get one probability; if he talks about the sex of the firstborn child, we get a different probability. Your calculation makes a choice on this issue: you choose the version “if the father has a boy and a girl, he’ll mention the boy”.
What I’m getting at is this: the question is not well-defined mathematically. It has several possible interpretations, and as such the “problem” here is indeed one of language; or, more precisely, the fact that a simple statement in English does not convey enough information to specify the precise model for the problem.
Let’s look at a simplified version without days. The probability space for the make-up of the family is {BB, GB, BG, GG} (GB means “an older girl and a younger boy”, etc.). We want to know $P(BB|A)$, where A is the event determined by how we interpret the statement about the boy. Now let’s look at different possible interpretations.
1) If there is a boy in the family, the statement will mention him. In this case A={BB,BG,GB} and so the probability is $1/3$.
2) If there is a girl in the family, the statement will mention her. In this case, since the statement talked about a boy, there are NO girls in the family. So A={BB} and so the probability is 1.
3) The statement talks about the sex of the firstborn. In this case A={BB,BG} and so the probability is $1/2$.
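The three interpretations can be made concrete by writing each one as a function from the family to the sex that the statement mentions. A minimal Python sketch, assuming the four ordered families are equally likely and using the same BB/BG/GB/GG notation (older child first); the strategy names and `p_two_boys` are hypothetical labels, not anything from the original post.

```python
from fractions import Fraction

# Ordered families, older child first; each is assumed equally likely.
FAMILIES = ["BB", "BG", "GB", "GG"]

# Each hypothetical strategy maps a family to the sex the statement mentions.
STRATEGIES = {
    "mention a boy if there is one": lambda f: "B" if "B" in f else "G",
    "mention a girl if there is one": lambda f: "G" if "G" in f else "B",
    "mention the firstborn's sex": lambda f: f[0],
}


def p_two_boys(strategy):
    """P(BB | A), where A is the set of families whose statement mentions a boy."""
    a = [f for f in FAMILIES if strategy(f) == "B"]
    return Fraction(a.count("BB"), len(a))


for name, strategy in STRATEGIES.items():
    print(f"{name}: {p_two_boys(strategy)}")
```

The loop prints 1/3, 1, and 1/2, matching interpretations 1 to 3.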
The bottom line: the statement about the family looks “constant” to us, but it must be viewed as a function of the random state of the family, and there are several possible functions. You must choose one, otherwise no probabilistic analysis of the situation makes sense.
It is actually impossible to have a unique and unambiguous answer to the puzzle without explicitly articulating a probability model for how the information on gender and birthday is generated. The reason is that (1) for the problem to have a unique answer some random process is required, and (2) the answer is a function of which random model is used.
The problem assumes that a unique probability can be deduced as the answer. This requires that the set of children described be chosen by a random process; otherwise the number of boys is a deterministic quantity, and the probability would be 0 or 1 with no way to determine which. More generally, one can consider random processes that produce the complete set of information referenced in the problem: choose a parent, then choose what to reveal about the number, gender, and birth days of their children.
The answer depends on which random process is used. If the Tuesday birth is disclosed only when there are two boys, the probability of two boys is 1. If the Tuesday birth is disclosed only when there is a sister, the probability of two boys is 0. The answer could be any number between 0 and 1, depending on what process is assumed to produce the data.
There is also a linguistic question of how to interpret “one is a boy born on Tuesday”: it could mean that exactly one child is a Tuesday-born boy, or that at least one is.
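One way to see the dependence on the disclosure process is to weight each family by a hypothetical probability that its parent says “one is a boy born on Tuesday”. The Python sketch below assumes all 196 (sex, day) combinations for the two children are equally likely and only lets families where the statement is true make it; `disclose` is an illustrative parameter, not part of the original argument.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # days are numbered 0..6; 1 stands for Tuesday (an arbitrary choice)
CHILDREN = list(product("BG", range(7)))  # (sex, day), all 14 assumed equally likely


def has_tuesday_boy(family):
    return ("B", TUESDAY) in family


def p_two_boys(disclose):
    """P(two boys | the parent says "one is a boy born on Tuesday").

    `disclose(family)` is a hypothetical model giving the probability that a
    parent with this family makes the statement; it is only consulted for
    families where the statement is true.
    """
    num = den = Fraction(0)
    for family in product(CHILDREN, repeat=2):  # 196 equally likely families
        if not has_tuesday_boy(family):
            continue
        weight = Fraction(disclose(family))
        den += weight
        if all(sex == "B" for sex, _ in family):
            num += weight
    return num / den


models = {
    "always disclose": lambda f: 1,
    "disclose only with two boys": lambda f: 1 if all(s == "B" for s, _ in f) else 0,
    "disclose only with a sister": lambda f: 1 if any(s == "G" for s, _ in f) else 0,
}
for name, model in models.items():
    print(f"{name}: {p_two_boys(model)}")  # 13/27, then 1, then 0
```

Intermediate models give intermediate answers: disclosing with probability 1/2 for two-boy families and always otherwise gives 13/41.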
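The two readings can be compared by counting the 196 equally likely families (an assumption carried over from the calculation in the question). In this Python sketch, `count_ok` is a hypothetical predicate on the number of Tuesday-born boys; the “exactly one” reading drops only the family with two Tuesday boys.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # days are numbered 0..6; 1 stands for Tuesday
CHILDREN = list(product("BG", range(7)))  # (sex, day), all 14 assumed equally likely


def p_two_boys(count_ok):
    """P(two boys | count_ok(number of Tuesday-born boys))."""
    kept = [f for f in product(CHILDREN, repeat=2)
            if count_ok(sum(s == "B" and d == TUESDAY for s, d in f))]
    boys = [f for f in kept if all(s == "B" for s, _ in f)]
    return Fraction(len(boys), len(kept))


print(p_two_boys(lambda n: n >= 1))  # at least one Tuesday boy: 13/27
print(p_two_boys(lambda n: n == 1))  # exactly one Tuesday boy: 6/13
```

The “exactly one” reading leaves 26 families, 12 of them two boys, so 12/26 = 6/13.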
I guess the following two versions of framing the question yield two different probabilities:
Dave has two children. I ask: is at least one of them a boy who was born on Tuesday? Dave answers yes.
Dave has two children. I ask him to first choose and fix one child at random, and then tell me whether it is a boy who was born on Tuesday. Dave answers yes: that child is a boy born on Tuesday.
For the first framing, the probability that both are boys is 13/27; for the second, it is 1/2.
As the question is worded, it is in line with the first framing, hence the answer should be 13/27.
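The two framings correspond to two different conditioning events. A minimal Python sketch, assuming equally likely (sex, day) combinations and, for the second framing, that Dave picks each child with probability 1/2; the function names are hypothetical. In the second framing the sample space is (family, chosen child) pairs rather than families.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # days are numbered 0..6; 1 stands for Tuesday
CHILDREN = list(product("BG", range(7)))  # (sex, day), all 14 assumed equally likely


def is_tuesday_boy(child):
    return child == ("B", TUESDAY)


def framing_1():
    """Dave is asked whether at least one child is a Tuesday boy; he says yes."""
    yes = [f for f in product(CHILDREN, repeat=2) if any(map(is_tuesday_boy, f))]
    return Fraction(sum(f[0][0] == f[1][0] == "B" for f in yes), len(yes))


def framing_2():
    """Dave fixes one child uniformly at random; that child is a Tuesday boy."""
    yes = [(f, i) for f in product(CHILDREN, repeat=2) for i in (0, 1)
           if is_tuesday_boy(f[i])]
    return Fraction(sum(f[0][0] == f[1][0] == "B" for f, _ in yes), len(yes))


print(framing_1())  # 13/27
print(framing_2())  # 1/2
```

In the second framing, learning about the chosen child says nothing about the other one, which is why the answer is 1/2.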
There is always room for misinterpreting a question when one does not fully understand the language in which it is written. I think that the way mathematics and mathematicians use conditional probability is clear:
$$P(A|B)=P(A \cap B)/P(B).$$
So I believe that this is the interpretation one should take, arriving at your answer of 13/27, rather than searching for further nuances (which are not too difficult to find).
Well, given the unstated assumption that the writer is a mathematician and therefore not using everyday English, I agree with the 13/27 answer.
But in everyday English, from “there are two fleems, one is a glarp” we all infer that the other is not a glarp.
From “there are two fleems, one is a glarp, which is snibble” we would still infer that the other is not a glarp. Whereas from “there are two fleems, one is a glarp which is snibble” (absence of comma, or when spoken, difference in intonation) we would infer that the other is not a snibble glarp, but it could still be an unsnibble glarp.
The Tuesday is a red herring: it is stated as a fact, so its probability is 1. Also, the question doesn’t say “only one boy is born on a Tuesday”. But indeed, this could be a language thing.
With 2 children you have the following possible combinations:
1. two girls
2. a boy and a girl
3. a girl and a boy
4. two boys
If at least one is a boy, we only have to consider the last three combinations, which gives a one-in-three chance that both are boys.
The error which is often made is to consider 2. and 3. as a single combination.
Edit:
I found it completely counter-intuitive that the outcome is influenced by the day, so I simulated the problem for one million families with two kids. Lo and behold, the outcome is 12.99 in 27. I was wrong.
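A simulation like the one described can be sketched as follows. This is an assumed reconstruction, not the poster's actual program: each child is independently a boy with probability 1/2 and born on each weekday with probability 1/7, and only families with at least one Tuesday-born boy are kept. It is run here with fewer families than the one million mentioned, for speed; the exact estimate depends on the random seed.

```python
import random


def simulate(n_families=1_000_000, seed=0):
    """Monte Carlo estimate of P(two boys | at least one boy born on Tuesday).

    Assumes independent children, P(boy) = 1/2, and uniform weekdays;
    day 1 stands for Tuesday.
    """
    rng = random.Random(seed)
    qualifying = two_boys = 0
    for _ in range(n_families):
        kids = [(rng.random() < 0.5, rng.randrange(7)) for _ in range(2)]
        if any(is_boy and day == 1 for is_boy, day in kids):
            qualifying += 1
            if all(is_boy for is_boy, _ in kids):
                two_boys += 1
    return two_boys / qualifying


estimate = simulate(200_000)
print(f"{27 * estimate:.2f} in 27")  # close to 13 in 27; digits vary with the seed
```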
This, in my opinion, is why the intuitive approach fails:
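One has a tendency to think that
$$7P(b \wedge d_1) = P(b \wedge d_1) + P(b \wedge d_2) + \dots + P(b \wedge d_7) = P((b \wedge d_1) \vee (b \wedge d_2) \vee \dots \vee (b \wedge d_7)) = P(b \wedge (d_1 \vee d_2 \vee \dots \vee d_7)) = P(b).$$
However, the flaw is that $P(b \wedge d_1) + P(b \wedge d_2) + \dots + P(b \wedge d_7)$ is NOT equal to $P((b \wedge d_1) \vee (b \wedge d_2) \vee \dots \vee (b \wedge d_7))$. Here $b \wedge d_i$ is the family-level event “some child is a boy born on day $i$”, and with two children these seven events are not mutually exclusive: a family with a Monday boy and a Friday boy belongs to two of them, so the sum counts it twice. This means that mentioning independent (and seemingly irrelevant) information alongside relevant information actually changes the resulting probabilities.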
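The overlap can be checked directly. The Python sketch below (assuming all 196 families are equally likely) computes the seven day-events “some child is a boy born on day $i$” and compares the sum of their probabilities with the probability of their union, which is just the probability of at least one boy.

```python
from fractions import Fraction
from itertools import product

CHILDREN = list(product("BG", range(7)))  # (sex, day), all 14 assumed equally likely
FAMILIES = list(product(CHILDREN, repeat=2))


def prob(event):
    return Fraction(sum(1 for f in FAMILIES if event(f)), len(FAMILIES))


def boy_born_on(day):
    """Family-level event: some child is a boy born on `day`."""
    return lambda f: any(s == "B" and d == day for s, d in f)


sum_of_seven = sum(prob(boy_born_on(day)) for day in range(7))
union = prob(lambda f: any(boy_born_on(day)(f) for day in range(7)))

print(sum_of_seven)  # 27/28, i.e. 7 * 27/196
print(union)         # 3/4, the probability of at least one boy
```

The sum exceeds the union by exactly 42/196, the weight of the two-boy families whose boys were born on different days, each of which is counted twice.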
One interesting consequence: if I say something like
“I have two children. One of them is a boy who was born at 10:24 PM on February 10th,”
The probability that I have two boys is now almost exactly the same as the probability that I have a girl and a boy, i.e. very close to 1/2. Adding a unique or almost unique piece of information makes what I want to know about the other child almost independent of the information I gave about the first child. Taken to the extreme, if I said that my firstborn is a boy, you wouldn’t know anything additional about the other child.
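The effect of ever more specific information can be quantified. Assuming the extra detail (day of the week, minute of the year, and so on) has $n$ equally likely values independent of sex, counting the $(2n)^2$ families gives $P(\text{two boys}) = (2n-1)/(4n-1)$: $1/3$ for $n = 1$, $13/27$ for $n = 7$, and tending to $1/2$ as $n$ grows. The Python sketch checks the formula by brute force for a few small $n$; treating a birth minute as uniform over a non-leap year is a simplifying assumption.

```python
from fractions import Fraction
from itertools import product


def p_two_boys_formula(n):
    """P(two boys | some child is a boy with the mentioned detail), n equally likely values."""
    return Fraction(2 * n - 1, 4 * n - 1)


def p_two_boys_count(n):
    """Brute-force check; detail value 0 stands for the one that was mentioned."""
    children = list(product("BG", range(n)))
    kept = [f for f in product(children, repeat=2) if ("B", 0) in f]
    return Fraction(sum(f[0][0] == f[1][0] == "B" for f in kept), len(kept))


for n in (1, 7, 365):
    assert p_two_boys_formula(n) == p_two_boys_count(n)

minutes_per_year = 365 * 24 * 60  # assumes a non-leap year and uniform birth times
print(float(p_two_boys_formula(1)))                 # about 0.333: "one is a boy"
print(p_two_boys_formula(7))                        # 13/27: day of the week
print(float(p_two_boys_formula(minutes_per_year)))  # just under 0.5
```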
The VERY simple answer is that you can get rid of all ambiguity by clarifying which pool the parent was chosen from. This allows you to restate the “probability” question as a “percentage” question, taking idealized percentages.
Consider:
Given a parent randomly selected from the pool of all parents who have two children where at least one of the children is a boy born on Tuesday, what is the probability that both children are boys?
This can be stated as a percentage question:
What percentage of (parents who have two children where at least one of the children is a boy born on Tuesday) have two boys?
By contrast, if the day is a matter of chance (not a restriction on the pool), we get a different question:
Given a parent randomly selected from the pool of all parents who have two children at least one of whom is a boy, where after the random selection is made we shall be told the day of the week on which the boy was born (if there is only one boy) or if there are two boys we shall be told the day of the week on which a randomly chosen one of the two boys was born, what is the probability that both children are boys if the day we are told is Tuesday?
As you can see, in this setup the “Tuesday” part has absolutely no influence on the selection process and can be entirely disregarded, which leaves the familiar answer of 1/3.
This is what is meant by Wikipedia’s statement:
The moral of the story is that these probabilities do not just depend on the known information, but on how that information was obtained.
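The two pool descriptions can be compared by exact counting. This Python sketch assumes all 196 (sex, day) combinations for the two children are equally likely; in the second scenario each family is weighted by the probability that the reported day is Tuesday. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # days are numbered 0..6; 1 stands for Tuesday
CHILDREN = list(product("BG", range(7)))  # (sex, day)
FAMILIES = list(product(CHILDREN, repeat=2))  # all 196 assumed equally likely


def two_boys(f):
    return f[0][0] == "B" and f[1][0] == "B"


def pool_restricted():
    """Parent drawn from the pool with at least one Tuesday-born boy."""
    pool = [f for f in FAMILIES if ("B", TUESDAY) in f]
    return Fraction(sum(map(two_boys, pool)), len(pool))


def day_told_afterwards():
    """Parent drawn from families with at least one boy; we are then told the day
    of the only boy, or of a randomly chosen one of two boys. Condition on Tuesday."""
    told_tuesday = boys_weight = Fraction(0)
    for f in FAMILIES:
        boys = [d for s, d in f if s == "B"]
        if not boys:
            continue
        w = Fraction(boys.count(TUESDAY), len(boys))  # P(reported day is Tuesday | f)
        told_tuesday += w
        if two_boys(f):
            boys_weight += w
    return boys_weight / told_tuesday


print(pool_restricted())      # 13/27
print(day_told_afterwards())  # 1/3
```

In the second scenario every family with a boy reports Tuesday with probability 1/7, so conditioning on hearing Tuesday changes nothing.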
As a further note, you can’t “prove” anything about the “actual” meaning by using computer simulations, because in order to program a computer simulation in the first place you must first disambiguate which scenario you are actually talking about. So the only thing a computer simulation can “prove” is how the programmer interpreted the question.