Paradoxes of Probability and Other Statistical Strangeness (quillette.com)
181 points by robocaptain on June 13, 2017 | 88 comments



For those who might be interested, and in a slightly different vein than the examples in the article, there's the "sleeping beauty" paradox: https://en.wikipedia.org/wiki/Sleeping_Beauty_problem

Basically, an agent is put to sleep and told they will be woken up once if a fair coin lands heads and twice if it lands tails, without the ability to remember other awakenings.

What probability does the agent assign to the event that the coin landed heads?

The intuitive response is 1/3, but this poses obvious epistemological problems. The agent has, ostensibly, no new information at all, and their prior is surely 1/2. Hope someone else finds this as interesting as I do!


I mostly find it interesting in that people could think that the chance is 1/3 (and that it may even be obvious!). After reading the description I can understand what they are getting at, but I think the conditional probability is messed up.

Instead of P(Monday | Heads) = P(Monday | Tails) = P(Tuesday | Tails), it is really P(Monday | Heads&Awake) = P(Monday | Tails&Awake) = P(Tuesday | Tails&Awake), or something like that. But the interviewer isn't asking about that; they are asking for the probability of the coin. The three positions are only exhaustive given that you are awake to be interviewed about them, not exhaustive of possible states (it's missing P(Tuesday | Heads&Asleep)). Since you're always awakened at least once, I find the argument that being awake has 'given you information that it is not Tuesday AND heads' pretty weak. While true, under both heads and tails you expect to be awoken at a time when it is not both Tuesday AND heads.


The outcomes can be easily enumerated. If Sleeping Beauty always answers "Heads" she will be right 33% of the times she is asked. This is pretty close to the definition of a 33% chance.

She hasn't been given new information by waking up; she already knows as she goes into the experiment that "most of the outcomes where I am being interviewed involve the coin toss coming up tails".


Here is how one might decide that 1/3 is obvious. Imagine that N people simultaneously undergo the experiment, for a very large N.

Half of them end up in the heads group. They wake up on Monday and are questioned. Then they sleep until Wednesday and are released.

The other half end up in the tails group, and so are questioned twice (Monday and Tuesday) then released on Wednesday.

Because we gain no information during the experiment, we can make our decision before the experiment.

Let's count. There will be 3N/2 interviews conducted. N/2 of them will be 'heads' interviews and N will be 'tails' interviews. So going in, we can see that when someone experiences the event 'being asked about the coin', 1/3 of the time the coin will be heads and 2/3 of the time it will be tails. Hence, our credence in the coin being heads should be 1/3.
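The counting argument can be checked with a small Monte Carlo sketch in Python. It assumes a fair coin and exactly one interview per awakening, and the function name `simulate_sleeping_beauty` is hypothetical. It tallies the heads rate twice, once per participant and once per interview, which is where the 1/2 and 1/3 answers part ways.

```python
import random


def simulate_sleeping_beauty(n_people: int, seed: int = 0) -> tuple[float, float]:
    """Return (share of participants who got heads, share of interviews held after heads).

    Assumed setup: fair coin; heads means one interview (Monday),
    tails means two interviews (Monday and Tuesday).
    """
    rng = random.Random(seed)
    heads_people = 0
    heads_interviews = 0
    total_interviews = 0
    for _ in range(n_people):
        heads = rng.random() < 0.5
        interviews = 1 if heads else 2
        total_interviews += interviews
        if heads:
            heads_people += 1
            heads_interviews += interviews
    return heads_people / n_people, heads_interviews / total_interviews


per_person, per_interview = simulate_sleeping_beauty(100_000)
print(f"heads per participant: {per_person:.3f}")    # close to 0.5
print(f"heads per interview:   {per_interview:.3f}")  # close to 0.333
```

Per participant the heads rate settles near 1/2; per interview it settles near 1/3. Which of those two rates "credence" ought to track is what the rest of this thread argues about.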

Here is a counterargument. Imagine a slightly different experiment. The people are not asked what their credence in the coin being heads is. They are asked to guess if it is heads or tails. If they are right, the experiment continues and they are eventually released. If they are wrong, this is noted, and the experiment continues until Wednesday, and then they are killed and their home planet is destroyed.

As before, we gain no information during the experiment, and so can decide our answer beforehand. No matter what strategy one picks for making that decision, there is a 50/50 chance that one ends up with a destroyed planet. That indicates that our credence in heads should be 1/2.


You will be awoken twice as many times because the coin comes up tails as you will because the coin comes up heads. If you can manipulate the formulae to tell you something different, that only means you have failed to manipulate the formulae correctly.


It's very interesting and I don't think there's an obvious correct answer. It's hard to formally model mathematically.

Here's a game-theoretic perspective. In general, when an event has a 1/3 chance of happening, an idealized gambler would be indifferent between the following two bets or lottery tickets: (A) win $2 if the event happens; (B) win $1 if the event doesn't happen. (Notice her average payoff is 2/3 no matter which bet she takes.)

Now in the sleeping beauty problem where tails is two awakenings and heads is one, a gambler would be indifferent between (A) winning $2 every time she wakes up and the coin is heads, and (B) winning $1 every time she wakes up and the coin is tails. This suggests that her "belief" is 1/3.
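Both indifference claims can be checked with exact arithmetic. This Python sketch uses `fractions.Fraction` to avoid rounding; `expected_payoff` is a hypothetical helper that assumes a fair coin and a bet paid once per awakening.

```python
from fractions import Fraction

HALF = Fraction(1, 2)  # assumed: fair coin


def expected_payoff(pays_on_heads: Fraction, pays_on_tails: Fraction) -> Fraction:
    """Expected winnings over one whole experiment (hypothetical helper).

    The bet pays once per awakening: heads gives one awakening, tails gives two.
    """
    return HALF * 1 * pays_on_heads + HALF * 2 * pays_on_tails


bet_a = expected_payoff(Fraction(2), Fraction(0))  # $2 every heads awakening
bet_b = expected_payoff(Fraction(0), Fraction(1))  # $1 every tails awakening
print(bet_a, bet_b)  # 1 1

# The ordinary one-shot lottery: an event with probability 1/3.
p = Fraction(1, 3)
print(p * 2, (1 - p) * 1)  # 2/3 2/3
```

Both Sleeping Beauty bets are worth $1 per experiment, just as both one-shot lottery tickets are worth 2/3, which is the parallel the comment draws.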

Another way to put it might be that for a risk-neutral agent, doubling the payoff in one state of the world is equivalent to doubling its "perceived probability". In the sleeping beauty problem, doubling the payoff is like experiencing everything twice.


By far the most unintuitive paradox for me personally is the one presented here: https://youtu.be/go3xtDdsNQM?t=3m27s

"Mr. Jones has 2 children. What is the probability he has a girl if he has a boy born on Tuesday?" Somehow knowing the day of the week the boy was born changes the result. It's completely bizarre.


The question is ill-posed: it does not give you enough information to tell the probability. You know what Mr. Jones has told you, but you don't know under what circumstances he would have told you this.

Suppose that you ask Mr. Jones whether he has a boy and he says yes. Then the probability that he also has a girl is 2/3.

Suppose that you asked Mr. Jones whether he had a boy born on a Tuesday, and he says yes. Then the probability that he has a girl is less than 2/3, because having two boys gives (about) double the chance for one of them to have been born on a Tuesday.

However, suppose that you asked Mr. Jones whether he has a boy, and if so what day his eldest boy was born on, and he says "yes, and on Tuesday". Then the probability that he also has a girl is again exactly 2/3.
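The three scenarios can be compared by exact enumeration, treating each question as a filter on the 196 equally likely (gender, day) assignments for an elder and a younger child. This Python sketch assumes independent, uniformly distributed genders and birth days and honest answers; the helper names and the numbering 0 = Monday, 1 = Tuesday are illustrative choices, not from the thread.

```python
from fractions import Fraction
from itertools import product

DAYS = range(7)   # assumed labelling: 0 = Monday, 1 = Tuesday, ..., 6 = Sunday
TUESDAY = 1
CHILD = list(product("BG", DAYS))        # one child: (gender, day)
FAMILIES = list(product(CHILD, CHILD))   # (elder, younger): 196 equally likely


def p_girl_given(condition) -> Fraction:
    """P(family has a girl | condition(family)), by counting families."""
    kept = [f for f in FAMILIES if condition(f)]
    with_girl = [f for f in kept if any(g == "G" for g, _ in f)]
    return Fraction(len(with_girl), len(kept))


def has_boy(f):
    return any(g == "B" for g, _ in f)


def has_tuesday_boy(f):
    return any(g == "B" and d == TUESDAY for g, d in f)


def eldest_boy_on_tuesday(f):
    boys = [d for g, d in f if g == "B"]   # elder child comes first
    return bool(boys) and boys[0] == TUESDAY


print(p_girl_given(has_boy))                # 2/3
print(p_girl_given(has_tuesday_boy))        # 14/27
print(p_girl_given(eldest_boy_on_tuesday))  # 2/3
```

The second question lands below 2/3 (at 14/27) because a two-boy family has two chances to contain a Tuesday boy; the third returns to exactly 2/3 because only one boy per family is ever checked.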

Wikipedia has a detailed explanation: https://en.wikipedia.org/wiki/Boy_or_Girl_paradox


Everything you say after your first paragraph is correct (presuming people always answer questions with "Yes" or "No" honestly), but…

No one said anything about "Mr. Jones has told you…", here. There was nothing about asking Mr. Jones a question and him providing an answer according to some process.

Rather, the question was simply "Mr. Jones has two children. What is the probability he has a girl if he has a boy born on Tuesday?".

There are implicit conventions involved in reading this, but not particularly problematic ones. This implicitly means "Out of all families with two children, at least one of which is a boy born on Tuesday, what proportion have a girl? [Presuming that out of those families, birth gender and day of the week for the two children are all independently uniformly distributed]". And this is a straightforward counting problem.

So the wording seems fine and the problem well-posed to me.


The problem really is almost one of metagaming.

People who are reading this are likely to have seen, for example, questions which read as if they're asking for a conditional probability ("John is male, 33 years old, and has a degree in English literature; what is the probability he works as a barista?") but are designed to let the questioner turn around and say "Ah-HA! I got you! It was really a question about the base rate (in this case, of baristas)!".

As posed and with knowledge of that issue, this question reads like an attempt to do the opposite: to pose a question which seems like it's asking about the base rate of boys vs. girls, but then the questioner turns around with "Ah-HA! I got you! It was really a question about the conditional probability!"

Once it's phrased in a way that makes explicit that it really is a question about conditional probability, and not an attempt to lure someone into a base-rate trap, there's no paradox.

Complicating things is that analyses usually focus on the day of the week as the crucial factor, when it's easier to get to an intuitive understanding of the probability by dealing with the day of the week first and then focusing on the small but crucial correction that remains. After accounting for the day of the week you are left with 28 listed situations (the Tuesday boy as the elder child with 14 options for the younger, plus the Tuesday boy as the younger child with 14 options for the elder), with at least one girl in 14 of them, for the expected 1/2. The fact that you end up at a probability just over 1/2 is due to one situation being listed twice: both children being boys born on Tuesday. Removing that duplicate leaves 27 cases, 13 of them with two boys, which pushes the final result to 13/27 for the other child being a boy (and 14/27 for a girl).
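A quick way to check the 28 and 27 counts is to build the 28 listings by placing the Tuesday boy first as the elder child and then as the younger, and let a set drop any repeated pair. This is a minimal Python sketch; the day abbreviations and the pair-of-tuples encoding are illustrative choices, and it assumes independent, uniformly distributed genders and birth days.

```python
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILD = list(product("BG", DAYS))   # 14 (gender, day) options per child
TUE_BOY = ("B", "Tue")


def has_girl(pair) -> bool:
    return "G" in (pair[0][0], pair[1][0])


# Tuesday boy as the elder child, then as the younger child: 14 + 14 listings.
listings = [(TUE_BOY, other) for other in CHILD] + [(other, TUE_BOY) for other in CHILD]
girl_listings = [pair for pair in listings if has_girl(pair)]
print(len(listings), len(girl_listings))   # 28 14

# (Tuesday boy, Tuesday boy) appears in both halves; the set keeps it once.
distinct = set(listings)
girl_distinct = [pair for pair in distinct if has_girl(pair)]
print(len(distinct), len(girl_distinct))   # 27 14
print(Fraction(len(girl_distinct), len(distinct)))   # 14/27
```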


> There are implicit conventions involved in reading this

Explicitly the question adds no such limits. So, abstractly someone could be asking the question without those limits.

It's like the difference between infinity and how whatever subset of math you work in defines infinity. And yes, there is more than one commonly used definition.


Sure, and if the quibble was along the lines of "You never explicitly said boys and girls are 50-50 distributed! You never explicitly said elder and younger children's birth genders are independent! You never explicitly said birth-days-of-the-week are uniformly…", then that would be fair, if pedantic.

But this "You know what Mr. Jones has told you, but you don't know under what circumstances he would have told you this" objection is objecting to some other problem than the one posed; the problem posed had nothing to do with Mr. Jones saying anything.

I understand the reason for worrying about this, because many probability riddles ARE poorly worded or presented in such a way that this becomes an issue, but that wasn't the case here. (Note: I haven't watched the rest of the video and have no comment on it; I'm just considering the wording of this individual question within it.)

There was never any claim that Mr. Jones said anything, and no one was called to infer anything from any actions taken by Mr. Jones. He could be a lifelong mute. Rather, the fact that Mr. Jones has two children was presented, by an omniscient narrator, and then a counting question was asked.

(Indeed, Mr. Jones himself is completely irrelevant to the problem asked, except as a way of framing the counting question to be about two-children families. The question asked might as well have been "What proportion of two-children families with a boy born on Tuesday have girls?". It was very slightly differently worded, but not in such a way as makes "We don't know what Mr. Jones was asked!" a relevant objection.)


I also fell into the same ambiguity trap, and I think that the objection about explicit wording is a fair one to make.

"What proportion of two-children families with a boy born on Tuesday have girls?" seems completely clear to me. I would have answered that question relatively quickly.

But the original question had me very confused. I felt a strong desire to ask more about the situation. A great deal of my intuition wanted to say that "well there is nothing special about Tuesday... Any boy that he has is going to be born on some day of the week, and if whatever day of the week that son is born on is included as this line item in the question, then that line item is irrelevant."

I wouldn't have fallen into that same trap in the case of the "What proportion of two-children families..." version because the "Any boy that he has is going to be born on some day of the week" logic doesn't apply.

Tuesday seemed like it might have been arbitrary in the original question, where it seems explicit in your rephrased version.


I mean, it's just as arbitrary in my rephrased version. I could just as well ask "What proportion of two-children families with a boy born on Monday have girls?". But, very well, the different wording prompted differing intuitions for you; so it goes.


> Rather, the fact that he had two children was presented, by an omniscient narrator.

That the narrator is omniscient doesn't change anything. The question still remains: under what circumstances would the narrator have told you, e.g., that "he has a boy born on Tuesday" vs. "he has a girl born on Tuesday". Perhaps this omniscient narrator really likes girls, in which case they would tell you about a girl if Mr. Jones had any girls. Then since they told you "Mr. Jones has a boy born on Tuesday", you know definitely that Mr. Jones has no girls.

Ignoring the source of your knowledge doesn't make that source any less important. And the standard convention you're talking about corresponds to a source of knowledge where you ask a yes/no question and get a yes, which is frequently unrealistic. This is why it disagrees with people's intuition, and this problem is called a paradox.


As a probability problem with the standard assumptions, it's a well defined question. If you saw this in Bertsekas or Sheldon Ross, the sampling would be clear.

And I also think you're incorrect about why it's a paradox. People are just bad at understanding and estimating things in conditional probabilities. Further, the answer changes based on the sampling regime, which (as mentioned) was not explicitly stated but is clear to almost any student that's taken a discrete probability class.


> And I also think you're incorrect about why it's a paradox. People are just bad at understanding and estimating things in conditional probabilities.

This is a testable prediction. I predict that making the source of your knowledge explicit eliminates the paradox.

To me, it feels strange that "the probability that Mr. Jones has a girl given that he has a boy born on Tuesday" is ~1/2. However, it feels normal that "You ask Mr. Jones whether he has a boy born on Tuesday, and he says yes. What is the probability that he has a girl?" is ~1/2.

Do other people agree?


It's not that the probability is close to 1/2 that makes it paradoxical for most people. It's that the probability differs from 1/2 at all. As in the OP of this very thread saying "Somehow knowing the day of the week the boy was born changes the result. It's completely bizarre."


It's the fact that it differs from 2/3. If the day of the week was not mentioned the (conventional) answer is exactly 2/3. Not 1/2.


Yes, that's also "paradoxical", though probably not the paradox that would trip people up first unless they'd seen the other problem first. But, you're right that I may have misread which departure from expected answer was bugging the OP. Nonetheless, everything else I stated still holds.


Following classical probability arguments, we consider a large urn containing two children.

:-O


Your problem is that you are thinking there's a "the boy". But there's not a "the boy". Mr. Jones could have two boys. He could have two boys both born on Tuesday, even. The term "the boy" does not denote any particular boy, in that case, and causes you to think about the situation erroneously.

If the question were "There's Kid 1 and Kid 2, each independently selected with random gender and birth-day-of-the-week. Out of those cases where Kid 1 is a boy born on Tuesday, what proportion are cases where Kid 2 is a girl?", then the answer would indeed be a straightforward 50%; the status of Kid 1 is entirely independent of the status of Kid 2.

But that's not the question. The question is "There's Kid 1 and Kid 2, each independently selected with random gender and birth-day-of-the-week. Out of those cases where at least one (either one, and possibly both) of Kid 1 and Kid 2 is a boy born on Tuesday, what proportion are cases where at least one of Kid 1 and Kid 2 is a girl?".

This is very different, and of course just drawing out the possibilities (all 2 * 7 * 2 * 7 equiprobable-by-stipulation choices of gender and birth-day-of-the-week for Kid 1 and Kid 2) and circling the relevant subsets for each of the two questions reveals the difference; the probability for either question is then elementarily calculable by basic counting.
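Both questions can be answered by the same counting routine with different filters. In this Python sketch the 196 (Kid 1, Kid 2) cases are equally likely by stipulation, day 1 stands in for Tuesday (an arbitrary encoding), and `proportion` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

CHILD = list(product("BG", range(7)))   # (gender, day); day 1 = Tuesday (assumed)
PAIRS = list(product(CHILD, CHILD))     # (Kid 1, Kid 2): 2 * 7 * 2 * 7 = 196 cases
TUE_BOY = ("B", 1)


def proportion(keep, hit) -> Fraction:
    """Among cases where keep(case) holds, the fraction where hit(case) also holds."""
    kept = [p for p in PAIRS if keep(p)]
    return Fraction(sum(1 for p in kept if hit(p)), len(kept))


# First question: Kid 1 is a Tuesday boy; is Kid 2 a girl?
q1 = proportion(lambda p: p[0] == TUE_BOY, lambda p: p[1][0] == "G")
# Second question: at least one kid is a Tuesday boy; is at least one kid a girl?
q2 = proportion(lambda p: TUE_BOY in p, lambda p: any(k[0] == "G" for k in p))
print(q1, q2)  # 1/2 14/27
```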


The 14/27 answer in the video is correct, incidentally.

Also, I notice you said "Somehow knowing the day of the week the boy was born changes the result. It's completely bizarre."

Remember, though, there's no "the boy". The question "On which day of the week was the boy born? Tell me, I need to know!" does not always have a well-defined answer.

Indeed, you'd get the same 14/27 answer even if "Tuesday" in the question "What proportion of two-children families with at least one Tuesday boy have a girl?" was replaced by any other day. And if this seems paradoxically in conflict with the fact that simply asking "What proportion of two-children families with at least one boy have a girl?" has instead the answer 2/3, reflect again upon the fact that some families have two boys born on different days, so that there's no single answer to "On what day was 'the boy' born?". And then just draw out the cases and count.

(Specifically, out of the 2 * 7 * 2 * 7 equiprobable cases overall for Kid 1 and Kid 2's genders and days, there are 27 cases where there's at least one Tuesday boy, and 14 cases where there's at least one Tuesday boy and also a girl. There are 3 * 7^2 cases where there's at least one boy, and 2 * 7^2 cases where there's at least one boy and also a girl.)
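Those counts, and the claim that the answer is 14/27 whichever day is named, can be verified by brute force. This is a minimal Python sketch assuming the same 2 * 7 * 2 * 7 equiprobable cases; days are numbered 0 to 6 and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

CHILD = list(product("BG", range(7)))   # (gender, day), days numbered 0-6
PAIRS = list(product(CHILD, CHILD))     # 196 equiprobable (Kid 1, Kid 2) cases


def has_girl(pair) -> bool:
    return any(g == "G" for g, _ in pair)


def counts_for_day(day: int) -> tuple[int, int]:
    """(cases with at least one boy born on `day`, how many of those also have a girl)."""
    kept = [p for p in PAIRS if ("B", day) in p]
    return len(kept), sum(1 for p in kept if has_girl(p))


for day in range(7):
    print(day, counts_for_day(day))   # (27, 14) for every day

any_boy = [p for p in PAIRS if any(g == "B" for g, _ in p)]
n_any_boy = len(any_boy)                                  # 3 * 7**2
n_any_boy_girl = sum(1 for p in any_boy if has_girl(p))   # 2 * 7**2
print(n_any_boy, n_any_boy_girl, Fraction(n_any_boy_girl, n_any_boy))  # 147 98 2/3
```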

Many of these questions, I think, become clearer if thought of as counting questions instead of as "probability" questions (though it's all the same; the math called "probability" is just the math of various kinds of counting (from simple counting as in this case to complexly weighted continuous measurements, but still ultimately a generalized form of counting). However, despite that equivalence, the concept "probability" has developed all these other distracting connotations, such that psychologically, there can be a useful difference in perspective in switching to explicitly thinking "counting" instead. No one would long dispute that there are 27 cases with at least one Tuesday boy, etc.).


Agreed. He removed the second B2B2 probability annotation as though it were a repeat of the first and inapplicable to the probability set, but that's not the case, and it shouldn't be removed. Apply lower-case to the younger boy in the probability sets and it's clear why. B2b2 is not the same occurrence as b2B2. The fact that both were born on "a Tuesday" doesn't mean both probability instances are referring to the exact same event, except in the case of twins, which is outside the scope of the exercise.


The comments to this video actually say (with proof) that this was an error in the video.


There seems to be quite a bit of debate about it in the comments and I'm not sure who to believe. At one point someone coded a simulation to test it and the results were as predicted by the video. Even if the video is incorrect, the fact it's so confusing still makes it an interesting paradox.


I'm an idiot, but I'm going to throw my hat in the ring here:

The video is wrong. The problem reads: Jones has 2 kids. What is P(he has a girl) given that he has a boy born on a Tuesday? Consider, for a moment, what information we're getting from "boy born on a Tuesday." This is no different than "boy with red hair," or "boy with 5 freckles." The fact that the BOY was born on a Tuesday does not change P(day of the week girl was born). Imagine the "boy with 5 freckles" case: let 5 freckles be denoted by F5, six freckles by F6 and so on... would the appropriate calculation include enumerating P(boy F5, boy Fn) for all n? No.

The "born on Tuesday" is irrelevant. Thus you have the following scenarios:

- one kid is TuesdayBoy and the other is also a boy, born at any time
- one kid is TuesdayBoy and the other is a girl, born at any time

Out of these options P(Jones has a girl) is a flat out 50%. There is no need to bring in concepts of "which was born first" or enumerate all possible days of the week each child could have been born.

Ok... now all the real smartypants here can correct me :)


There are 2 * 7 * 2 * 7 ways to assign gender and birth-day-of-week to two children. By convention, all are considered equiprobable (this is the same as assuming kids' genders and birth day-of-weeks are independent of each other and of all facts about other kids, and that both genders are equally likely and all 7 days are equally likely for any given kid.)

Of these possibilities, 27 are situations where one kid is a Tuesday boy. [Do you dispute this count?]

Of those, 14 are situations where one kid is a girl. [Do you dispute this count?]

The answer to "What proportion of cases where there is at least one Tuesday boy also have a girl?" is thus 14/27.

You have stated by fiat that certain things are irrelevant to certain other things, that certain things have probability 50%, etc, but in doing so, you have not considered the count correctly. You are likely misled by phrasing such as "the boy", when there are families with two boys in which there is no proper referent of "the boy" and no particular answer to question like "Which day was 'the boy' born?".


ok ok... let me try to get this straight. Just as kind of a mental process for trying to understand whether or not something passes the smell test, I typically try to take the basic premise and turn it up to 11 and see if that still makes sense.

In this problem, as you've described it, we're enumerating "ways to assign gender and birth-day-of-week." We can do this because there are a countable number of "days of the week" (so we can map to the integers: 1-7) AND there is also a surjective function of [child] -> [day of the week they were born]. Am I right so far?

Now let's replace the set [1-7] with another countable set that also maintains the surjective function. We could say "day in the lunar cycle" (so ~27 options), or better "day of the year" (366 options), for example. Do we now need to consider the 2 * 366 * 2 * 366 ways to assign gender and birth-day-of-the-year? Take it further with whatever you want: "birth weight in milligrams" or "number of freckles" (as I previously suggested). All countable things that meet the surjective requirement.

This is starting to smell funny, right? So let's take a look at the math.

You say there are 2 * 7 * 2 * 7 ways to configure day+gender, assuming independence for kid 1 (k1) and kid 2 (k2). This represents: (k1 gender options * k1 day of week options) * (k2 gender options * k2 day of week options). Right? I'm with you so far. Then you say "Of these possibilities, 27 are situations where one kid is a Tuesday boy." Hold up.

We are given two pieces of information: that one of the kids is a boy, and that particular boy was born on a Tuesday. Let's say the boy is k1 (this is an assignment of enumeration, not of "who came first;" just like Sunday = 1 does not mean that any kid born on a Sunday was born before every kid born on Monday = 2). So now the k1 options are [1 * 1] (boy, Tuesday), and the total number of options is [1 * 1] * [2 * 7] = 14. Of those 14, 7 are girl options. And we're back to a straight 50%.

So yes, I dispute the 27 number. It seems like it is arrived at by 2 * 1 * 2 * 7, minus one for an apparent duplicate. But the 2 * 1 * 2 * 7 represents maintaining gender non-specificity for Tuesday boy, which should be incorrect, no?

> You have stated by fiat that certain things are irrelevant to certain other things...

Yes, but that's what "independent" means, right? You also stated that you're assuming these two things are independent, hence equiprobability. But independence is defined by P(A) = P(A|B). The probability of A is completely unaffected by B. Yet the outcome you arrive at is that P(A) IS affected by B, so the math presented is internally inconsistent.

What am I missing here? I'm fascinated by the uncertainty around this little problem.


Let's see if I can help you understand this a bit better. First, let's clarify the problem being asked. There are 2 different problems with different solutions and it helps to explicitly separate them.

problem 1) You go up to a person and ask them if they have exactly 2 children, at least one of which is a boy born on Tuesday. They say yes. What is the probability that they have a girl?

problem 2) You go up to a person and ask them if they have exactly 2 children, at least one of which is a boy. They say yes. You then ask them which day of the week a boy they have was born on. They say Tuesday. What is the probability that they have a girl?

The original problem that was posed is equivalent to problem 1, but not equivalent to problem 2. This could be what is confusing you, because in problem 2 the extra information plays no role in the selection process, while it does play a role in problem 1. In problem 2, the answer is the standard 2/3. Why are the probabilities different between problem 1 and 2? Here's why:

Think about the set of people who could answer yes to the question in problem 2. The ratio of these groups is important. A parent with BB (two boys) is exactly as likely to answer yes to problem 2 (100% likely, to be exact) as a parent with BG or GB (also 100% likely to answer yes), which leads to the correct solution of 2/3. However, in problem 1 a parent with BB is NOT EQUALLY LIKELY to answer yes as a parent with BG. This is because we added an extra qualifier (must be born on Tuesday). The parent with BB has two chances to meet this qualifier because they have two boys, so the parent with BB is actually more likely to answer yes to the question than the parent with BG. As the qualifier becomes rarer and rarer (day of the lunar cycle, say), the probability of the BB parent answering yes, P(yes|BB), approaches twice the value of P(yes|BG). So now you're left with some subset of parents with BB, BG, and GB, but in this scenario you've sampled from BB approximately twice as much as you've sampled from each of the BG and GB groups, leaving you with approximately the same number of people from group BB as the combined amount from groups BG and GB. This is why the probability approaches 50%.
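The selection argument can be written out as a Bayes update over the four equally likely families BB, BG, GB, GG. This Python sketch generalizes "born on Tuesday" to a trait that is uniform over k values (k = 7 for days of the week, k = 366 for day of the year); that parameterization and the helper `posterior_two_boys` are illustrative assumptions.

```python
from fractions import Fraction


def posterior_two_boys(k: int) -> Fraction:
    """P(BB | parent says yes to "a boy with trait t?"), trait uniform over k values.

    Hypothetical helper; assumes independent genders and traits.
    """
    prior = Fraction(1, 4)                    # BB, BG, GB, GG equally likely
    p_yes_bb = 1 - Fraction(k - 1, k) ** 2    # either of two boys may have the trait
    p_yes_one_boy = Fraction(1, k)            # BG or GB: only one boy can have it
    p_yes_gg = Fraction(0)                    # no boys, so never yes
    evidence = prior * (p_yes_bb + 2 * p_yes_one_boy + p_yes_gg)
    return prior * p_yes_bb / evidence


for k in (1, 7, 30, 366):
    p_bb = posterior_two_boys(k)
    print(k, p_bb, 1 - p_bb)   # k, P(two boys), P(a girl)
```

With k = 1 the qualifier says nothing beyond "a boy" and P(two boys) is the familiar 1/3; k = 7 gives 13/27. As k grows, P(yes|BB)/P(yes|BG) = (2k - 1)/k approaches 2, so P(two boys) = (2k - 1)/(4k - 1) approaches 1/2.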

I spent a while writing this, so hopefully it helps!


:) Thanks for taking the time. I've realized a few things, and found it helped to get a bit more formal.

Jones has 2 kids. Let A be "he has a girl" and B be "he has a boy born on Tuesday." First thing I realized is A and B are NOT independent; this is key. P(A) includes the option of Jones having two girls. But if B is true, then the two-girls option isn't on the table anymore, which affects P(A). Realizing this helped me start to better understand what kind of problem we're dealing with.

Second was realizing that P(A&B) is not at all the same thing as P(A|B) - the probability of A given B - when A and B aren't independent. The problem is asking for P(A|B), and by the rule of conditional probability: P(A|B) = P(A&B)/P(B)

P(B) can be solved for without too much fuss: solve 1-P(!B). For each kid you have 2 genders and 7 days of the week, or 2 * 7 = 14 options. 13 of those are not "Boy & Tuesday." So you have P(!B) is (13/14) * (13/14) = 169/196. P(B) = 1 - 169/196 = 27/196.

This leaves us trying to figure out P(A&B). I can't think of any other way to do it other than enumerating all options. We can take a shortcut and just look at all 27 possible scenarios where B is true. This seems to be the method of choice ;) As others have shown, we see that 14 of those satisfy A. So P(A&B) = 14/196.

Now, we can solve: P(A|B) = P(A&B)/P(B) = (14/196)/(27/196) = 14/27
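The same calculation in a few lines of Python, as a minimal sketch with `Fraction` for exact values: P(B) comes from the complement rule and P(A&B) from counting, as in the comment. Using day 1 to stand for Tuesday is an arbitrary labelling choice.

```python
from fractions import Fraction
from itertools import product

TUESDAY_BOY = ("B", 1)   # illustrative encoding: day 1 stands for Tuesday

# B: "has a boy born on Tuesday", via the complement rule.
p_not_b = Fraction(13, 14) ** 2   # each child independently not a Tuesday boy
p_b = 1 - p_not_b                 # 27/196

# A&B: count the 196 equally likely (gender, day) pairs directly.
pairs = list(product(product("BG", range(7)), repeat=2))
a_and_b = [p for p in pairs if TUESDAY_BOY in p and any(g == "G" for g, _ in p)]
p_a_and_b = Fraction(len(a_and_b), len(pairs))   # 14/196, printed reduced as 1/14

p_a_given_b = p_a_and_b / p_b
print(p_b, p_a_and_b, p_a_given_b)  # 27/196 1/14 14/27
```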

So I'm now part of the "math checks out" club. Thanks for all the help people!


You can't assume "the boy is k1". The original 2 * 7 * 2 * 7 cases were indeed all equiprobable different cases. And we're not given that K1 is a boy. We're given that at least one child is a boy.

If K1 is a girl and K2 is a boy born on Tuesday, this still counts as the family (Mr. Jones, if you like) having a boy born on Tuesday. There are 27 cases that count as the family having a boy born on Tuesday, all equiprobable. And out of those, 14 also count as the family having a girl.

As for your noting that we can split these cases even more finely, so that there's no distinguished end-all, be-all partitioning of cases, sure, you can do that. What I'm really saying is this:

1/2 of two-child families have their elder child being a boy. 1/2 of two-child families have their elder child being a girl. [On conventional idealizations for these problems. You surely do not dispute this, yes? You may not care about this number, but you don't dispute it, right?]

In each of those subgroups, 1/7 of families have their elder child born on Sunday, 1/7 have their elder child born on Monday, etc. [Do you dispute this?]

In each of THOSE subgroups, 1/2 of families have their younger child a boy, and 1/2 have their younger child a girl. [Any dispute?]

And in each of THOSE subgroups, 1/7 of families have their younger child born on Sunday, 1/7 of families have their younger child born on Monday, etc. [Any dispute?]

And some amount of those have low birth weight, some have high birthweight, some have 5 freckles, etc., but we needn't figure out those numbers.

So now I've carved the world up into 2 * 7 * 2 * 7 groups, based on gender and birth-date-of-week for older and younger child. We can carve the world up into groups in different ways also, more finely or more coarsely or just differently. But making the four conventional assumptions we just made, the 2 * 7 * 2 * 7 grouping based on gender and birth-date-of-week for older and younger child is such that each particular such group takes up 1/2 * 1/7 * 1/2 * 1/7 of all families; these are all equifrequent groups.

And that having been done, we find that in 27 of these groups, there is at least one boy born on a Tuesday. In 13, the elder child is a boy born on Tuesday but not the younger child; in 13, the younger child is a boy born on Tuesday but not the elder child; in 1, both children are boys born on Tuesday.
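The 13 + 13 + 1 split can be tallied directly. This Python sketch walks the 196 elder/younger groups and classifies the ones containing a Tuesday boy; day 1 standing for Tuesday is an illustrative encoding, and the groups are assumed equally frequent as described above.

```python
from itertools import product

CHILD = list(product("BG", range(7)))   # (gender, day); day 1 = Tuesday (assumed)
TUE_BOY = ("B", 1)

groups = {"elder only": 0, "younger only": 0, "both": 0}
for elder, younger in product(CHILD, CHILD):   # 196 equally frequent groups
    if elder == TUE_BOY and younger == TUE_BOY:
        groups["both"] += 1
    elif elder == TUE_BOY:
        groups["elder only"] += 1
    elif younger == TUE_BOY:
        groups["younger only"] += 1

print(groups, sum(groups.values()))
# {'elder only': 13, 'younger only': 13, 'both': 1} 27
```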

But the question was not intended to be about a specific boy. The question was intended to be "Out of families that have a boy born on Tuesday (meaning at least one boy born on Tuesday), what proportion have a girl?". Any family with at least one boy born on Tuesday counts as having "a boy born on Tuesday", and even families with two boys born on Tuesday count, with no particular of their two boys given any distinguished status.

Perhaps you read the question differently; that, then, is a problem with the phrasing of the question for communicating to you its intent. But when it was asked "What is the probability Mr. Jones has a girl, given that he has a boy born on Tuesday?", what the author indeed intended this to mean, and would be generally taken in the conventional language of probability to mean, was "Out of families that have at least one boy born on Tuesday, what proportion have a girl?".

And we find that, out of the 27 equally sized groups of families that have at least one boy born on Tuesday, 14 of them have a girl, so that the answer to this question becomes 14/27.


Frankly, though, I'd prefer no one ever used the "conventional language of probability", because it leads to precisely these miscommunications.

If the question had been phrased "Out of two-children families that have at least one boy born on Tuesday, what proportion have a girl? [on natural assumptions about lack of biases or correlations concerning the distribution of children's genders and days]", would you agree that the answer was 14/27?

That was the question the author intended to ask. The dispute may simply be as to whether the question which the author did ask is equivalent to the above; if that is indeed our only disagreement, we can still investigate that dispute further, if you like. But let's first see if the dispute is linguistic or mathematical.


> Out of two-children families that have at least one boy born on Tuesday, what proportion have a girl?

OH YES.

I finally figured it out (see the other comment). Thanks for all the explaining, but this statement right here was the best.


First, step back and consider the possibilities given no knowledge whatsoever:

The problem constrains each child to one of two possible sexes and one of seven possible days of birth.

2 * 7 = 14 possible sex/day combinations for a single child.

(2 * 7) * (2 * 7) = 196 possible sex/day combinations for a pairing of two children. To see why, you could write a program to enumerate all of them, starting with the pairing "Boy/Monday + Boy/Monday", then "Boy/Monday + Boy/Tuesday" and so on until you exhaust all possible options at "Girl/Sunday + Girl/Sunday". You'll see there are 196 options.
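A minimal version of that program, sketched in Python (the variable names are my own):

```python
from itertools import product

SEXES = ["Boy", "Girl"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Every possible child: 2 sexes x 7 days = 14 combinations.
CHILDREN = [f"{sex}/{day}" for sex, day in product(SEXES, DAYS)]

# Every ordered pairing (first child, second child): 14 x 14 = 196 combinations.
PAIRS = list(product(CHILDREN, repeat=2))

if __name__ == "__main__":
    print(len(CHILDREN), len(PAIRS))  # 14 196
    print(PAIRS[0], PAIRS[-1])        # first and last pairings in the enumeration
```

The enumeration starts at "Boy/Monday + Boy/Monday" and ends at "Girl/Sunday + Girl/Sunday", as described.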

Now start applying the facts given to us: one of the children is born on a Tuesday (eliminate all possibilities which don't have at least one Tuesday child), and that child is a boy (eliminate all possibilities in which there is not a Tuesday child who is also a boy).

This leaves exactly 27 possible cases:

Boy/Sunday + Boy/Tuesday,

Boy/Monday + Boy/Tuesday,

Boy/Tuesday + Boy/Tuesday,

Boy/Wednesday + Boy/Tuesday,

Boy/Thursday + Boy/Tuesday,

Boy/Friday + Boy/Tuesday,

Boy/Saturday + Boy/Tuesday,

Girl/Sunday + Boy/Tuesday,

Girl/Monday + Boy/Tuesday,

Girl/Tuesday + Boy/Tuesday,

Girl/Wednesday + Boy/Tuesday,

Girl/Thursday + Boy/Tuesday,

Girl/Friday + Boy/Tuesday,

Girl/Saturday + Boy/Tuesday,

Boy/Tuesday + Boy/Sunday,

Boy/Tuesday + Boy/Monday,

Boy/Tuesday + Boy/Wednesday,

Boy/Tuesday + Boy/Thursday,

Boy/Tuesday + Boy/Friday,

Boy/Tuesday + Boy/Saturday,

Boy/Tuesday + Girl/Sunday,

Boy/Tuesday + Girl/Monday,

Boy/Tuesday + Girl/Tuesday,

Boy/Tuesday + Girl/Wednesday,

Boy/Tuesday + Girl/Thursday,

Boy/Tuesday + Girl/Friday,

Boy/Tuesday + Girl/Saturday

If you count, you'll see that of those 27, there are 13 with two boys and 14 with a boy and a girl. The probability of two boys, given that one child is a boy born on Tuesday, is thus 13/27.
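The same elimination can be done mechanically; a sketch, assuming all 196 ordered pairs are equally likely:

```python
from fractions import Fraction
from itertools import product

CHILDREN = list(product(["Boy", "Girl"], ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]))
PAIRS = list(product(CHILDREN, repeat=2))  # 196 equally likely ordered pairs

# Keep only the pairings that contain at least one boy born on Tuesday.
kept = [pair for pair in PAIRS if ("Boy", "Tue") in pair]
two_boys = [pair for pair in kept if all(sex == "Boy" for sex, _ in pair)]

if __name__ == "__main__":
    print(len(kept), len(two_boys))            # 27 13
    print(Fraction(len(two_boys), len(kept)))  # 13/27
```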


I adamantly agreed with you. Then I made a simple spreadsheet that proves us wrong: http://cl.ly/kuQE


Ah, but see... you're counting (B2,B2) as one item because "order doesn't matter", but then counting (G2,B2) and (B2,G2) independently. If (G2,B2) is different than (B2,G2), then (B2,B'2) is distinct from (B'2,B2).
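One way to see the inconsistency: collapse the 196 ordered pairs into unordered ones and keep track of how many orderings each represents. Counting every unordered pair once gives a different answer than weighting each one by its orderings, and only the weighted version agrees with the full 196-square grid. A sketch (the helper names are illustrative):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
CHILDREN = list(product("BG", DAYS))
TUESDAY_BOY = ("B", "Tue")

# Collapse ordered (first, second) pairs into unordered ones, counting how many
# ordered pairs land on each: 1 when both children are identical, otherwise 2.
weights = Counter(tuple(sorted(pair)) for pair in product(CHILDREN, repeat=2))
kept = {pair: w for pair, w in weights.items() if TUESDAY_BOY in pair}

def is_two_boys(pair):
    return all(sex == "B" for sex, _ in pair)

# Treating every unordered pair as equally likely (the mixed, inconsistent count).
naive = Fraction(sum(is_two_boys(p) for p in kept), len(kept))
# Weighting each unordered pair by its number of orderings (the consistent count).
weighted = Fraction(sum(w for p, w in kept.items() if is_two_boys(p)),
                    sum(kept.values()))

if __name__ == "__main__":
    print(naive, weighted)  # 1/2 13/27
```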


Think of the x-axis as the first child and the y-axis as the second child. There's a one in fourteen chance of choosing a column and a one in fourteen chance of choosing a row. I fail to see how there could be any additional outcomes, or how any square could have a greater chance of occurring than another.


It's fairly simple.

If you flip 2 coins and then announce whatever the first coin was, then if you said H the possibilities are HH or HT, and if you said T they are TH or TT. However, if you flip two coins and then say whether you got at least one head, regardless of which coin it was, then a "yes" leaves 3 equally likely options: HT, HH and TH.

So the question is whether the full statement was chosen based on the data, or whether only its truth value depends on the data.
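The two coin scenarios can be enumerated directly; a sketch assuming two fair, independent coins (`p_both_heads` is an illustrative helper):

```python
from fractions import Fraction
from itertools import product

OUTCOMES = list(product("HT", repeat=2))  # HH, HT, TH, TT, equally likely

def p_both_heads(condition):
    """P(HH) among the outcomes consistent with what was announced."""
    kept = [o for o in OUTCOMES if condition(o)]
    return Fraction(sum(o == ("H", "H") for o in kept), len(kept))

if __name__ == "__main__":
    # Announce the first coin: hearing "H" leaves HH and HT.
    print(p_both_heads(lambda o: o[0] == "H"))  # 1/2
    # Announce only whether at least one head appeared: "yes" leaves HH, HT, TH.
    print(p_both_heads(lambda o: "H" in o))     # 1/3
```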

PS: Now assume its truth value is based on the data. If you look at all options, there are 14 gender/day combinations per kid and 14 * 14 = 196 gender/day combinations in total. Only 14 of those 196 start with BT, and they split evenly: 7 BTB_ and 7 BTG_. That leaves 196 - 14 other options to consider: 7 * 14 of them start with G, and 6 * 14 of them start with a B not born on a Tuesday, but out of those you only keep 1/14, as you need BT for the second child. Adding them up gives 7 + 6 = 13 two-boy families and 7 + 7 = 14 boy-girl families, out of 13 + 14 = 27. So 13/27 have two boys and 14/27 have a girl.
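The tally in the PS, written out (BT = boy born on Tuesday; in each product the first factor counts first-child options and the second counts second-child options):

```latex
\[
\underbrace{7}_{\text{BT, then a boy}}
+ \underbrace{7}_{\text{BT, then a girl}}
+ \underbrace{6 \cdot 14 \cdot \tfrac{1}{14}}_{\text{non-Tuesday boy, then BT}}
+ \underbrace{7 \cdot 14 \cdot \tfrac{1}{14}}_{\text{girl, then BT}}
= 7 + 7 + 6 + 7 = 27
\]
\[
P(\text{two boys}) = \frac{7 + 6}{27} = \frac{13}{27},
\qquad
P(\text{a girl}) = \frac{7 + 7}{27} = \frac{14}{27}
\]
```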


Another "paradox": even though it's possible to randomly pick a rational number from the reals, the probability of this happening is 0.


I find that result fairly intuitive, once you understand how measure theory came to be.

A much more surprising result is that almost all irrational numbers are normal, yet we know almost no explicit normal numbers (morally speaking, a normal number is an irrational number where each digit is equiprobable in any base).


Define "almost no". I mean, they have full measure and I can give countably many explicit examples, what more could you expect?
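As one explicit example, Champernowne's constant 0.123456789101112... (the positive integers written one after another) is known to be normal in base 10, though that does not settle its behavior in other bases. A sketch that only checks single-digit frequencies over a finite prefix, which is far weaker than normality (the helper name is illustrative):

```python
from collections import Counter

def champernowne_digits(n):
    """Digits after the point of 0.123456789101112..., built from 1..n-1."""
    return "".join(str(k) for k in range(1, n))

if __name__ == "__main__":
    digits = champernowne_digits(100_000)  # concatenation of 1..99999
    counts = Counter(digits)
    for d in "0123456789":
        print(d, round(counts[d] / len(digits), 4))
```

At this prefix the digit 0 still lags the others (integers have no leading zeros), which shows how slowly the frequencies approach 1/10.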


I think all mathematicians, given enough time, will eventually say countable to mean infinite.


This is virtually an axiom for continuous distributions.

One of the axioms of probability (countable additivity) is that the probability of a countable union of disjoint events (i.e. sets) is the sum of the probabilities of the individual events.

Assume a uniform distribution between 0 and 1. Now consider singleton sets of rationals (e.g. the number 0.5 is represented by the set containing just 0.5). Since the distribution is uniform, each such set has the same probability (i.e. the likelihood of picking that particular rational).

Now consider this question: what is the probability of picking some rational between 0 and 1? That's just the sum of the probabilities over all the rationals (because it is a countable union of disjoint sets). If the probability of picking any particular rational were non-zero, this sum would be infinite, which violates the laws of probability.
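In symbols, writing c for the common probability of each single rational (the argument only needs every rational to get the same c):

```latex
\[
P\bigl(X \in \mathbb{Q} \cap [0,1]\bigr)
= \sum_{q \in \mathbb{Q} \cap [0,1]} P(X = q)
= \sum_{n=1}^{\infty} c
= \begin{cases} 0, & c = 0, \\ \infty, & c > 0, \end{cases}
\]
```

Since the left-hand side is at most 1, c has to be 0.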

Thus, by convention, it's just simpler to define it to be 0.

There's no magic here. These properties were picked merely to make analysis with measure theory clean. Don't try to ascribe any real world meaning to picking a point.


I think this only sounds like a paradox if it is phrased poorly. The accurate way to state it is "The probability of randomly picking a specific number is 0" and that sounds reasonable. The probability of successfully picking any number is 1.


That's a different statement: OP is alluding to the fact that the measure of Q is 0 under the "standard" (Lebesgue) measure on the real line, while you are saying that the measure of a single number is 0.

[edit] strictly speaking, you would restrict yourself to a bounded interval, e.g. if you pick a random number from a uniform distribution on [0, 1], the probability that this number is rational is 0.


oh, yeah, but that's because although Q is dense, it is not a dense subset of R and locally that's equivalent to saying a single point is not dense in R


no, it is not. The OP statement is about the probability of the event "{X in Q}" (equal to 0), with X a random variable uniformly distributed on a bounded interval. That event contains many points (infinitely many, actually), but has probability 0.

You are talking about the probability of a single point event, which is also always 0 on that same sigma algebra.

The OP point is not completely trivial because the event contains an infinite (but countable) number of elements. It is fairly easy to understand, though, since by countable additivity P[{X in Q}] = sum P[{x}] taken over every rational number x (this works because Q is countable), and each P[{x}] is 0.
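The same fact can be seen as a covering argument: enumerate the rationals in [0, 1] and cover the n-th one by an interval of width ε/2^(n+1); however many rationals you include, the covers' total length stays below ε. A sketch (the enumeration order and helper names are my own choices; code can only ever cover a finite prefix, but the bound holds for every prefix):

```python
from fractions import Fraction
from math import gcd

def rationals_in_unit_interval():
    """Enumerate every rational in [0, 1] exactly once: 0, 1, 1/2, 1/3, 2/3, 1/4, ..."""
    yield Fraction(0)
    yield Fraction(1)
    q = 2
    while True:
        for p in range(1, q):
            if gcd(p, q) == 1:  # skip non-reduced duplicates like 2/4
                yield Fraction(p, q)
        q += 1

def cover_length(eps, how_many):
    """Total length of the intervals covering the first `how_many` rationals,
    where the n-th rational (n = 0, 1, 2, ...) gets an interval of width eps / 2**(n + 1)."""
    total = Fraction(0)
    for n, _ in zip(range(how_many), rationals_in_unit_interval()):
        total += eps / 2 ** (n + 1)
    return total

if __name__ == "__main__":
    eps = Fraction(1, 1000)
    print(float(cover_length(eps, 500)) < float(eps))  # True: always below eps
```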

A deeper statement is that there exist uncountable sets of probability 0.
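The standard example is the middle-thirds Cantor set C: starting from [0, 1], remove the open middle third, then the open middle third of each remaining piece, and so on. Step n removes 2^(n-1) intervals of length 3^(-n), so for X uniform on [0, 1]:

```latex
\[
P(X \in C)
= 1 - \sum_{n=1}^{\infty} \frac{2^{n-1}}{3^{n}}
= 1 - \frac{1}{3} \sum_{k=0}^{\infty} \left(\frac{2}{3}\right)^{k}
= 1 - \frac{1}{3} \cdot 3
= 0
\]
```

Yet C is uncountable: its points are exactly the numbers in [0, 1] with a base-3 expansion using only the digits 0 and 2, and those correspond one-to-one with infinite binary sequences.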


The paradox is that, after picking a random number, you have just done a thing which has probability zero. Doing a thing that has zero probability shouldn't be possible. Ever.


You can't pick a random real number between 0 and 1. Heck, almost all reals between 0 and 1 can't ever be constructed, let alone picked.

The issue here is the non-constructive nature of the real numbers. That is not to say the reals are useless, but they are not much more than a formalism. It's a rather useful one, though, because without it it's hard to get numbers like pi or e. It's really nice that any closed, bounded real interval is compact, but that too is hard to replicate.


You can certainly pick a random real number from the unit interval.


Really? Go on, then, pick one and tell us what it is (or at least tell us what your procedure was).


One example in a finite space and time setting would be selecting a random point on the ground, say by dropping a ball there or something. The exact coordinates where it lands are random real numbers. But the probability that it landed on those exact coordinates is exactly 0, hence a paradox.


Non-countable sets defy intuition in several ways. The silver lining is that we don't have any evidence that a non-countable thing exists in the real world.

I don't think anybody even has a procedure for gathering that kind of evidence.


No. Rational numbers are countable, but have the same property of zero probability for an item in a uniform distribution.


Why the downvotes? I think he is correct; and if someone thinks he is not, please elaborate.


Yes, no reason for the downvotes.

Infinities are problematic too.


It is possible! Actually, I remember pointing out a similar concern in my probability class back in the day. The teacher's answer: that is precisely the difference between probability and possibility :)


I would argue that the distribution you used to pick a random number was not uniform. Not all real numbers were equally likely to be picked by you. Hence the probability for some numbers was > 0.


Is its probability actually zero, or just infinitely close to zero?


If it's a random real then the probability is zero. If it's an arbitrarily close approximation to a random real then the probability is arbitrarily close to zero.
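For X uniform on [0, 1], the chance of landing within ε of a given point x is the length of (x - ε, x + ε) intersected with [0, 1], which is at most 2ε and shrinks to 0 with ε. A sketch (`p_within` is an illustrative helper):

```python
def p_within(x, eps):
    """P(|X - x| < eps) for X ~ Uniform[0, 1] and x in [0, 1]:
    the length of the overlap between (x - eps, x + eps) and [0, 1]."""
    return max(0.0, min(x + eps, 1.0) - max(x - eps, 0.0))

if __name__ == "__main__":
    for k in range(1, 6):
        eps = 10.0 ** -k
        print(eps, p_within(0.5, eps))  # shrinks like 2 * eps for an interior point
```

Only at ε = 0 exactly, i.e. asking for the exact point, does the probability become 0.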


How can the sum of infinitely many zero probabilities be 1? I can understand how the sum of infinitely many values infinitely close to zero can be 1, but not of infinitely many values that are exactly zero.


There's no such thing as a "sum of infinitely many" anything. What we are talking about is the limit of an infinite series, which behaves nothing at all like a sum.


Those are the same thing. You're just calling them by different names.


Please describe how it is possible to pick such a number. For example, I can readily imagine how to pick a random 32-bit float, but that is an entirely different problem, with a nonzero probability.


In probability theory, when dealing with continuous sample spaces / random variables, events with probability 0 still have a chance of occurring, and events with probability 1 still have a chance of NOT occurring, see:

https://en.wikipedia.org/wiki/Almost_surely

This strange property comes from strange properties of the real numbers (and uncountably infinite sets) that give rise to things like:

https://en.wikipedia.org/wiki/Banach%E2%80%93Tarski_paradox

Measure theory deals with resolving this:

https://en.wikipedia.org/wiki/Measure_(mathematics)


What's an anagram for Banach Tarski?

Banach Tarski Banach Tarski.

Seriously though, you can do math without invoking the axiom of choice. The formulation of probability doesn't strictly depend on it.


Usually in math we assume the axiom of choice :) https://en.wikipedia.org/wiki/Axiom_of_choice I'm assuming this could somehow lead to such a "random" pick in the technical sense.

In terms of implementation, I'm not aware of an algorithm that can randomly pick a real number on an actual computer. Perhaps a mathematician could show how to pick one on some abstract machine with infinite resources, and not constrained by finite bit representations of numbers.


> In terms of implementation, I'm not aware of an algorithm that can randomly pick a real number on an actual computer

An actual (finite in time and space) computer can't even represent arbitrary real numbers, much less randomly choose them.
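To make that concrete: a float-based generator such as Python's `random.random()` can only return finite binary floats, and every finite binary float is exactly a rational of the form p/2^k. So on an actual computer the "random real" is rational with probability 1, the opposite of the measure-theoretic answer. A quick check (the `is_dyadic` helper is illustrative):

```python
import random
from fractions import Fraction

def is_dyadic(x: float) -> bool:
    """True if x equals p / 2**k exactly (every finite float does)."""
    den = Fraction(x).denominator  # exact conversion from float, no rounding
    return den & (den - 1) == 0    # power-of-two test

if __name__ == "__main__":
    rng = random.Random(42)
    samples = [rng.random() for _ in range(5)]
    print(all(is_dyadic(r) for r in samples))  # True
```

In CPython, `random.random()` in fact returns multiples of 2^-53, so only finitely many distinct values are even possible.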


A Turing machine can't pick random numbers of any kind.

Once you accept that you have an entropy source in the physical world, you could easily be injecting random real numbers (from some range), and in fact usually are; they are then binned into integers by ADCs.


What about pi? We can represent it in terms of "we know what we are talking about", and we can distinguish it from other numbers.


You can only have a countable number of first- or second-order logic statements, each defining a specific real number.


Those things sound equivalent, but I don't want to be the one to try to prove it.


Run a random number generator (0-9) for each decimal position after the dot in parallel. This should do the trick.
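A sketch of the digit-by-digit idea (the helper names are hypothetical): any finite run of digits only pins the number down to a closed interval of width 10^-n, so the real number itself only exists in the limit of infinitely many digits, which no finite procedure reaches.

```python
import random
from fractions import Fraction
from itertools import islice

def random_decimal_digits(rng):
    """Yield an endless stream of independent, uniform decimal digits 0-9."""
    while True:
        yield rng.randrange(10)

def prefix_interval(digits):
    """Closed interval [lo, lo + 10**-n] of all reals in [0, 1] whose
    decimal expansion can start with the n given digits."""
    lo = sum(Fraction(d, 10 ** (i + 1)) for i, d in enumerate(digits))
    return lo, lo + Fraction(1, 10 ** len(digits))

if __name__ == "__main__":
    rng = random.Random(0)
    digits = list(islice(random_decimal_digits(rng), 20))
    lo, hi = prefix_interval(digits)
    print(float(lo), hi - lo)  # width 1/10**20 after 20 digits
```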


*entirely different


there are a number of similar paradoxes that arise when considering infinities of different sizes!

Infinity Paradoxes - Numberphile - https://www.youtube.com/watch?v=dDl7g_2x74Q


There is no uniform probability distribution on the reals.

Perhaps you meant the interval from 0 to 1?


There are many ways to pick randomly from the rationals with a non-uniform distribution.


My favorite statistical/probability paradox has always been the birthday paradox.


I don't know if Monty hall problem counts as a paradox, but that is quite high on my favourite list of counterintuitive probability results.


In my experience the only reason the Monty Hall problem comes off as paradoxical is because it is usually poorly explained.
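Much of the confusion goes away once the host's behavior is stated explicitly. A sketch that enumerates every case exactly under the usual assumptions (spelled out in the docstring; `win_probability` is an illustrative name):

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def win_probability(switch: bool) -> Fraction:
    """Exact win probability, assuming the car is behind a uniformly random door,
    the contestant picks door 0, and the host always opens a goat door other than
    the contestant's, choosing uniformly when he has two such doors."""
    pick = 0
    total = Fraction(0)
    for car in DOORS:
        host_options = [d for d in DOORS if d not in (pick, car)]
        for opened in host_options:
            p = Fraction(1, 3) * Fraction(1, len(host_options))
            if switch:
                # Switch to the one door that is neither picked nor opened.
                final = next(d for d in DOORS if d not in (pick, opened))
            else:
                final = pick
            if final == car:
                total += p
    return total

if __name__ == "__main__":
    print(win_probability(switch=False), win_probability(switch=True))  # 1/3 2/3
```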


There is a roughly 99% chance of some match of birthdays among the users who upvoted this post (57 at this moment).
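The birthday figure is easy to compute; a sketch assuming independent birthdays spread uniformly over 365 days (no leap years). Under those assumptions the figure for 57 people comes out at about 99.0%, and 23 people is the first group size above 50%.

```python
from math import prod

def p_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday, assuming
    birthdays are independent and uniform over `days` days."""
    if n > days:
        return 1.0  # pigeonhole: a match is certain
    p_all_distinct = prod((days - k) / days for k in range(n))
    return 1.0 - p_all_distinct

if __name__ == "__main__":
    for n in (22, 23, 57):
        print(n, round(p_shared_birthday(n), 4))
```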


For me it's Simpson's paradox: it throws everyone off; it's caused (and will continue to cause) real-world damage; it's everywhere once you see it (in how newspapers report science, in our social policy and how we talk about social issues, in court cases); and finally, it's really hard to explain to a non-math person, so even when it's happening, you sound like the irrational one for pointing it out.

... and don't get cocky once you know about it, because it's so pernicious it'll get you too if you're not careful!


I do not think this is a particularly difficult concept to explain to anyone. A typical example (not difficult to find) and a simple graph are usually good enough for most people.
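A typical example in code, using illustrative counts of (successes, trials) for two treatments split by case severity: treatment A has the higher success rate in each group, yet B looks better once the groups are pooled, because A was given mostly the harder cases.

```python
from fractions import Fraction

# Illustrative (successes, trials) counts, split by case severity.
DATA = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}

def rate(successes, trials):
    return Fraction(successes, trials)

def pooled(groups):
    """Success rate after ignoring the severity split."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(n for _, n in groups.values())
    return rate(successes, trials)

if __name__ == "__main__":
    for group in ("mild", "severe"):
        a, b = rate(*DATA["A"][group]), rate(*DATA["B"][group])
        print(group, round(float(a), 3), round(float(b), 3), "A better" if a > b else "B better")
    a, b = pooled(DATA["A"]), pooled(DATA["B"])
    print("pooled", round(float(a), 3), round(float(b), 3), "A better" if a > b else "B better")
```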



"Paradox" is a pretty strong term. The items presented are more in the category of common errors and counter-intuitiveness.



Fine, but I was left feeling blah by the "paradoxes" presented.



