• JeffJo
    130
    Except that you cannot, and you know that you cannot.Srap Tasmaner
    What you "cannot" do is assign probabilities to the cases. You can still - and my point is "must" - treat them as random variables, which have unknown probability distributions.
  • JeffJo
    130
    I'll get to a more qualitative solution to the problem at the end of this post. I hope it helps.

    This makes no sense to me. Initial distribution of what? If these are pairs of envelopes from which will be chosen the pair that the player confronts, then not only is this sample space unknown to the player, she never interacts with it. She will face the pair chosen and no other.Srap Tasmaner
    Define "interact."

    Say I reach into both of my pockets, pull out a coin in each hand, but show you just one. It's a quarter, worth $0.25. (According to the US mint, their current mint set includes coins worth $0.01, $0.05, $0.10, $0.25, $0.50, and $1.00.) I'll give you either the coin you see, or the one you don't. What do you choose?

    Notice I didn't say that I ever carry half-dollar or dollar coins. They are in the current mint set, but most Americans are unfamiliar with them so they won't carry them unless they were just given out as change.

    In what way do you not "interact with" the distribution of coins I carry? If I keep nothing but pennies in one pocket, does that fact not "interact with" what you will get if you take the hidden coin? Even if you do not know the distribution?

    How about if I received $9.50 in dollar and half-dollar coins as change for a $10 bill from a vending machine (about the only way I get them), and put them in a different pocket so I can avoid carrying them tomorrow? Does that fact not "interact with" what you will get if you take the hidden coin? Even if you do not know the distribution?

    In both the TEP, and this example, the distribution is unknown. You can't "interact" with any specific realization of a distribution, but it most definitely "interacts" with what will happen if you choose the hidden envelope or coin. That's why the answer to OP is:

    • If you don't open your envelope, your information about the two is equivalent: you know nothing about either. So your expectation for either has to be the same. You have no idea what "the same" might mean, but you can conclude that there is no difference.
    • If you open yours, you do have information about both. But it takes different form for each. You know the value of your envelope, and the two possibilities for the other.
    • You need to know how these two different kinds of information interact with each other in order to use them. Specifically, the 50:50 split applies only before you looked. After you look, the split depends on the prior chances of two specific combinations.
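A minimal sketch of the last bullet, assuming a discrete prior over the lower amount x (the pair is then x and 2x) and a fair pick between the two envelopes. The function name `prob_mine_is_lower` and both example priors are hypothetical and not taken from the thread; they only illustrate that the split after looking follows the prior weights of the two candidate pairs.

```python
from fractions import Fraction

def prob_mine_is_lower(prior, v):
    """P(my envelope holds the lower amount | I see v).

    prior maps each possible lower amount x to its probability; the pair is
    (x, 2x) and either envelope is picked with probability 1/2.
    """
    half = Fraction(1, 2)
    w_low = prior.get(v, Fraction(0)) * half                 # pair is (v, 2v)
    w_high = prior.get(Fraction(v) / 2, Fraction(0)) * half  # pair is (v/2, v)
    total = w_low + w_high
    if total == 0:
        raise ValueError("v cannot occur under this prior")
    return w_low / total

# Two hypothetical priors over the lower amount (assumptions for illustration).
even_prior = {5: Fraction(1, 2), 10: Fraction(1, 2)}
skewed_prior = {5: Fraction(9, 10), 10: Fraction(1, 10)}

print(prob_mine_is_lower(even_prior, 10))    # the two pairs are equally likely
print(prob_mine_is_lower(skewed_prior, 10))  # the (5, 10) pair dominates
```

Under the first prior, seeing 10 leaves an even split; under the second, the same observation makes it far more likely that 10 is the higher amount.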
  • JeffJo
    130
    Yes, it is tempting to say that if the game is only being played once then the shape of the initial distribution(*) isn't relevant to the definition of the player's known problem space. That's a fair point, and I may not have been sufficiently sensitive to it.Pierre-Normand

    No, it isn't fair to say that. No more than saying that the probability of heads is different for a single flip of a random coin than for the flips of 100 random coins.
  • Pierre-Normand
    2.4k
    No, it isn't fair to say that. No more than saying that the probability of heads is different for a single flip of a random coin, than for the flips of 100 random coinsJeffJo

    Well, that's missing the point ;-) My purpose was to highlight a point that @Srap Tasmaner, yourself and I now seem to be agreeing on. Suppose, again, that a die throwing game will be played only once (i.e., there will be only one throw) with a die chosen at random between two oppositely biased dice as earlier described. The point was that, in the special case where the die that has been picked at random is biased towards 6, and hence the "initial distribution" is likewise biased, this fact is irrelevant to the "problem space" that the player is entitled to rely on. The initial die picking occurred within the black box of the player's ignorance, as it were, such that the whole situation can still be treated (by her) as the single throw of a fair die.
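A small sketch of the dice example, assuming two mirror-image biases. The specific probabilities below are hypothetical (the thread's earlier description of the dice is not reproduced here); the point is only that a uniform pick between oppositely biased dice can look like one fair die to the player.

```python
from fractions import Fraction as F

# Hypothetical biases (an assumption, not from the thread): die_low favours 1,
# and die_high is its mirror image, favouring 6.
die_low = {1: F(1, 4), 2: F(1, 6), 3: F(1, 6), 4: F(1, 6), 5: F(1, 6), 6: F(1, 12)}
die_high = {face: die_low[7 - face] for face in die_low}

def player_view(dice):
    """Face probabilities for a player who only knows a die was picked uniformly."""
    n = len(dice)
    return {face: sum(d[face] for d in dice) / n for face in range(1, 7)}

marginal = player_view([die_low, die_high])
for face, p in marginal.items():
    print(face, p)
```

With these assumed numbers every face comes out at 1/6 for the player, even though whoever picked the die would assign different probabilities.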

    This is what I took @Srap Tasmaner to mean when he said that "the player" doesn't "interact" with the "sample space", which is a statement that I glossed as the player's "problem space" (maybe not the best expression) not being (fully) constrained by the "initial distribution" of envelope pairs, which must be treated as being unknown but, as you've pointed out repeatedly, still belongs to a range that is subject to definite mathematical constraints that can't be ignored.
  • Jeremiah
    1.5k
    The probabilist reasons from a known population to the outcome of a single experiment, the sample. In contrast, the statistician utilizes the theory of probability to calculate the probability of an observed sample and to infer from this the characteristics of an unknown population.

    Mathematical Statistics with Applications, Wackerly, Mendenhall, Scheaffer

    This is an important separation to keep in mind. The conceptual difference between these approaches to probability is not the number of observations (I too fell into that misleading line of thought myself); the real difference is the direction in which they move. One moves from a known population to the sample, while the other moves from the sample to the unknown population. To take a "probabilist" approach to the unknown population is to go in ill-equipped for the job. I think a few of the people here have little experience in dealing with unknown populations, and in that naivety they have moved in the wrong direction.
  • Janus
    16.5k


    Thanks for this explanation, Pierre. It's probably a little outside my area of philosophical interest, though, which would be due to lack of time as much as anything else.
  • Janus
    16.5k


    That seems to be as good a reason as any...
  • Janus
    16.5k


    I do find it hard to think of logic as a form of probability. But perspectives do rely upon backgrounds, and no doubt I lack the appropriate background.

    Are you saying that the problem in the Op highlights a general question as to what inferences, or rather perhaps, what kinds of inferences, we should accept?
  • Jeremiah
    1.5k
    The thing to recognize here is that we have a nested structure. There are three chance mechanisms, with each housed in the last.

    Let ( j is which chance mechanisms we are at) denote the observed value of the random variable . Then , which is the sum of probabilities for the sample points that are assigned to .

    Sorry if my nested notation is a bit off, it has been awhile and I don't feel like looking it up, but basically we need another index to tell which chance mechanisms we are at, in our case I am using j.

    Since is a random variable then it is a response variable of some unknown real-world function in which the unknown domain is the sample space. The domain would actually spreads between two unknowns. I know zero has often been the assumed min, but we actually have no idea what that is; as far as we know 100 bucks could be the lowest value.


    First chance mechanism

    By whatever function of an will be selected, we'll call this value .

    Second chance mechanism

    We have two envelopes, lets call one and the second . We need to define and at this point. So using a simple if then statement let's define them as: If then or if then .

    At this level of the nested structure is put into one envelope while is put into the other envelope.

    The possible assignments are:






    So the two possible cases are:




    Then , with and , meaning and .

    Third chance mechanism

    The third chance mechanism is the subjective one, as it is up to us to decide what we want to do and by what means, and I think this is the real trick to this so-called "paradox". It says, "What should you do?", and that point is really where the conflict arises.

    I still maintain that our gain/loss is: . However, I set this up, because I think it is important people understand where and how probabilities are being applied.
  • Srap Tasmaner
    5k
    @Pierre-Normand, @JeffJo

    What is the source of the paradox in your view?Andrew M

    I believe there is not a paradox here but a fallacy.

    Outside of being told by reliable authority "You were successful!" you need to know two things to know whether you have been successful:

    • what the criterion of success is; and
    • whether you have met that criterion.

    That these are different, and thus that your uncertainty about one is not the same as your uncertainty about the other -- although both contribute to your overall uncertainty that you were successful -- can be readily seen in cases where probabilities attach to each and they differ.

    Here are two versions of a game with cards. In both, I will have in my hand a Jack and a Queen, and there will be two Jacks and a Queen on the table. You win if you pick the same card as I do.

    Game 1: the cards on the table are face down. I select a card from my hand and show it to you. If I show you the Jack, then your chances of winning are 2 in 3.

    Game 2: the cards on the table are face up. I select a card from my hand but don't show it to you. You select a card. If you select a Jack, your chances of winning are 1 in 2.

    In both cases, winning would be both of us selecting a Jack, but the odds of my choosing a Jack and your choosing one are different. In game 1, you know the criterion of success, but until you know what you picked, you don't know whether you met it; in game 2, you know what you picked meets the criterion "Jack", but you don't know whether the winning criterion is "Jack" or "Queen".
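The two figures can be checked by simple counting. This is only a sketch, and it assumes that the unseen pick in each game is uniform over the available cards (your pick from the face-down table in game 1, my hidden pick from my hand in game 2); the thread does not state that explicitly.

```python
from fractions import Fraction

table = ["J", "J", "Q"]  # cards on the table
hand = ["J", "Q"]        # cards in my hand

# Game 1: my card is known to be a Jack; your face-down pick is assumed uniform.
game1 = Fraction(sum(card == "J" for card in table), len(table))

# Game 2: your card is known to be a Jack; my hidden pick is assumed uniform.
game2 = Fraction(sum(card == "J" for card in hand), len(hand))

print(game1, game2)
```

The winning outcome is the same in both games (Jack and Jack), but the uncertainty sits in a different place each time, which is why the two numbers differ.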

    (If you buy your kid a pack of Pokemon cards, before he rips the pack open neither of you knows whether he got anything "good". If he opens it and shows you what he got, he'll know whether he got anything good but you still won't until he explains it to you at length.)


    Let's define success as picking the larger-valued envelope. There is a fixed amount of money distributed between the two envelopes, so half that amount is the cutoff. Greater than that is success. One envelope has less than half and one envelope has more, so your chances of meeting that criterion, though its value is unknown to you, are 1 in 2. After you've chosen but before opening the envelope, you could reasonably judge your chances to be 1 in 2.

    You open your envelope to discover 10. Were you successful? Observing 10 is consistent with two possibilities: an average value of 7.5 and an average value of 15. 10 meets the criterion "> 7.5", but you don't know whether that's the criterion.

    What are the chances that "> 7.5" is the criterion for success? Here is one answer:

    We know that our chance of success was 1 in 2. Since we still don't know whether we were successful, our chance of success must still be 1 in 2. Therefore the chance that "> 7.5" is the criterion of success must be 1 in 2.

    This is the fallacy. You reason from the fact that, given the criterion of success, you would have a 1 in 2 chance of picking the envelope that meets that criterion, to a 1 in 2 chance that the unknown criterion of success is the one your chosen envelope meets.

    (No doubt the temptation arises because any given value is consistent with exactly two possible situations, and you are given a choice between two envelopes.)

    You can criticize the conclusion that, for any value you find in your envelope, the two situations consistent with that value must be equally likely, but my criticism is of the inference.

    Now since we do not know anything at all about how the amounts in the envelopes were determined, we're not in a position to say something like "Oh, no, the odds are actually 2 in 7 that '> 7.5' is the criterion." So I contend the right thing to say now is "I don't know whether I was successful" and not attach a probability to your answer at all. "I don't know" is not the same as "There is a 50% chance of each."

    You can reason further that one of the two possible criteria, "> 7.5" and "> 15", must be the situation you are in, and the other the situation you are not in. Then you can look at each case separately and conclude that since the value in the unopened envelope is the same as it's always been, your choice to stick or switch is the same choice you faced at the beginning of the game.

    If you switch, you will turn out to be in the lucky-unlucky or the unlucky-lucky track. If you don't, you will turn out to be in the lucky-lucky or the unlucky-unlucky track.
  • Pierre-Normand
    2.4k
    (...) This is the fallacy. You reason from the fact that, given the criterion of success, you would have a 1 in 2 chance of picking the envelope that meets that criterion, to a 1 in 2 chance that the unknown criterion of success is the one your chosen envelope meets. (...)Srap Tasmaner

    This is a very neat analysis and I quite agree. One way to solve a paradox, of course, is to disclose the hidden fallacy in the argument that yields one of the two inconsistent conclusions. Your explanation indeed shows that, in the case where the criterion of success is unknown, and we don't have any reason to judge that the two possible success criteria are equiprobable, then it is fallacious to infer that the expected value of switching is 1.25v (where v is the value observed in the first envelope).
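A short worked equation may help here. It is a sketch under one assumption: p stands for the (generally unknown) probability that the observed value v is the lower of the two amounts, a symbol introduced only for this illustration.

```latex
\begin{aligned}
E[\text{other} \mid V = v] &= p \cdot 2v + (1 - p) \cdot \frac{v}{2} = \frac{(1 + 3p)\,v}{2}, \\
p = \tfrac{1}{2} &\;\Rightarrow\; E[\text{other} \mid V = v] = \tfrac{5}{4}\,v = 1.25\,v, \\
p = \tfrac{1}{3} &\;\Rightarrow\; E[\text{other} \mid V = v] = v .
\end{aligned}
```

The 1.25v figure therefore rests entirely on taking p = 1/2; with no warrant for that value, the expected value of switching is simply not determined, and at p = 1/3 switching gains nothing.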

    This is consistent with what I (and some others) have been arguing although I have also highlighted another source of the paradox whereby the equiprobability assumption (regarding the two possible albeit unknown success criteria) would appear to be warranted, without any reliance on the fallacy that you have diagnosed, and that is what occurs in the case where the prior distribution of possible envelope pairs which the player is entitled to judge to be possible is assumed to be uniform and unbounded. This is the ideal case, physically impossible to realize in the real world, that I have illustrated by means of my Hilbert Rational Hotel analogy.
  • Srap Tasmaner
    5k

    Yes, I believe it is entirely consistent with criticizing the conclusion of the faulty inference. I think we would like to believe that invalid inferences can always be shown to have undesired consequences, but that also requires agreement on (a) the further inference, and (b) the undesirability of the consequence. I suggested at one point in this thread that if told the value of the other envelope instead of your own, then you would want not to switch; I found this conclusion absurd but my interlocutor did not. Go figure.
  • Pierre-Normand
    2.4k
    I suggested at one point in this thread that if told the value of the other envelope instead of your own, then you would want not to switch; I found this conclusion absurd but my interlocutor did not. Go figure.Srap Tasmaner

    That was a very pointed challenge, and I think it has force in the case where the analysis is assumed to apply to a game that can be instantiated in the real physical world and that can be played by finite creatures such as us. But in the ideal case where the prior distribution is assumed to be uniform and unbounded, then, the conclusion, although seemingly absurd, would be warranted.
  • Jeremiah
    1.5k
    But in the ideal casePierre-Normand

    Claiming this case is "ideal" is an entirely subjective standard pumped full of observational bias.
  • Jeremiah
    1.5k
    You use samples to make probabilistic inferences about an unknown population. I roll a die 10 times and 9 of those times get a 6. Now it is not impossible to get 6 nine times but it is highly improbable, therefore I decide the die is loaded. That is how it works; you use the samples to make probabilistic claims of an unknown population. Samples we are lacking.
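As a rough check on "highly improbable", here is a sketch of the tail probability for a fair die. It assumes independent rolls and reads "9 of those times" as "at least 9 sixes in 10 rolls"; the helper name `prob_at_least` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n, p):
    """P(at least k successes in n independent trials, each with success chance p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance that a fair die shows at least nine sixes in ten rolls.
p_fair = prob_at_least(9, 10, Fraction(1, 6))
print(p_fair, float(p_fair))
```

The result is 51 chances in 6^10, a little under one in a million, which is the kind of sample-based evidence that would justify calling the die loaded.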

    Now not only do we have an unknown population, but we have an unknown function, and that means unknown explanatory variables. Maybe Farmer Bob counted how many eggs his chickens laid the night before and that is how x was decided.
  • Jeremiah
    1.5k
    It does not even matter, as the second chance mechanism transforms the distribution.
  • JeffJo
    130
    Suppose, again, that a die throwing game will be played only once (i.e., there will be only one throw) with a die chosen at random between two oppositely biased dice as earlier described. ...Pierre-Normand
    I know we are reaching an equivalent conclusion. My point is that the framework that it fits into may be different. These concepts can seem ambiguous to many, which is the fuel Bayesians, Frequentists, Subjectivists, Objectivists, Statisticians, and Probabilists use to denigrate each other through misrepresentation.

    My point was that the difference between "only once" and "many times" has no significance in this discussion. It can only have meaning to a statistician who is (correctly, don't think I am putting them down) trying to create a population through repetition of the game, from which he can use inference to refine his estimates of the properties of your dice.

    Probability models what is unknown about a system. At the start of your dice game, we don't know which die will be rolled, or how it will land. After step 1, the gamemaster knows which, but the player does not. Since their knowledge of the system is different, they have different probability spaces. The gamemaster says, for example, that 1 is more likely than 6. The game player says they are equally likely. Both are right, within their knowledge.

    Now, what if the 1-biased die is red, and the 6-biased one is green? If the player doesn't know this, only that the two biases exist, his knowledge that he has the red die does not put him in the gamemaster's position.

    In the OP, and before we look in the envelope, we are in the role of the game player. The probability that the low value of the envelopes is x is determined by the distribution of the random variable we have called X. (I'm trying to get away from calling this "initial," since that has other connotations in Probability's derivative fields.)

    That distribution is an unknown function F1(x). After picking high/low with 50:50 probability, the value in our envelope is a new random variable V. Its distribution is another unknown function F2(v), but we do know something about it. Probability theory tells us that F2(v) = [F1(v)+F1(v/2)]/2. But it also tells us that the distribution of the "other" envelope, random variable Y, is F3(y) = [F1(y)+F1(y/2)]/2. Y is, of course, not independent of V. The point is that it isn't F3(v/2)=F3(2v)=1/2, either.

    Looking in the envelope does not change our role from that of the game player to that of the gamemaster, just as seeing the color of your die does not. Simply "knowing" v (and I use quotes because "treat it as an unknown" really means "treat it as if you know the value is v, where v can be any *single* value in the range of V") does not increase our knowledge in any way.
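A minimal sketch of how the distribution of the envelope value V follows from the distribution of the low value X, assuming a discrete prior and a fair high/low pick. V equals v either when the low value is v and we hold the low envelope, or when the low value is v/2 and we hold the high one. The example prior and the name `envelope_pmf` are hypothetical.

```python
from fractions import Fraction

def envelope_pmf(prior):
    """pmf of V, the amount in a randomly chosen envelope of the pair (x, 2x)."""
    pmf = {}
    for x, p in prior.items():
        for amount in (x, 2 * x):  # each envelope of the pair is held half the time
            pmf[amount] = pmf.get(amount, Fraction(0)) + p / 2
    return pmf

# Hypothetical prior over the low value X (an assumption for illustration).
prior = {5: Fraction(1, 2), 10: Fraction(3, 10), 20: Fraction(1, 5)}
pmf_v = envelope_pmf(prior)

for amount in sorted(pmf_v):
    print(amount, pmf_v[amount])
```

By symmetry the other envelope, Y, has the same marginal distribution, but Y and V are not independent: once V is fixed, Y can only be half or double that value, with weights set by the prior rather than by the 50:50 pick.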
  • JeffJo
    130
    I believe there is not a paradox here but a fallacy.Srap Tasmaner

    Exactly.

    Maybe I need to explain the base rate fallacy, sometimes called the false positive "paradox." It is a very similar, not-paradoxical fallacy. It just seems to be a paradox if you use probability naively.

    Say there is a fatal disease that 1 in 1,000 have, without showing symptoms. But there is a test for it that is 99.9% accurate. That means it will give a false positive with probability 0.1%, and a false negative with probability 0.1%. Sounds very accurate, right?

    Say you are one of 1,000,000 people who take the test. You test positive, and think that there is a 99.9% chance that you have the disease. But...

    • 1,000 of these people have the disease.
      • 999 of them will test positive.
      • 1 of them will test negative.
    • 999,000 of these people do not have the disease.
      • 998,001 of them will test negative.
      • 999 of them will test positive.
    So fully half of the people who test positive - 999 out of 1,998 - do not have the disease. Yes, you should be worried, but not to the degree suggested by the 99.9% accuracy. The good news is that if you test negative, there is only a 0.0001% chance you have it.

    The fallacy is confusing the probability of a correct result in a specific circumstance with the probability that a specific result is correct. It sounds like the two should be the same thing, but they are not. The latter is influenced by the population who take the test. The 99.9% true-positive figure applies only to the population that has the disease, not to the population who test positive. The 99.9% true-negative figure applies only to the population that does not have the disease, not to the population who test negative.
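The counts above can be reproduced with Bayes' rule. This is a sketch using the figures from the example (1 in 1,000 prevalence, 99.9% accuracy in both directions); the function and parameter names are hypothetical.

```python
from fractions import Fraction

def sick_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) by Bayes' rule."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

def sick_given_negative(prevalence, sensitivity, specificity):
    """P(disease | negative test) by Bayes' rule."""
    false_neg = prevalence * (1 - sensitivity)
    true_neg = (1 - prevalence) * specificity
    return false_neg / (false_neg + true_neg)

prevalence = Fraction(1, 1000)
accuracy = Fraction(999, 1000)  # used as both sensitivity and specificity

print(sick_given_positive(prevalence, accuracy, accuracy))
print(sick_given_negative(prevalence, accuracy, accuracy))
```

A positive result gives exactly 1/2, matching the 999 out of 1,998 count, and a negative result gives 1 in 998,002, or roughly 0.0001%.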

    The exact same thing happens in the TEP. The fallacy is confusing the 50% chance to pick the lower value, with a 50% chance that a specific value is the lower.
  • Jeremiah
    1.5k
    Each observed x is an independent event.
  • Pierre-Normand
    2.4k
    Claiming this case is "ideal" is an entirely subjective standard pumped full of observational bias.Jeremiah

    The OP doesn't specify that the thought experiment must have physically possible instantiations. The only ideal, here, consists in pursuing strictly the mathematical/logical consequences of the principle of indifference even when it is interpreted so as to entail that the relevant distribution of possible pairs of envelope contents must be assumed by the player to be uniform and unbounded.
  • Jeremiah
    1.5k
    A random variable is defined by a real world function.
  • Pierre-Normand
    2.4k
    (...) That distribution is an unknown function F1(x). After picking high/low with 50:50 probability, the value in our envelope is a new random variable V. Its distribution is another unknown function F2(v), but we do know something about it. Probability theory tells us that F2(v) = [F1(v)+F1(v/2)]/2. But it also tells us that the distribution of the "other" envelope, random variable Y, is F3(y) = [F1(y)+F1(y/2)]/2. Y is, of course, not independent of V. The point is that it isn't F3(v/2)=F3(2v)=1/2, either.JeffJo

    I agree with this, and with much of what you wrote before. I commend you for the clarity and rigor of your explanation.

    Looking in the envelope does change our role from that of the game player, to the gamemaster. Just like seeing the color of your die does not. Simply "knowing" v (and I use quotes because "treat it as an unknown" really means "treat it as if you know the value is v, where v can be any *single* value in the range of V") does not change increase our knowledge in any way.

    I am not so sure about that. The game master does have prior exact knowledge of the function F1(x), which I possibly misleadingly earlier called the "initial distribution". According to the OP specification, the player doesn't necessarily know what this function is (although, under one interpretation of the problem, it must be assumed to be uniform and unbounded). When the player opens her envelope her epistemic position remains distinct from that of the gamemaster since she still is ignorant of F1(x).

    My earlier point, which is a concession to some of the things @Jeremiah and @Michael have said, is that the nature of the "system" whereby some probability distribution is being generated, and which F1(x) represents, is relevant only to a player who plays this specific game, either once or several times. What I may not have conveyed sufficiently clearly is that when the player not only plays this game only once but also isn't seeking to maximize her expected value with respect to the specific probability space defined by this "game" or "system", but rather with respect to the more general probability space generated by the distribution of all the possible games that satisfy the OP's general specification, the specific F1(x) that is known to the game master is irrelevant to the player at any step. The mutual dependencies of F1, F2 and F3, though, as you correctly point out, are indeed relevant to the player's decision: they constrain the player's probability space. This is the point that @Michael may have missed.
  • Pierre-Normand
    2.4k
    A random variable is defined by a real world functionJeremiah

    That's a bit like saying that a geometrical circle is defined by a real world cheese wheel.
  • Srap Tasmaner
    5k
    "I don't know" is not the same as "There is a 50% chance of each."Srap Tasmaner

    This is the part I'm still struggling with a bit.

    Even if I were to convince Michael that he had mistakenly assumed the chances for each criterion of success were equal, his response will still be something like this:

    There are still exactly two possibilities, and by stipulation I don't know anything about the odds for each, so by the principle of indifference I will still treat them as having even odds.

    So there's this backdoor 50% tries to sneak back in through.

    I know we have the argument about the sample space for X being uniform and unbounded. I just think there must be something else.
  • Jeremiah
    1.5k
    A normal prior would actually make more sense, as empirical investigations have shown it robust against possible skewness.
  • Jeremiah
    1.5k
    When dealing with unknown populations, even priors need to be tempered with a sample distribution before you have a posterior distribution that is reliable enough to make probabilistic inferences about the unknown population.

    Without a sample distribution, then your assumptions of the unknown distribution carry additional uncertainty that cannot be quantified. These additional assumptions come at a cost.
  • Jeremiah
    1.5k
    There are two mindsets here: people who hinge their expectations on what is known and people who hinge their expectations on what they feel they can safely assume about the unknown. However, these additional assumptions are not cost free and I am not sure everyone here gets that. When you start making assumptions about unknowns which are not backed by a probability model, you inflate your uncertainty in a way that can't be quantified.
  • Pierre-Normand
    2.4k
    A normal prior would actually make more sense, as empirical investigations have shown it robust against possible skewness.Jeremiah

    I am not sure how you would define a normal prior for this problem since it is being assumed, is it not, that the amount of money that can be found in the envelopes is positive? If negative values are allowed, and the player can incur a debt, then, of course, a normal prior centered on zero yields no expected gain from switching. Maybe a Poisson distribution would be a better fit for the player's prior credence in a real world situation where negative amounts are precluded but no more information is explicitly given. But such a prior credence would also fail to satisfy the principle of indifference as applied to the 'v is lowest' and 'v is highest' possibilities, conditionally on most values v being observed in the first envelope.
  • Jeremiah
    1.5k


    A normal distribution does not have to have a mean of 0, nor does it need negative values.