Mike Titelbaum on Shifting and Sleeping Beauty

In the last entry, I suggested that

(EEP) P_2(A) = P_1(+A|+E)

is a sensible rule for updating self-locating beliefs. Here, E is the total evidence received at time 2 (the time of P_2), and '+' denotes a function that shifts the evaluation index of propositions, much like 'in 5 minutes': '+A' is true at a centered world w iff A is true at the next point from w where new information is received. (EEP) therefore says that upon learning E, your new credence in any proposition A should equal your previous conditional credence that A will obtain at the next point when information comes in, given that this information is E.

(EEP) belongs to a family of rules that one might call 'shifting rules'. The general pattern is

P_2(A) = P_1(f(A) | f(E)),

with f shifting the evaluation index of the embedded proposition.

Mike Titelbaum also advocates a shifting rule, but without a fixed shifting function f. Instead, different functions are used for different cases. For each case, we first have to find a context-insensitive term 'Z' such that the agent is certain at time 2 that 'Z is now' is true. Then we apply the rule

(M) P_2(A) = P_1(A at Z | E at Z).

For example, suppose I fall asleep during a talk at 2:30. When I wake up, I see that it is 2:50. My new credence in any proposition A should then equal my 2:30 credence in 'A at 2:50' conditional on 'E at 2:50', where E is all the information I receive at 2:50. Thus if I notice that the talk hasn't finished yet, my new credence in the talk going over time should equal my old conditional credence in it going over time given that it won't have finished by 2:50.
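To make this concrete, here is a minimal numeric sketch of that application of (M). The prior over finishing times and the 3:00 scheduled end are made-up assumptions of mine, not part of the story, and I keep only the relevant part of E, namely that the talk hasn't finished by 2:50:

```python
# Hypothetical prior P_1 over when the talk finishes; the talk is scheduled
# to end at 3:00, so "over time" means it finishes after 3:00.
prior = {"2:40": 0.2, "2:55": 0.3, "3:05": 0.3, "3:15": 0.2}

def minutes(t):
    h, m = t.split(":")
    return int(h) * 60 + int(m)

# (M) with Z = 2:50: condition on 'E at 2:50', i.e. the talk hasn't finished by 2:50.
not_finished = {t: p for t, p in prior.items() if minutes(t) > minutes("2:50")}
p_e_at_250 = sum(not_finished.values())

# 'over time at 2:50' just amounts to the talk finishing after 3:00.
p_over_and_e = sum(p for t, p in not_finished.items() if minutes(t) > minutes("3:00"))

print(p_over_and_e / p_e_at_250)  # P_2(over time) = 0.5 / 0.8 = 0.625
```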

(I should mention that the way I present Mike's proposal differs from the way he presents it in his paper. For present purposes, the differences don't matter.)

In this example, Mike's rule (M) and my rule (EEP) yield the same result. Not so in other cases.

Let's look at Sleeping Beauty. In order to easily apply (M), we assume that Beauty is confident that if she is awakened on both Monday and Tuesday, then her awakenings will not be subjectively indistinguishable: perhaps a dog will bark on Monday and not on Tuesday, or perhaps the light will be slightly dimmer on one of the days, etc. (Of course she doesn't know in advance on which day the dog will bark or the light will be dimmer.)

Let E be her total evidence when she awakens on Monday, and let Z = 'the time when my total evidence is E'. Beauty is certain that Z is now, so we can use (M) to get

P_2(Heads) = P_1(Heads at Z | E at Z).

Our assumption ensures that there can't be two times when the total evidence is E. But what if there is none? That is, should we regard 'E at Z' as true or as false at worlds where Z never obtains? The answer is 'as false'. Otherwise P_2 won't be a probability function: at worlds where Z never obtains, 'A at Z' would be vacuously true for every A whatsoever, so P_1(A at Z | E at Z) would be high for every A in cases where P_1(at some point Z) is low.

So 'E at Z' is false at any world where Z never occurs. Likewise, 'Heads at Z' is true only at worlds where the coin lands heads and Z occurs at some point. So we have

P_2(Heads) = P_1(Heads & at some point Z | at some point Z),

which reduces to

P_2(Heads) = P_1(Heads | at some point Z).

Recall that Z = 'the time when my total evidence is E'. And E is Beauty's total evidence on Monday, including her perception of the lab and the barking dog, her memories of the setup, the absence of any memories from later than Sunday, and so on. On Sunday, Beauty was confident that if Z occurs at all, then it must occur either on Heads-Monday, Tails-Monday or Tails-Tuesday. But she wasn't certain that Z would occur. For all she knew, the coin could have landed heads and she wouldn't hear a dog upon awakening on Monday. Or the coin could have landed tails and she wouldn't hear a dog on either awakening. Since there are two occasions where Z might occur on Tails, but only one on Heads, the probability for Tails given that Z occurs at some point is higher than the probability for Heads given that Z occurs at some point. By our instance of (M), P_2(Heads) is therefore less than P_1(Heads). And given that P_1(Heads) = 1/2, P_2(Heads) will most likely be 1/3.
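Here is a quick back-of-the-envelope calculation behind that "most likely 1/3" claim (my own reconstruction, not from Mike's paper). Suppose that on any given awakening Beauty's total evidence turns out to be exactly E with some small probability q, independently across the Tails awakenings:

```python
# P_1(Heads | at some point Z), where Z occurs iff some awakening has total evidence E.
def p_heads_given_z(q):
    p_z_given_heads = q                 # one awakening on Heads
    p_z_given_tails = 1 - (1 - q) ** 2  # two awakenings on Tails
    return 0.5 * p_z_given_heads / (0.5 * p_z_given_heads + 0.5 * p_z_given_tails)

for q in (0.5, 0.1, 0.01, 0.001):
    print(q, round(p_heads_given_z(q), 4))
# 0.4, 0.3448, 0.3344, 0.3334 -- approaching 1/3 as q shrinks, and q is tiny
# because E is a maximally specific description of Beauty's evidence.
```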

Mike's rule (M) leads to thirding, my rule (EEP) to halfing. Whence the difference?

The reason is that my shifting operator '+' never shifts the evaluation index into the distant future. If +A holds at some point, then A must hold at the very next point where any information arrives. By contrast, Mike's evaluation point Z may well occur long after time 1, and after lots of other things have been learned. Sleeping Beauty, for instance, assigns substantial credence to possibilities where Z occurs on Tuesday, two days away.
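For contrast, here is the corresponding calculation for (EEP), again on my reconstruction. Since '+' only looks at the next point where information arrives, which in the sequential version of the story is the Monday awakening on either outcome, the Tuesday awakening drops out:

```python
# P_1(+Heads | +E): only the Monday awakening can make '+E' true, on Heads or Tails.
def p_heads_eep(q):
    p_plus_e_given_heads = q  # Monday evidence is exactly E with probability q
    p_plus_e_given_tails = q  # same on Tails; Tuesday is not the *next* point
    return 0.5 * p_plus_e_given_heads / (
        0.5 * p_plus_e_given_heads + 0.5 * p_plus_e_given_tails)

print(p_heads_eep(0.01))  # 0.5, whatever q is: (EEP) keeps the halfer answer
```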

How can we decide between far-reaching shifting operators such as Mike's 'at Z' and short-reaching operators such as my '+'? We won't find an answer by staring at Sleeping Beauty. But the difference shows up in other cases as well.

Consider a theory T which holds that we will be subject to every humanly possible experience exactly once in our lifetime. (T probably entails that we live forever.) Let E be an experience that you might have tomorrow morning. All T possibilities, but only some ~T possibilities, are such that E will be experienced at some point. So when tomorrow you experience E, your new probability for T according to (M) should equal your previous probability for T conditional on the assumption that E is experienced at some point. Hence E raises the probability of T. The same obviously applies to any other experience you will ever have. According to (M), your confidence in T should steadily rise. (EEP), on the other hand, discards occurrences of E in the distant future and therefore avoids this conclusion. It seems to me that (EEP) gets it right.
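To put toy numbers on this (the 0.5 figures are arbitrary assumptions of mine): under (M), learning E amounts to conditioning on 'E is experienced at some point', and each such update pushes T up.

```python
p_t = 0.5                        # assumed prior in T
p_e_somewhere_given_t = 1.0      # T says every humanly possible experience occurs
p_e_somewhere_given_not_t = 0.5  # assumed: on ~T, E may never occur

p_t_new = p_t * p_e_somewhere_given_t / (
    p_t * p_e_somewhere_given_t + (1 - p_t) * p_e_somewhere_given_not_t)
print(p_t_new)  # 0.666...: and the next experience raises it again, and so on
```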

Apart from delivering the intuitively correct result in cases like this, short-reaching shifting functions are also supported by theoretical considerations. To see why, note that shifting rules are only plausible if the agent does not receive relevant information at times in between time 1 and time 2 (the times of P_1 and P_2). If you receive strong evidence for A at some intermediate time, then your probability for A at time 2 should be sensitive to this evidence. Mike tries to avoid this problem by aggregating evidence over times: in his model, the evidence at time 2 effectively consists of everything that is certain at time 2, no matter when it was first learned. But this doesn't always help: centered evidence can affect the probability of uncentered propositions without making any of them certain. Since the centered evidence itself quickly becomes false, there is no guarantee that we can afterwards reconstruct it from propositions that are then certain.

For example, consider Beauty's credence in Heads on Wednesday, when she knows that the experiment is over. No matter what happened earlier, she will at this point have memories of the setup and of exactly one post-Sunday awakening. Let E contain everything of which she is now (on Wednesday) certain. Setting time 1 = Sunday and time 2 = Wednesday, (M) says that her new credence in Heads should equal her Sunday credence in Heads conditional on E obtaining on Wednesday. Since E contains no clues about the outcome of the coin toss, this is 1/2. On the other hand, setting time 1 = Monday and time 2 = Wednesday, her new credence in Heads should also equal her Monday credence in Heads conditional on E obtaining on Wednesday, which is 1/3. So if we allow time 1 and time 2 to be separated by arbitrary intervals, we get contradictory results.

Suppose then that shifting rules are restricted to cases where no relevant information is obtained between time 1 and time 2. (This is at any rate a restriction I want for (EEP).) Then it is clear why the shifting function ought to shift no further than to the next point where information comes in: this makes optimal use of the centered information contained in the old probabilities. Far-reaching shifting functions share the problems of inner-world indifference principles that recommend always distributing one's credence evenly among all possibilities within a universe that are compatible with the present evidence. Such principles cause severe and unnecessary information loss: if you know that somewhere in a distant galaxy there are lots of brains in a vat all of which believe that it is Saturday, then you may be certain that you live on Earth today (on Friday); but you must give up this belief tomorrow, even if you have learned nothing that would undermine the belief. Far-reaching shifting rules at least restrict themselves to possibilities in the future of the relevant subject -- ignoring possibilities in the past or possibilities in the lifetime of other people. But the problem remains for possibilities in the distant future.

So far, I have assumed that if the coin lands tails, then Beauty's Tuesday awakening follows her Monday awakening. But, as I mentioned in the last posting, I think the story might be better understood as a case of branching, where both the Monday and the Tuesday awakening directly follow Beauty's Sunday state. In this case, Tails-Tuesday is just as close as Tails-Monday, and my complaint about reaching too far in the future doesn't apply. What does (M) say about this version?

As before, let E be Beauty's total evidence on Monday and Z = 'the time when my total evidence is E'. According to (M),

P_2(Heads) = P_1(Heads at Z | E at Z).

How shall we read 'E at Z' if the future contains one branch with E and another without? Let's say it is true. In general, let's read 'X at Z' as true iff there is at least one branch with X & Z. (I think this is effectively how Mike reads it. In any case, it doesn't matter much, because any reading leads to trouble.) Then we get

P_2(Heads) = P_1(Heads | at some point on some branch Z).

Since the Tails possibilities have two branches and thereby two opportunities for Z, and the Heads possibilities only one, the probability for 'at some point on some branch Z' is higher given Tails than given Heads. So P_2(Heads) < P_1(Heads), and under plausible assumptions we again get P_2(Heads) = 1/3.

Is this a reasonable treatment of branching? I think not. Suppose you accept the 'many-worlds' interpretation of quantum mechanics, on which a branching occurs at every coin toss. Suppose you also know that a certain coin is biased 100:1, but you don't know whether its bias is towards heads or tails. You toss it and it lands heads. The biased-towards-heads hypothesis entails that the coin will land heads on many branches (or on branches with high amplitude); the biased-towards-tails hypothesis entails that it will land heads on few branches (or on branches with low amplitude). Either way, the coin is predicted to land heads on some branches. Hence according to (M), P_2(biased-towards-heads) = P_1(biased-towards-heads | at some point on some branch Heads) = P_1(biased-towards-heads). The coin toss leaves you as undecided about the bias as you were before. This seems wrong.
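In numbers (my own sketch of the example): ordinary conditionalization on the heads outcome should leave you about 99% confident in the heads bias, whereas (M) with the branch reading conditions on 'heads on some branch', which both hypotheses predict for sure.

```python
p_h_if_heads_bias = 100 / 101   # chance of heads if biased 100:1 towards heads
p_h_if_tails_bias = 1 / 101     # chance of heads if biased 100:1 towards tails

# Ordinary conditionalization on the observed heads:
p_heads_bias = 0.5 * p_h_if_heads_bias / (
    0.5 * p_h_if_heads_bias + 0.5 * p_h_if_tails_bias)
print(round(p_heads_bias, 3))   # 0.99

# (M), branch reading: both hypotheses give 'heads on some branch' probability 1,
# so the credence in the heads bias stays at its prior value.
print(0.5 * 1.0 / (0.5 * 1.0 + 0.5 * 1.0))  # 0.5
```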

Branching cases are tricky, and I think a satisfactory treatment requires leaving the framework of simple shifting rules altogether.

I have complained that (M) gives the wrong result in far-reaching cases and in branching cases. I would also complain that it gives no result at all in other cases: it falls silent if there is no context-insensitive term 'Z' for which the agent is certain that 'Z is now' is true. In Sleeping Beauty, this happens if there is a positive probability for the Tails-Monday experience being indistinguishable from the Tails-Tuesday experience. It seems to me that rational agents should always assign positive credence to worlds where some point in the future is indistinguishable from the present, so strictly speaking, (M) is never applicable to rational agents.

Here is one thing I'm puzzled about. (M), like (EEP), is an external, diachronic rule: it says what an agent's later credence should be given their earlier credence and their new evidence. It does not say what the later credence should be given certain beliefs about the earlier credence and the new evidence. For agents who always remember their earlier credences, the external rules can, however, be recovered from the corresponding expert principles

(EP) P_2(A | P_1(+A|+E) = x & E) = x

and

(M*) P_2(A | P_1(A at Z|E at Z) = x & E & Z is now) = x.

My reasons for preferring (EEP) over (M) carry over to (EP) versus (M*). And I do think that (EP) is a plausible constraint on rational credence. Nevertheless, I would like to defend (EEP) even in cases where it cannot be recovered from (EP). In discussion, Mike has called this 'crazy'. But isn't (M) just as external and diachronic as (EEP)? Perhaps Mike wants to restrict it to cases where it can be recovered from (M*), in which case he doesn't really want to offer a diachronic update rule, but a synchronic expert rule? Not sure. Anyway, this question is probably orthogonal to the choice of shifting rules.

Comments

# on 06 December 2008, 18:36

Very interesting!
I'm glad to see that David Lewis has a good heir.
I'm not glad to see that thirders often neglect halfer writers. Bradley's work and White's paper are known, but Jenkins, Leslie, Franceschi, Meacham, Bostrom...

Sorry for my bad English :)

A French Bostrom advocate (and it's not easy !)

# on 12 January 2009, 03:44

Hey Wo,
Excellent post—lots for me to think about here, and I'll try to say more once I've done the requisite thinking. In the meantime, however, I want to say something about the very last point, the bit you're puzzled about. If you look at the top of this post, where you lay out (EEP) and (M), there's an important difference: (M) applies only when the agent is certain at t2 that "Z is now" is true, while (EEP) applies even if the agent is uncertain at t2 whether the current time is one "point" after t1. That last bit is the bit I think is crazy, and I think that complaint is orthogonal to the difference between (EEP)/(M) and (EP)/(M*).
Imagine for a moment that "+" were the operator "in 5 minutes." Then under (EEP), if I'm certain right now that it's noon, and you throw me into a sensory deprivation tank where I have no idea how quickly time is passing, I should nevertheless be certain at 12:05pm that it's 12:05pm (because I was certain at noon that in 5 minutes it would be 12:05pm). *That* strikes me as crazy. I don't think it's a requirement of rational agency that agents be perfect trackers of the time.
You might respond that (EEP) is more plausible when we go back to the actual interpretation of "+", where it indicates "the next point where new information is received." But here I run up against the fact that I don't really understand what your "next point" talk means.

As for (M), you're right that as I present it in my self-locating beliefs paper it doesn't say anything about the agent's having to remember her earlier credences at the later time. But that's only because that paper focuses on self-location rather than memory loss. In my dissertation I give serious consideration to memory-loss cases and wind up doing a lot of worrying about whether an agent should be required to adhere to her earlier credences even when she can't remember what they were. This is a tricky issue, but I largely come down on the side of holding that if an agent can't remember her earlier credences she isn't bound to honor them. The most important argument there is that if an agent is bound to honor earlier credences she can't remember, it seems she should also be bound to honor earlier certainties, which would mean that any agent who forgets anything is thereby irrational. And that strikes me as unreasonable.
Notice, by the way, that this position wouldn't automatically commit me to offering only a synchronic expert rule as opposed to a diachronic update rule. You might think, for instance, that the synchronic rule holds in this case only because the diachronic one does. It's just that the diachronic rule may have exceptions for certain types of memory loss cases.

Finally, a word to Laurent: I don't think it's that thirders ignore halfers, I think it's that Sleeping Beauty is such an accessible puzzle that people in general often feel like they can write about it without having read much of what others have had to say. So halfers ignore thirders as much as vice versa. I'm glad to hear you're aware of Darren Bradley's stuff; besides the people you mention I think folks should also be seriously reading Joe Halpern's Sleeping Beauty paper.

Mike

# on 08 June 2009, 07:33

Hey Wo,

Told you I'd get back to this post at some point! (Though who knows if anyone's still reading....) Just wanted to point out that what you say above about my updating scheme and Wednesday in the Sleeping Beauty Problem is incorrect. According to my scheme, Beauty should assign a 1/2 degree of belief to heads on Wednesday, and you can get that either by comparing Wednesday with Sunday or Wednesday with Monday. The Wednesday/Sunday one is easy; let's figure out where your Wednesday/Monday analysis goes wrong.

I'll assume we're working with a case in which the awakenings are subjectively distinguishable. (As you point out, if they aren't, my theory won't give you any verdicts linking Wednesday with Monday. But then those verdicts can't contradict verdicts linking Wednesday with Sunday!) Suppose Beauty is awake on Monday (though she doesn't know it's Monday) and the lights dim for a moment. Her relevant context-insensitive evidence is then "I awaken on a day when the lights dim." Now suppose she falls asleep and awakens on Wednesday still remembering the dimming event. Since she knows she can only remember her last awakening day, her relevant context-insensitive evidence is now "The lights dim on the last day I awaken." This is different from the evidence she had on Monday; in particular, it's stronger. Some quick calculations show that on Monday her credence in heads conditional on "The lights dim on the last day I awaken" is 1/2. So no contradiction for me at all.

Mike

# on 10 June 2009, 19:52

Hi Mike!

That's interesting. I didn't know you're a Wednesday-halfer; is this common among thirders?

You're right that your account is not threatened by the inconsistency I suspected there. As I wrote, I thought that centered evidence could sometimes affect the probability of uncentered propositions without making any uncentered proposition certain. But on your account, this cannot happen unless some uncentered proposition loses certainty, which I suppose you would count as a case of "forgetting", so that (M) no longer applies. (Quick argument, mainly as a reminder to myself: suppose the subject learns E and this does not make any uncentered proposition certain; since E entails "somewhere E", P_1(somewhere E) must have been 1 already; but then for any uncentered A, P_2(A) = P_1(A at E | somewhere E) = P_1(A at E) = P_1(A), assuming E is certain to occur no more than once; so the probability of A is unaffected by learning E.)

I hope I'll get to look at your treatment of forgetting etc. once I've settled in here.
