## A Dutch book against CDT? (EDC, ch.8)

The eighth and final chapter of Evidence, Decision and Causality asks whether the actions over which we deliberate should be evidentially independent of the past. It also presents a final alleged counterexample to CDT.

A few quick comments on the first topic.

It is often assumed that there can be evidential connections between what acts we will choose and what happened in the past. In Newcomb's Problem, for example, you can be confident that the predictor foresaw that you'd one-box if you one-box, and that she foresaw that you'd two-box if you two-box. Some philosophers, however, have suggested that deliberating agents should regard their acts as evidentially independent of the past. If they are right then even EDT recommends two-boxing in Newcomb's Problem.

Arif considers some arguments for the independence requirement, and finds them all wanting. I agree with much of what he says.

The best argument he finds goes like this. During deliberation, the question what you will do is equivalent to the question what you should do. If information about the past were relevant to what you will do, it would also be relevant to what you should do. But there are cases in which it is not. For example, a gambling addict who deliberates about whether to gamble may know that she often gave in to the temptation in the past. This historical fact is evidence that she will give in this time. But it doesn't provide any reason to give in.

In response, Arif suggests that we should distinguish two readings of 'I will do so-and-so' – one reading on which the sentence expresses a prediction, and one on which it expresses a present intention. On its second reading, the question of what you will do should be insensitive to non-reason-giving information about the past.

That may be right. I'm not sure. The argument (involving the alleged equivalence of 'will' and 'should') looks unconvincing anyway.

Let's assume that you engage in an actual process of deliberation. Then it makes sense to assume that your initial beliefs about what you will do are partly shaped by evidence you have about the past. At that stage, 'what will I do?' is clearly not equivalent to 'what should I do?'. At the endpoint of deliberation, when you have settled on what to do, the two questions are more closely connected. If you are rational, and know that you are rational, and know about your beliefs and desires, then you will believe that you will do X iff you believe that X is the right choice. But the questions are still not equivalent, as we can see if (for example) you don't believe that you are rational. In that case, you may well think that you'll end up doing one thing even though you ought to do another.

The gambler case is anyway somewhat beside the point. What matters for decision theory is whether mere hypotheses about the past are evidentially relevant to hypotheses about what you will do. If you're unsure whether you'll one-box or two-box in Newcomb's Problem, the hypothesis that a reliable predictor has predicted that you'll one-box makes it likely that you'll end up one-boxing. The hypothesis provides no reason or incentive to one-box. As such, it shouldn't steer your deliberation towards one-boxing. And of course it doesn't. It is, after all, only a hypothesis.

Overall, I agree with Arif that the independence requirement is implausible and unmotivated.

Let's turn to the final counterexample to CDT. This actually comes up in the context of the previous topic, but I'll take it out of context.

The scenario resembles the Newcomb Insurance case from the previous chapter.

Psycho Insurance. In front of you is a button. You can either press it or not. If you have been predicted to press it, doing so will cost you $1. If you have been predicted not to press it, pressing it will give you $1. After making your choice, and before you learn of the outcome, you are offered a chance to bet that the prediction was accurate. Taking the bet gets you $0.50 if the prediction was accurate and costs you $1.50 if it was inaccurate. At the outset, you are highly confident that the prediction is accurate.

This is another sequential choice problem with two stages. In stage 1, you are asked whether to press the button. In stage 2, you are asked whether to accept the bet.

Arif assumes that you should accept the bet in stage 2. If you press the button in stage 1, you are therefore guaranteed to lose $0.50 overall. If you don't press the button, you either gain $0.50 (if you were predicted not to press the button) or lose $1.50 (if you were predicted to press the button). EDT says that you shouldn't press the button. CDT seems to allow for this as well. Specifically, not pressing the button maximises (causal) expected utility if your credence in having been predicted not to press the button is greater than 0.5. If your credence is less than 0.5, however, pressing the button maximises (causal) expected utility. In that case, Arif assumes, CDT says that you should press the button in stage 1, accept the bet in stage 2, and face a sure loss of $0.50.
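The threshold at 0.5 can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and it assumes, with Arif, that you accept the bet in stage 2, so that each payoff is the button outcome plus the bet outcome.

```python
def eu_press(p_pred_press: float) -> float:
    """Causal expected payoff of pressing, given credence p in pred-press.

    Assumes the bet is accepted in stage 2 (Arif's assumption).
    """
    # Predicted to press: -1 from the button, +0.50 from the bet.
    # Predicted not to press: +1 from the button, -1.50 from the bet.
    return p_pred_press * (-1 + 0.50) + (1 - p_pred_press) * (1 - 1.50)


def eu_not_press(p_pred_press: float) -> float:
    """Causal expected payoff of not pressing, given credence p in pred-press."""
    # Predicted to press: 0 from the button, -1.50 from the bet.
    # Predicted not to press: 0 from the button, +0.50 from the bet.
    return p_pred_press * (0 - 1.50) + (1 - p_pred_press) * (0 + 0.50)


for p in (0.3, 0.5, 0.7):
    better = "not press" if eu_not_press(p) > eu_press(p) else "press (or tie)"
    print(f"credence in pred-press = {p}: "
          f"press {eu_press(p):+.2f}, not press {eu_not_press(p):+.2f} -> {better}")
```

Pressing comes out at $-0.50 whatever the credence, while not pressing is worth 0.5 minus twice the credence in pred-press, so the two options tie exactly when that credence is 0.5.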

Arif calls this a "diachronic Dutch Book" against CDT.

Like in Newcomb Insurance, we also seem to have a problematic divergence between what CDT identifies as the optimal plan and what it says you should do at the individual stages. Here is the matrix for the four possible plans.

| | pred-press | pred-¬press |
|---|---|---|
| press ∧ bet | $-0.50 | $-0.50 |
| press ∧ ¬bet | $-1 | $1 |
| ¬press ∧ bet | $-1.50 | $0.50 |
| ¬press ∧ ¬bet | $0 | $0 |

Rows 1 and 3 are dominated. If you could decide on a plan for both acts at once, it therefore looks like you should choose either press ∧ ¬bet or ¬press ∧ ¬bet. Either way, you'll plan to not bet in stage 2. Yet once you get to that stage, CDT says that you should bet.
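The dominance claim is easy to verify mechanically. The following is a minimal sketch, with the plan labels and the `dominates` helper introduced purely for illustration; the payoffs are the ones from the matrix, listed as (pred-press, pred-¬press).

```python
# Payoffs per plan as (if pred-press, if pred-not-press), from the plan matrix.
PLANS = {
    "press & bet": (-0.50, -0.50),
    "press & no bet": (-1.00, 1.00),
    "no press & bet": (-1.50, 0.50),
    "no press & no bet": (0.00, 0.00),
}


def dominates(a, b):
    """True if a is at least as good as b in every state and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


# For each plan, list the plans that dominate it.
dominated_by = {
    name: [other for other, v in PLANS.items() if other != name and dominates(v, u)]
    for name, u in PLANS.items()
}

for name, others in dominated_by.items():
    print(name, "<-", others if others else "undominated")
```

On these numbers, press ∧ bet is dominated by ¬press ∧ ¬bet, and ¬press ∧ bet is dominated by press ∧ ¬bet; the two no-bet plans are undominated.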

Let's go through all this more slowly.

In stage 2, taking the bet is the right choice if, at that point, you are still confident that the predictor foresaw what you did in stage 1. Like in Newcomb Insurance, we can distinguish two versions of the scenario – one in which you believe that the predictor can foresee how states of indecision are resolved and one in which you believe she can only foresee decisions or indecisions. This time, I'll focus on the first version, which is arguably implied by Arif's stipulations.

Let's assume, then, that the predictor has the superpower to foresee your eventual act in stage 1, even if you remain undecided. We then know that you should take the bet in stage 2.

Return to stage 1. If you had faced the stage 1 problem in isolation – without any subsequent stage 2 – CDT would say that you should remain undecided between pressing the button and not pressing it. Things change if we factor in the additional payoff from stage 2. The stage 1 problem then appears to have three equilibria.

One, you could decide to press the button. You should then be confident that this has been predicted, and that you'll end up with $-1 + $0.50 = $-0.50 overall. If you were to not press the button, you'd probably end up with $0 + $-1.50 = $-1.50, because you'd have rendered the prediction false. So this appears to be a stable state.

Two, you could decide to not press the button. You should then be confident that you'll end up with $0 + $0.50 = $0.50. The other option apparently would have left you with $1 + $-1.50 = $-0.50.

Three, you could be perfectly undecided between pressing and not pressing. In that state, you should give credence 0.5 to either prediction. Your expected net payoff is then $0: if you were predicted to press, you end up with $-1 + $0.50 = $-0.50; if you were predicted to not press, you end up with $0 + $0.50 = $0.50; both are equally likely. The state is stable because both pure options have a lower expected payoff. If you were to press, you'd end up with either $-1 + $0.50 = $-0.50 or $1 - $1.50 = $-0.50, with equal probability (average: $-0.50). If you were to not press, you'd end up with either $0 + $-1.50 = $-1.50 or $0 + $0.50 = $0.50 (average: $-0.50).

The best of the three equilibria is the second. It is worth $0.50, compared to $-0.50 for the first and $0 for the third. Any sensible form of CDT should at least declare the second equilibrium permissible. I'm inclined to say that it is required.
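The numbers for the three candidate states can be reproduced as follows. This is a sketch of the arithmetic only, not a general test for stability: the names are hypothetical, and it idealises by treating the predictor as perfectly accurate, so that your credence in pred-press simply equals your credence that you press.

```python
# Combined payoffs (button outcome plus bet outcome), from the scenario.
PAYOFF = {
    ("press", "pred-press"): -1 + 0.50,
    ("press", "pred-not"): 1 - 1.50,
    ("not", "pred-press"): 0 - 1.50,
    ("not", "pred-not"): 0 + 0.50,
}


def causal_eu(act, q):
    """Expected payoff of a pure act, holding credence q in pred-press fixed."""
    return q * PAYOFF[(act, "pred-press")] + (1 - q) * PAYOFF[(act, "pred-not")]


def state_value(q):
    """Expected payoff of a state in which you press with probability q.

    Assumption: the prediction matches whatever you end up doing.
    """
    return q * PAYOFF[("press", "pred-press")] + (1 - q) * PAYOFF[("not", "pred-not")]


STATES = [("decide to press", 1.0), ("decide not to press", 0.0), ("undecided", 0.5)]

for label, q in STATES:
    print(f"{label}: state {state_value(q):+.2f}, "
          f"press {causal_eu('press', q):+.2f}, "
          f"not press {causal_eu('not', q):+.2f}")
```

In each of the three states, neither pure option has a higher expected payoff than the state itself, and the state values are $-0.50, $0.50 and $0 respectively.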

CDT therefore endorses what Arif thinks is the correct solution: not pressing the button in stage 1 and taking the bet in stage 2.

Arif points out that if your credence in pred-press is greater than 0.5, then CDT says that you must press the button in stage 1 and take the bet in stage 2, for a guaranteed loss of $0.50. Is that correct?

We are invited to consider a situation in which your credence in pred-press is greater than 0.5. Arif doesn't say whether this is supposed to be your initial credence, at the start of deliberation, or your reflective credence, at the end of deliberation. The first kind of situation is relatively unproblematic. The second is not.

Suppose that, for whatever reason, you start your deliberation in a state in which your credence in pred-press is greater than 0.5. Perhaps it's 0.7. In that case, it should not remain at 0.7. Due to your belief in the prediction's accuracy, your credence in pred-press is tied to your credence in press. If during deliberation you become convinced that not pressing is the right choice, you'll also become convinced that you won't press the button, and thereby that you have not been predicted to press the button.

This is what my preferred form of CDT says. Whatever your initial credence might have been, I say that your reflective credence in pred-press should be near 0. That is, whatever credence you start out with, you should not press the button in stage 1. You'll never make a guaranteed loss.

We might consider what you should do if your reflective credence in pred-press is greater than 0.5. But now we're talking about an odd situation. According to best-equilibrium CDT, you should never be in that situation. Your reflective credence in pred-press should be near 0.

Best-equilibrium CDT therefore avoids the "diachronic Dutch Book". But other forms of CDT do not. According to permissive CDT, any equilibrium in a decision problem is rationally acceptable. On this account, it looks like you may rationally reach the first equilibrium in stage 1, and make a guaranteed loss. This kind of case is a reason to prefer best-equilibrium CDT.

On closer inspection, however, it is not obvious that even permissive CDT falls prey to the apparent Dutch Book. Perhaps there is in fact no equilibrium at which you can decide to press the button.

Let's return to the apparent first equilibrium. Assume you are confident that you will press the button and that this has been predicted. You should then also be confident that you will accept the bet in stage 2, and that you'll end up with $-1 + $0.50 = $-0.50 overall. We need to ask if it would have been better to not press the button. If yes, your decision isn't stable.

So what would be the case if – counterfactually – you were to not press the button? Well, the prediction would have been false. The crucial question is whether you would still accept the bet in stage 2. We know that accepting the bet is in fact the uniquely rational choice. But it might be irrational under counterfactual circumstances.

Evidently, you would accept the bet iff you would be sufficiently confident that your act in stage 1 has been predicted. So we need to ask what you would believe about the prediction if – counterfactually – you were to not press the button.

It may help to consider the parallel question in Newcomb's Problem (with an all-but-infallible predictor). In that scenario, I would take both boxes and get $1000. I couldn't have done any better. The opaque box is empty. If I had taken only the opaque box, I would have gotten nothing. Now here's the question. What would I have believed about the content of the opaque box if – counterfactually and irrationally – I had one-boxed? Would I have said to myself "I'll take just this box, of which I know that it is empty"? Or would I have said "I'll take just this box, of which I think it contains a million"? The answer isn't obvious. But the first possibility looks defensible. When I consider what would have happened if I had just taken the opaque box, I don't envisage being completely shocked to find the box empty. I already know that it's empty. By taking the box I would knowingly take an empty box. (Suppose I enjoy finding out that I'm wrong about something of which I was highly confident. Let's say this is worth $2000 to me. If the counterfactual situation in which I one-box is a situation in which I'm convinced that the box contains a million then I couldn't rationally take two boxes, because the counterfactual situation would be better.)

If that is correct then my confidence in the prediction's accuracy is not robust under counterfactual supposition. If I were to one-box, I would still think that the box is empty, and so I would think that the predictor made a mistake.

Now back to Psycho Insurance. We are looking at the supposed equilibrium in which you are confident that you'll press the button. What would be the case, we ask, if you were to not press the button? Would you still believe that the prediction is accurate? This is the same question as in Newcomb's Problem, and the answer isn't obvious. One might reasonably argue that your confidence in the predictor's accuracy is not robust under the counterfactual supposition. If that is correct – more precisely, if the kind of supposition that is relevant to CDT gives this answer – then the counterfactual scenario in which you don't press the button is a scenario in which you do not accept the bet in stage 2. And then that scenario has a known utility of $0, which is better than $-0.50. So you're not in equilibrium.

This issue is related to a general problem with the "sophisticated" approach to sequential choice, and with "backwards induction" arguments in game theory. When we consider what an agent should do at a choice point, we assume that the agent is rational, knows that she is rational, has not lost any of her evidence, and so on. But when we reason backwards, we must also consider what the agent would do at that choice point if she were to make a possibly irrational choice at an earlier point. We can't assume that the answer is the same.

In sum, it's not obvious that permissive CDT allows for the problematic choices in Psycho Insurance.

So much for the diachronic Dutch Book. What about the apparent mismatch between the optimal strategy and your actual choices? Here the situation is analogous to that in Newcomb Insurance.

Remember the decision matrix for the possible plans.

| | pred-press | pred-¬press |
|---|---|---|
| press ∧ bet | $-0.50 | $-0.50 |
| press ∧ ¬bet | $-1 | $1 |
| ¬press ∧ bet | $-1.50 | $0.50 |
| ¬press ∧ ¬bet | $0 | $0 |

On the assumption that the predictor can foresee resolutions of indecision, the only equilibrium is perfect indecision between rows 1 and 3.

This may be surprising given that both of these options are dominated. Indeed, Arif assumes that because press ∧ bet and ¬press ∧ bet are both dominated, CDT requires you to choose one of the other plans. But not so. In fact, none of the other plans is rationally choosable, nor is there a stable state of indecision in which you think you might implement one of these plans.

But indecision between the two dominated plans is stable. Suppose you're in that state. Then you are 50% confident that you will press the button and accept the bet. You will then have been predicted to press the button, so you'll lose $0.50. The other 50% of your credence goes to scenarios in which you don't press and bet, in which case you will have been predicted to not press, and you end up with $0.50. Your expected payoff is $0. Even though you prefer one direction of the indecision state to the other, you are not drawn further towards the relevant pure choice: if you were to directly choose to not press and bet, you would get either $-1.50 or $0.50, with equal probability.
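Here is a sketch of the plan-level arithmetic, under the same idealisation of a perfectly accurate predictor. The helper names are hypothetical, and the check for pure plans is a plain ratifiability test (is the plan still optimal once you are confident it was predicted?), which is an assumption about how "rationally choosable" is to be read. Whether the indecision state counts as stable depends on the account of indecision from the earlier posts, which the code does not try to reproduce; it only reports the relevant numbers.

```python
# Payoffs per plan as (if pred-press, if pred-not-press), from the plan matrix.
PLANS = {
    "press & bet": (-0.50, -0.50),
    "press & no bet": (-1.00, 1.00),
    "no press & bet": (-1.50, 0.50),
    "no press & no bet": (0.00, 0.00),
}
PRESSES = {
    "press & bet": True,
    "press & no bet": True,
    "no press & bet": False,
    "no press & no bet": False,
}


def causal_eu(plan, q):
    """Expected payoff of a plan, holding credence q in pred-press fixed."""
    if_pred_press, if_pred_not = PLANS[plan]
    return q * if_pred_press + (1 - q) * if_pred_not


def ratifiable(plan):
    """Is the plan optimal on the assumption that it has been predicted?"""
    q = 1.0 if PRESSES[plan] else 0.0
    return all(causal_eu(plan, q) >= causal_eu(other, q) for other in PLANS)


for plan in PLANS:
    print(plan, "ratifiable:", ratifiable(plan))

# Indecision between 'press & bet' and 'no press & bet', 50/50:
# in either branch the prediction matches what you end up doing.
q = 0.5
indecision_value = q * PLANS["press & bet"][0] + (1 - q) * PLANS["no press & bet"][1]
print("indecision value:", indecision_value)
for plan in PLANS:
    print(f"  pure {plan}: {causal_eu(plan, q):+.2f}")
```

On this reading no pure plan is ratifiable, the indecision state has an expected payoff of $0, and at credence 0.5 no pure plan does better than that: the two bet plans come out at $-0.50 and the two no-bet plans at $0.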

So if you had a choice between the plans, you should be undecided between press ∧ bet and ¬press ∧ bet: you should settle on betting in stage 2 and you should remain undecided about stage 1.

We've already covered the individual decision problems. Here you should take the bet in stage 2, and you should arguably not press the button in stage 1, although permissive CDT might also allow that you press the button or remain undecided in stage 1.

In any case, we don't have the mismatch between planning and implementation that Arif predicts. If you could decide between plans, you would decide to accept the bet in stage 2. Once you reach that stage, this is just what you will do.

Arif actually doesn't consider the possible match or mismatch between planning perspective and implementation perspective. Like much of the literature, he instead concentrates on your attitudes towards plans and their continuation. He suggests that CDT violates the principle I called (DC3) in the previous post, according to which the continuation of any ex ante acceptable plan is still acceptable at any point that is compatible with its implementation.

In the previous post I showed that permissive CDT validates (DC3), assuming that "acceptable" means "rationally choosable". In Psycho Insurance, the principle is vacuously satisfied because there is no rationally choosable plan. The principle is even satisfied if we weaken "acceptable" to encompass options that have positive probability in a rational state of indecision. On this reading, press ∧ bet and ¬press ∧ bet are both acceptable. And their continuation, to bet, is acceptable whatever you do in stage 1.

We come closer to dynamic inconsistency if we assume that the predictor can't foresee how states of indecision are resolved.

In that case, you should accept the bet in stage 2 only if you have not been undecided in stage 1. In stage 1, we get the same three apparent equilibria as above, with the same caveat.

In the choice among plans, we get a new equilibrium. This time, you should be perfectly undecided between press ∧ ¬bet and ¬press ∧ ¬bet.

Here, then, one might say that when you consider the possible plans, you think that you should definitely reject the bet in stage 2. Yet when you come to stage 2, you should accept the bet. On the weak reading of "acceptable", ¬press ∧ ¬bet is acceptable but its continuation (after the first stage has been implemented) is not.

I don't think this is a serious problem. It would really be worrying if CDT told you that the best plan is to first do A and then B, but also that you should choose A and ¬B when you reach the two choice points. If ¬B is better than B on the assumption that you do A, then A ∧ B can hardly be the optimal plan. But CDT doesn't tell you that ¬press ∧ ¬bet is the best plan.