## Dynamic Causal Decision Theory (EDC, chs. 7 and 8)

Pages 201–211 and 226–233 of Evidence, Decision and Causality present two great puzzles showing that CDT appears to invalidate some attractive principles of dynamic rationality.

First, some context. The simplest argument for two-boxing in Newcomb's Problem is that doing so is guaranteed to get you $1000 more than what one-boxing would get you. The general principle behind this argument might be expressed as follows:

Could-Have-Done-Better (CDB): You should not choose an act if you know that it would make you worse off than some identifiable alternative.

Arif argues that whatever motivates the CDB principle also motivates an analogous principle for sequences of choices:

Sequential Could-Have-Done-Better (SCDB): You should not make a sequence of choices if you know in advance that doing so would make you worse off than some identifiable alternative.

Arif goes on to present an example in which CDT does not respect the sequential principle. This, he claims, renders the original principle "completely unmotivated" (p.211).

Newcomb Insurance. You face the standard Newcomb Problem, but with different monetary values. The transparent box is empty, the opaque box contains $100 iff you have been predicted to one-box. You are almost certain that the predictor has foreseen your choice. After you've made your choice about which boxes to take, and before the content of the opaque box is revealed, you must bet on whether the predictor foresaw how many boxes you took. If you bet that the prediction was accurate, you get $25 if you're right and lose $75 if you're wrong; if you bet that the prediction was inaccurate, you get $75 if you're right and lose $25 if you're wrong.

What should you do? Arif considers both the "myopic" and the "sophisticated" approach to sequential choice. I will concentrate on the latter, which almost everyone agrees is more reasonable.

In the sophisticated approach, we begin with stage 2 of the scenario, where you must bet on whether the prediction about stage 1 (the Newcomb Problem) was accurate or inaccurate. Since you are highly confident in the predictor's powers, it looks like you should bet that the prediction was accurate.

Now we look at stage 1, keeping in mind that you will bet on an accurate prediction in stage 2. Unlike in an isolated Newcomb Problem, what you should do in stage 1 (according to CDT) seems to depend on your beliefs about what you've been predicted to do.

If you are confident that you've been predicted to one-box, you can expect to get $125 overall through one-boxing ($100 from the box plus $25 from the bet), while two-boxing would only get you $25 ($100 from the boxes plus -$75 from the bet). So if you are confident that you've been predicted to one-box then you should one-box. If, on the other hand, you are confident that you've been predicted to two-box then two-boxing is better. You can thereby expect to get $25, while one-boxing would get you -$75.

In sum, CDT appears to say that you should bet on an accurate prediction in stage 2, and that you may either one-box or two-box in stage 1.
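The stage-1 arithmetic can be reproduced in a few lines. What follows is only a minimal sketch: the names `BOX`, `bet_acc_payoff` and `total` are hypothetical helpers introduced for illustration, and the code assumes that you bet on an accurate prediction in stage 2.

```python
# Illustrative sketch; all names are hypothetical helpers.
# Content of the opaque box in dollars; the transparent box is empty.
BOX = {"pred-1b": 100, "pred-2b": 0}


def bet_acc_payoff(act, pred):
    """Payoff of bet-acc: win $25 if the prediction matched the act, else lose $75."""
    accurate = (act == "1b" and pred == "pred-1b") or (act == "2b" and pred == "pred-2b")
    return 25 if accurate else -75


def total(act, pred):
    """Total payoff of a stage-1 act, assuming bet-acc is chosen in stage 2."""
    return BOX[pred] + bet_acc_payoff(act, pred)


for pred in ("pred-1b", "pred-2b"):
    print(pred, {act: total(act, pred) for act in ("1b", "2b")})
# pred-1b {'1b': 125, '2b': 25}
# pred-2b {'1b': -75, '2b': 25}
```

Whichever prediction you are confident of, the act that matches it comes out ahead, which is why both one-boxing and two-boxing look permissible at this stage.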

But now look at the decision matrix for the combined choices:

| | pred-1b | pred-2b |
|---|---|---|
| 1b & bet-acc | $100 + $25 = $125 | $0 - $75 = -$75 |
| 1b & bet-inacc | $100 - $25 = $75 | $0 + $75 = $75 |
| 2b & bet-acc | $100 - $75 = $25 | $0 + $25 = $25 |
| 2b & bet-inacc | $100 + $75 = $175 | $0 - $25 = -$25 |

We've just seen that CDT appears to recommend either row 1 or row 3, if you face the choices one after the other. But row 1 is dominated by row 4, and row 3 is dominated by row 2. We thus have a violation of SCDB: CDT recommends a sequence of acts for which there is an identifiable alternative that would have done better.
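The dominance claims are easy to verify mechanically. Here is a small sketch that encodes the four rows of the matrix and checks strict dominance column by column; `PAYOFF`, `dominates` and `dominated_by` are names made up for this illustration.

```python
# Illustrative sketch; names are hypothetical.
# Payoffs in dollars for each plan, as (pred-1b, pred-2b).
PAYOFF = {
    "1b & bet-acc": (125, -75),
    "1b & bet-inacc": (75, 75),
    "2b & bet-acc": (25, 25),
    "2b & bet-inacc": (175, -25),
}


def dominates(a, b):
    """True if plan a pays strictly more than plan b in every column."""
    return all(x > y for x, y in zip(PAYOFF[a], PAYOFF[b]))


# For each plan, list the plans that strictly dominate it.
dominated_by = {
    b: [a for a in PAYOFF if a != b and dominates(a, b)] for b in PAYOFF
}

for plan, better in dominated_by.items():
    print(plan, "is dominated by", better)
```

Only the two bet-acc rows turn out to be dominated: row 1 by row 4, and row 3 by row 2.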

For example, suppose you think you've been predicted to one-box. Then CDT suggests that you should one-box in stage 1 and bet on an accurate prediction in stage 2, even though you know in advance that it would be better to two-box in stage 1 and bet on an inaccurate prediction in stage 2.

How bad is this violation of SCDB? Does it render the original CDB principle "completely unmotivated"?

I don't think so. What motivates the original CDB principle is that you should try to do as well as you can in your present decision situation. If you know that a certain option would lead to a better outcome than the option you choose then you are not making the most out of your decision situation.

Now compare a sequential choice scenario. Here you only have direct control over what you do at the present choice point. Your future acts should be treated like the acts of another agent. But a CDB-like principle for groups of agents is clearly indefensible. Suppose you're in a Prisoners' Dilemma, and you choose defection. So does your partner. It would have been better if you had both chosen cooperation, and even better if you had chosen defection and your partner cooperation. So what? You can't bring about these better scenarios. By choosing defection, you do as well as you can. Scenarios that you can't bring about are irrelevant to what you should do.

The SCDB principle has the same flaw. Since you can only control what you do at the present choice point, considerations about what would happen if you made different choices in the past or in the future are irrelevant to what you should do now.

But there remains a puzzle. While the general SCDB principle is highly implausible, one might still have expected it to hold – along with other principles of dynamic rationality – under certain restricted conditions, and these conditions appear to be satisfied in Newcomb Insurance.

Let a plan be a proposition that specifies what you do at each choice point. The SCDB principle links an agent's evaluation of plans to her actual choices. It says that the acts you choose should combine to an optimal plan. We can expect this to fail if your preferences at different choice points are not aligned, as in a Prisoners' Dilemma. The assumption also becomes problematic if your beliefs change in funny ways, as in the puzzle of the absentminded driver (see Schwarz (2015)). But suppose your basic uncentred desires don't change: if you prefer one uncentred world to another at the outset, then you prefer the first world to the second at each choice point. Suppose also that you update your beliefs by conditionalisation, that you know this, and that you know that you make a rational choice whenever you reach a choice point. Under these conditions, one might expect that the "planning perspective" should be in harmony with the "implementation perspective". That is, one might expect that the following principle should be satisfied:

Dynamic Consistency: A plan is rationally choosable iff it is rationally implementable.

By saying that a plan P is rationally choosable I mean, roughly, that if you could settle on a plan in advance – being sure that you would follow it – then you could rationally settle on plan P. In fact, every popular form of decision theory allows us to evaluate merely hypothetical options. So we don't need to consider counterfactual scenarios in which you have different options. More precisely, then, a plan is rationally choosable iff it is a rational choice in a hypothetical decision problem in which the options are all possible plans for the relevant sequence of acts. A plan is rationally implementable if at each choice point, you could rationally choose whatever the plan tells you to do.

Informally, the left-to-right direction of Dynamic Consistency says that if you judge a certain sequence of acts to be the best strategy for reaching your goals then rationality should allow you to carry out these acts. The right-to-left direction says that you should believe that your individual choices together amount to an optimal strategy for reaching your goals.

From a CDT perspective, the SCDB principle is almost equivalent to the right-to-left direction of Dynamic Consistency. SCDB says that if a plan is rationally implementable – if you could rationally follow the plan at each choice point – then there is no alternative plan of which you know that it would make you better off. According to CDT, a plan maximizes expected utility iff there is no alternative that would make you better off. SCDB therefore says that if a plan is rationally implementable then it maximises (causal) expected utility. On the assumption that every choosable plan maximises (causal) expected utility, the right-to-left direction of Dynamic Consistency entails SCDB. On the further – problematic – assumption that maximising (causal) expected utility is sufficient for choosability, the two principles are equivalent.

In Newcomb Insurance, the conditions for Dynamic Consistency are plausibly satisfied. The scenario does not involve changing preferences. Nor does it involve funny changes of belief. And yet SCDB appears to fail. Dynamic Consistency appears to fail as well, in both directions. For the left-to-right direction, recall that the two plans involving bet-acc are dominated. Any choosable plan, it seems, would have to involve betting on an inaccurate prediction in stage 2. But you can't rationally implement such a plan. Once you reach stage 2, you must bet on an accurate prediction.

We can strengthen the puzzle. In Newcomb Insurance, nothing of any relevance seems to happen between stage 1 and stage 2. We may even assume that your uncentred beliefs and desires remain completely unchanged. Suppose in stage 1 you decide to take (say) both boxes. Compare your state of mind in this situation with your later state of mind in stage 2. Your basic desires are the same: you want to maximize your total payoff. Your beliefs about the world are also the same: you don't acquire any new information after stage 1. Your self-locating beliefs are different, but they don't seem to be relevant to your choices.

If your beliefs and desires remain unchanged from stage 1 to stage 2, it doesn't seem to matter that stage 2 takes place after stage 1. You might as well face the two choices at the same time. And if you face two choices at the same time, isn't that equivalent to facing a single choice concerning two acts? Intuitively, it shouldn't make a difference whether we model your choices as a single combined decision problem or as two separate problems. CDT, however, seems to say that what you should do depends on how we model your choices. If we model them as a single decision problem, you should bet on an inaccurate prediction; if we model them as two separate problems, you should make the opposite choice. Strange!

CDT here also appears to violate the Preference Reflection principle from the previous post. If you could choose a strategy in stage 1 then it looks like you should choose either 1b & bet-inacc or 2b & bet-inacc – the other strategies are dominated. Both of these strategies involve bet-inacc. So in stage 1, you should judge that it would be best to choose bet-inacc in stage 2. But when you actually reach stage 2, you foreseeably think it would be better to choose bet-acc.

To sort out these puzzles, let's take a closer look at what CDT says about the scenario.

The answer turns out to depend on a question that we haven't yet settled. You are almost certain, I said, that the predictor has foreseen your choice in stage 1. But what if you don't make a choice? What if you end up in a state of indecision? Presumably you believe that the predictor can foresee that you'd reach this state of indecision. But can she also foresee how the state will be resolved, what act you'll eventually perform?

I will consider both possibilities. To begin, let's assume the predictor can only foresee your state of decision or indecision. If you end up perfectly undecided between one-boxing and two-boxing, her "prediction" about whether you'd take the second box is an uneducated guess. I'll turn to the other version of the story below, but I can already reveal that (a) the upshot will be the same, and (b) the present version brings out the relevant issues more vividly.

Let's first work out what CDT says about the hypothetical choice of plans. Here, again, is the decision matrix.

| | pred-1b | pred-2b |
|---|---|---|
| 1b & bet-acc | $100 + $25 = $125 | $0 - $75 = -$75 |
| 1b & bet-inacc | $100 - $25 = $75 | $0 + $75 = $75 |
| 2b & bet-acc | $100 - $75 = $25 | $0 + $25 = $25 |
| 2b & bet-inacc | $100 + $75 = $175 | $0 - $25 = -$25 |

The only equilibrium in this decision problem is a state of perfect indecision between rows 2 and 4.

It is clear that you must be undecided between one-boxing and two-boxing. If you're inclined towards one-boxing then you can be confident that you're in the left column, in which case row 4 is best, and row 4 involves two-boxing. Similarly, if you're inclined towards two-boxing, you should be confident that you're in the right column, in which case row 2 is best, and row 2 involves one-boxing. Also, if you're undecided between one-boxing and two-boxing but inclined towards accepting the bet – that is, if you're undecided between rows 1 and 3 – then your expected payoff is $25, and then row 2's guaranteed $75 will seem better; so this is not a stable point either. Indecision between rows 2 and 4, by contrast, is stable. If you're in this state, you are equally likely to one-box and to two-box. If you end up one-boxing, you are equally likely to find $100 or $0 in the box, since the predictor can't foresee the resolution of your indecision. More generally, no matter how your indecision resolves, both columns are equally likely. Your expected payoff is therefore $75 (the average of the four cells in rows 2 and 4). No pure option is more attractive.

Now for the two individual decision problems, assuming that stage 2 takes place after stage 1. Following the sophisticated approach, we begin with stage 2. The decision problem here is easy, although not quite as easy as we previously assumed. Since the predictor can't foresee resolutions of indecision, you should bet on the accuracy of her prediction iff you did not remain undecided in stage 1. If you did remain undecided then you should bet on the inaccuracy of her prediction.

The decision problem in stage 1 – between one-boxing and two-boxing – now has three equilibria.

First, you could decide to one-box. You know that you will then choose bet-acc in stage 2. Since you will probably have been predicted to one-box, you'll probably end up with $125. If you had two-boxed, you would have ended up with only $25, since you would still have chosen bet-acc in stage 2 and you would have lost that bet.

Second, you could decide to two-box. In that state, you expect to end up with $25. One-boxing would have cost you $75.

Third, you could remain perfectly undecided between one-boxing and two-boxing. Since the predictor can't foresee resolutions of indecision, you will then choose bet-inacc in stage 2. There's an equal chance of having been predicted to one-box and having been predicted to two-box, so the total expected payoff is $75. No other option is better.
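The numbers in this passage can be checked with a short script. This is a sketch under stated assumptions, not a general equilibrium solver: it idealises "almost certain" to certainty for the decided states, treats the prediction as a 50/50 guess under perfect indecision, and uses hypothetical names (`value`, `worth_of_indecision`, `STATES`, `stage1_values`).

```python
# Illustrative sketch; names are hypothetical.
# Payoffs in dollars for each plan, as (pred-1b, pred-2b).
PAYOFF = {
    "1b & bet-acc": (125, -75),
    "1b & bet-inacc": (75, 75),
    "2b & bet-acc": (25, 25),
    "2b & bet-inacc": (175, -25),
}


def value(plan, p_left):
    """Expected payoff of a plan, holding fixed the credence p_left in pred-1b."""
    left, right = PAYOFF[plan]
    return p_left * left + (1 - p_left) * right


def worth_of_indecision(plans):
    """Worth of perfect indecision among plans when the prediction is a 50/50 guess."""
    return sum(value(p, 0.5) for p in plans) / len(plans)


# Hypothetical choice of plans.
print(worth_of_indecision(["1b & bet-acc", "2b & bet-acc"]))      # 25.0
print(worth_of_indecision(["1b & bet-inacc", "2b & bet-inacc"]))  # 75.0
print({plan: value(plan, 0.5) for plan in PAYOFF})  # no pure plan exceeds 75.0

# Stage 1: each state fixes a credence in pred-1b and the bet made in stage 2.
# Assumption: the predictor is treated as certain to be right about decided states.
STATES = {
    "decided 1b": (1.0, "bet-acc"),
    "decided 2b": (0.0, "bet-acc"),
    "undecided": (0.5, "bet-inacc"),
}


def stage1_values(state):
    """Expected payoff of each stage-1 act from within the given state."""
    p_left, bet = STATES[state]
    return {act: value(f"{act} & {bet}", p_left) for act in ("1b", "2b")}


for state in STATES:
    print(state, stage1_values(state))
```

In each of the three states, no stage-1 act does better than what the state itself recommends, which is what makes them equilibria.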

This third equilibrium corresponds (in some obvious informal sense) to the unique equilibrium for the combined problem. Recall that if you could choose a plan, you would settle on bet-inacc in stage 2 but remain undecided about stage 1. If you go for the third equilibrium in the stage 1 problem, you likewise choose bet-inacc in stage 2 but remain undecided in stage 1.

If we could show that the third equilibrium is uniquely rational, we would have a kind of harmony between planning perspective and implementation perspective. The third equilibrium might be supported by certain assumptions about the dynamics of deliberation, as in Skyrms (1990). I myself, however, favour a "best-equilibrium" version of CDT, which recommends the first equilibrium (because it is the best).

In my preferred form of CDT, then, we have a clear mismatch between planning perspective and implementation perspective. In particular, we get the anticipated violation of SCDB: You should implement 1b & bet-acc even though you know that 2b & bet-inacc would be better.

We do not have a counterexample to the left-to-right direction of Dynamic Consistency. Every rationally choosable plan is rationally implementable. That's because there is no rationally choosable plan. The only rational attitude towards the choice of plans is indecision.

But this is a lucky coincidence. Consider a variant of the scenario in which you have a further option in stage 1: You can randomize your choice by tossing a fair coin; the predictor can't foresee the outcome. In this scenario, you could rationally plan to randomize in stage 1 and bet on an inaccurate prediction in stage 2. But in the two individual decision problems, best-equilibrium CDT still says that you should choose one-boxing in stage 1 and bet-acc in stage 2, assuming you are certain that the prediction is accurate. The unique choosable plan is not rationally implementable. Dynamic Consistency fails in both directions.

Why does it make a difference if we consider your two choices separately or as part of a single decision problem?

Have another look at the decision matrix.

| | pred-1b | pred-2b |
|---|---|---|
| 1b & bet-acc | $100 + $25 = $125 | $0 - $75 = -$75 |
| 1b & bet-inacc | $100 - $25 = $75 | $0 + $75 = $75 |
| 2b & bet-acc | $100 - $75 = $25 | $0 + $25 = $25 |
| 2b & bet-inacc | $100 + $75 = $175 | $0 - $25 = -$25 |

If we treat the two choices as separate problems, you should implement row 1 (or so I claim). In the combined problem, this is not an acceptable solution. If you implement row 1 then you're probably in the left column, and then row 4 would be better.

Crucially, rows 1 and 4 involve opposite acts in both stages. That's why the superiority of row 4 is not an argument against row 1 when we consider the problems separately. If you are in stage 1 of implementing row 1, confident that you'll choose one box, you can be confident that you will bet on an accurate prediction in stage 2. You might notice that it would be better if you now took both boxes and bet on an inaccurate prediction in stage 2. But since you can't directly control your choice in stage 2, this observation has no practical relevance. Similarly in stage 2. Having chosen one box in stage 1, you might notice that a counterfactual scenario in which you took both boxes and now bet on an inaccurate prediction would be better. But since you can't bring about that scenario, this is irrelevant to what you should do.

We must always attend to what the agent can control. In cases like Newcomb Insurance, it makes a difference whether you have simultaneous control over both of your choices. The combined decision problem correctly represents your decision situation only if you have simultaneous control over both choices. This is plausibly the case if the two stages take place at the same time. In the simultaneous version of Newcomb Insurance, you have control in "stage 1" over what you do in "stage 2", and vice versa. Not so if the two stages occur at different times.

When we intuit that the "planning perspective" should harmonize with the "implementation perspective", we assume that the two perspectives are different perspectives on the same choices. But if you can't actually choose a plan then the "planning perspective" distorts your choice situation. It assumes that you have direct control over things over which you actually don't have direct control.

To sum up. Some forms of CDT allow for violations of Dynamic Consistency and SCDB. But it is not obvious that this is a flaw. If we pay attention to what an agent can control, we should not assume that the "planning perspective" and the "implementation perspective" are always in harmony, even if the agent's preferences don't change etc.

The upshot for the original Could-Have-Done-Better principle remains the same. That principle is motivated by the idea that you should try to make the most of your decision situation. This idea does not motivate the dynamic principles – on the contrary, it explains why these principles can fail.

What about the foreseeable reversal of preference? Don't you initially prefer bet-inacc, but foreseeably prefer bet-acc once you reach stage 2? Not quite. You know that you'll choose 1b in stage 1. So you know that you would be worse off with bet-inacc in stage 2. Given your knowledge of what you will do, you also don't prefer a combination of being undecided in stage 1 and bet-inacc in stage 2. (This alternative would have expected payoff $125, no better than what you actually do.) You do prefer the combination of two-boxing and bet-inacc. But that preference doesn't reverse. You still prefer this in stage 2.

For the same reason, you won't regret your choice in stage 1 once you reach stage 2 – although it might initially appear that way. You might think: "I took one box. So I'm in the left column. I should have taken both boxes. Then I could now bet on an inaccurate prediction and make $175." Isn't that regret? No, it's loose talk. You shouldn't really believe that it would have been better if you had taken both boxes. For you know that if you had done that then you would still bet on an accurate prediction now. The scenario of which you "regret" that it doesn't obtain – the scenario in which you took both boxes and now bet on an inaccurate prediction – involves different choices in the past and in the present.
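To see where the $125 and $175 figures come from, it helps to hold the left column fixed and compare the alternatives directly. The following sketch assumes you are certain of having been predicted to one-box; the variable names are hypothetical.

```python
# Illustrative sketch; names are hypothetical.
# Payoffs in dollars in the pred-1b column only (assumed certain here).
LEFT = {
    "1b & bet-acc": 125,
    "1b & bet-inacc": 75,
    "2b & bet-acc": 25,
    "2b & bet-inacc": 175,
}

actual = LEFT["1b & bet-acc"]             # what you actually do
switch_bet_only = LEFT["1b & bet-inacc"]  # same stage-1 act, other bet
switch_box_only = LEFT["2b & bet-acc"]    # other stage-1 act, same bet
# Undecided in stage 1 (either act equally likely), then bet-inacc.
undecided_then_inacc = (LEFT["1b & bet-inacc"] + LEFT["2b & bet-inacc"]) / 2
switch_both = LEFT["2b & bet-inacc"]      # requires changing both choices

print(actual, switch_bet_only, switch_box_only, undecided_then_inacc, switch_both)
# 125 75 25 125.0 175
```

Changing either choice alone makes things worse; only the alternative that changes both choices at once does better than what you actually do.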

This post is already too long. But we have only looked at one version of Newcomb Insurance. I have assumed that the predictor cannot foresee how states of indecision are resolved. In the remainder of the post, I'll briefly look at what happens if the predictor can foresee the acts you will perform. As I announced earlier, there won't be any new insights, so feel free to stop reading.

Let's begin with the hypothetical choice of plans. Here is the matrix, one last time.

| | pred-1b | pred-2b |
|---|---|---|
| 1b & bet-acc | $100 + $25 = $125 | $0 - $75 = -$75 |
| 1b & bet-inacc | $100 - $25 = $75 | $0 + $75 = $75 |
| 2b & bet-acc | $100 - $75 = $25 | $0 + $25 = $25 |
| 2b & bet-inacc | $100 + $75 = $175 | $0 - $25 = -$25 |

If the predictor can foresee the resolution of indecision, the previous solution – indecision between rows 2 and 4 – is no longer an acceptable solution. If you are in that state of indecision, you can be confident that you'll lose the bet in stage 2. You'll end up with either $75 or -$25, with equal probability. The state is therefore worth $25. Both of the pure decisions of which the state is a mixture then look better. But these pure options aren't stable equilibria either.

The only equilibrium in the new version of the story is perfect indecision between rows 1 and 3. Given what you know about the predictor's powers, you can here be confident that she will have foreseen the resolution of the indecision and therefore that you will win the bet. The state is worth $75. No pure option looks better.
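Here is a sketch of the same calculation for the foreseeing predictor. It assumes the predictor is certain to foresee which act results from your indecision, so that the prediction always matches the act, and it evaluates pure plans at a 50/50 credence over the columns. The function names are hypothetical.

```python
# Illustrative sketch; names are hypothetical.
# Payoffs in dollars for each plan, as (pred-1b, pred-2b).
PAYOFF = {
    "1b & bet-acc": (125, -75),
    "1b & bet-inacc": (75, 75),
    "2b & bet-acc": (25, 25),
    "2b & bet-inacc": (175, -25),
}


def matched_payoff(plan):
    """Payoff when the prediction matches the plan's stage-1 act."""
    left, right = PAYOFF[plan]
    return left if plan.startswith("1b") else right


def worth_of_indecision(plans):
    """Each plan is equally likely to result, and the predictor foresees which."""
    return sum(matched_payoff(p) for p in plans) / len(plans)


def pure_value(plan, p_left=0.5):
    """Causal expected payoff of a pure plan at credence p_left in pred-1b."""
    left, right = PAYOFF[plan]
    return p_left * left + (1 - p_left) * right


print(worth_of_indecision(["1b & bet-inacc", "2b & bet-inacc"]))  # 25.0
print(worth_of_indecision(["1b & bet-acc", "2b & bet-acc"]))      # 75.0
print({plan: pure_value(plan) for plan in PAYOFF})
```

At a 50/50 credence the best pure plans are worth $75, which beats the $25 of indecision between rows 2 and 4 but does not beat indecision between rows 1 and 3.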

Next, let's look at the two actual decision situations, starting with stage 2. This is easy. You should accept the bet, no matter what happened in stage 1.

The decision problem in stage 1 has the same three equilibria as before.

One, you could decide to one-box. You can then expect to get $125, and you'll think that two-boxing would have gotten you $25.

Two, you could decide to two-box. You can then expect to get $25, and you'll think that one-boxing would have cost you $75.

Three, you could remain undecided between one-boxing and two-boxing. You should then think that you'll get either $125 or $25, with equal probability. The state is worth $75. No pure option is better.
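The three equilibria can again be checked numerically. The sketch below assumes that you accept the bet (bet-acc) in stage 2 no matter what, idealises the predictor to certainty, and uses hypothetical names (`BET_ACC`, `causal_value`).

```python
# Illustrative sketch; names are hypothetical.
# Total payoffs in dollars with bet-acc in stage 2, as (pred-1b, pred-2b).
BET_ACC = {"1b": (125, -75), "2b": (25, 25)}


def causal_value(act, p_left):
    """Expected payoff of a stage-1 act, holding fixed the credence p_left in pred-1b."""
    left, right = BET_ACC[act]
    return p_left * left + (1 - p_left) * right


# Decided states: the prediction is assumed to match your decision.
decided_1b = {act: causal_value(act, 1.0) for act in BET_ACC}
decided_2b = {act: causal_value(act, 0.0) for act in BET_ACC}

# Undecided: the predictor foresees the resolution, so you end up either
# with 1b in the left column or with 2b in the right column.
undecided_worth = (BET_ACC["1b"][0] + BET_ACC["2b"][1]) / 2
pure_at_half = {act: causal_value(act, 0.5) for act in BET_ACC}

print(decided_1b)       # {'1b': 125.0, '2b': 25.0}
print(decided_2b)       # {'1b': -75.0, '2b': 25.0}
print(undecided_worth)  # 75.0
print(pure_at_half)     # {'1b': 25.0, '2b': 25.0}
```

On this way of modelling it, the undecided state is worth more than either pure act evaluated at the same credences, because the state's worth reflects the predictor's ability to track how the indecision resolves.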

Notice that, as in the first version, one of the three equilibria corresponds to the solution to the combined problem. If you go with the third equilibrium in stage 1 and accept the bet in stage 2 then you're undecided between one-boxing and two-boxing and confident that you'll take the bet – which looks like the same attitude that you would have if you faced the choice of plans.

Best-equilibrium CDT still favours the first equilibrium. So we still have a mismatch between the "planning perspective" and the "implementation perspective". During implementation, you may rationally decide to choose 1b in stage 1 and bet-acc in stage 2, but the only solution to the planning problem is indecision between performing these acts and 2b & bet-acc. The puzzle is less striking than before because the mismatched combinations are very similar. Both involve accepting the bet in stage 2.

Schwarz, Wolfgang. 2015. “Lost Memories and Useless Coins: Revisiting the Absentminded Driver.” Synthese 192 (9): 3011–36.
Skyrms, Brian. 1990. The Dynamics of Rational Deliberation. Cambridge (Mass.): Harvard University Press.