More on dynamic consistency in CDT
One might intuit that any rationally choosable plan should be rationally implementable. In the previous post, I discussed a scenario in which some forms of CDT violate that principle. In this post, I have some more thoughts on how this can happen. I also consider some nearby principles and look at the conditions under which they might hold.
Plans and implementations
Throughout this post I'll assume that we are dealing with ideally rational agents with stable basic desires. We're interested in the attitudes such agents should take towards their options in simple, finite sequential choice situations where no relevant information about the world arrives in between the choice points.
In this context, a plan is a proposition specifying an act for each choice point in the sequence. A plan is rationally choosable if it is a rational choice in a hypothetical decision problem in which the options are the possible plans. A plan is rationally implementable if at each choice point, the agent could rationally choose whatever the plan says she does at that point.
In the previous post, I considered the following principle. (This is the left-to-right direction of the principle I there called "Dynamic Consistency".)
(DC1) If a plan is rationally choosable then it is rationally implementable.
We found that an attractive form of CDT violates (DC1) in the following variant of a scenario from Ahmed (2014).
Newcomb Insurance With A Coin.
Stage 1. You face Newcomb's Problem, but with different monetary values. The transparent box is empty; the opaque box contains $100 iff you have been predicted to one-box. In addition to one-boxing and two-boxing, you have the option to toss a fair coin and let the outcome decide whether you'll one-box or two-box. The predictor can infallibly foresee your choice, but she can't foresee the outcome of the coin toss.
Stage 2. Before the content of the opaque box is revealed, you must bet on whether the predictor foresaw how many boxes you took. If you bet that the prediction was accurate, you get $25 if you're right and lose $75 if you're wrong; if you bet that the prediction was inaccurate, you get $75 if you're right and lose $25 if you're wrong.
Here is a decision matrix for the possible plans.
| Plan | Predicted 1-box | Predicted 2-box |
|---|---|---|
| 1b & bet-acc | $100+$25 = $125 | $0-$75 = $-75 |
| 1b & bet-inacc | $100-$25 = $75 | $0+$75 = $75 |
| 2b & bet-acc | $100-$75 = $25 | $0+$25 = $25 |
| 2b & bet-inacc | $100+$75 = $175 | $0-$25 = $-25 |
| rand & bet-acc | $125 if 1b else $25 = $75 | $-75 if 1b else $25 = $-25 |
| rand & bet-inacc | $75 if 1b else $175 = $125 | $75 if 1b else $-25 = $25 |
The only rationally choosable plan is rand & bet-inacc (the last row). But when you face the individual choices, you should arguably one-box in stage 1 and bet on an accurate prediction in stage 2 – because that's the best equilibrium solution. We have a counterexample to (DC1).
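The equilibrium reasoning can be checked mechanically. Here is a minimal Python sketch (the plan names, state labels, and the `state_cred` function are my own modelling choices, not part of the scenario): a plan counts as an equilibrium iff it maximises expected utility conditional on being chosen.

```python
# Sketch: find the CDT equilibria among the six plans in Newcomb
# Insurance With A Coin. States are the predictor's forecasts
# ("pred-1b", "pred-2b"); choosing "rand" makes the forecast a coin
# flip. Payoff entries follow the decision matrix above (expected
# values over the coin where relevant).

payoff = {  # payoff[plan][state] = expected monetary payoff
    "1b & bet-acc":     {"pred-1b": 125, "pred-2b": -75},
    "1b & bet-inacc":   {"pred-1b": 75,  "pred-2b": 75},
    "2b & bet-acc":     {"pred-1b": 25,  "pred-2b": 25},
    "2b & bet-inacc":   {"pred-1b": 175, "pred-2b": -25},
    "rand & bet-acc":   {"pred-1b": 75,  "pred-2b": -25},
    "rand & bet-inacc": {"pred-1b": 125, "pred-2b": 25},
}

def state_cred(plan):
    """Credence over states conditional on choosing `plan`: the
    predictor foresees 1b/2b choices, but the coin toss is 50/50."""
    if plan.startswith("rand"):
        return {"pred-1b": 0.5, "pred-2b": 0.5}
    if plan.startswith("1b"):
        return {"pred-1b": 1.0, "pred-2b": 0.0}
    return {"pred-1b": 0.0, "pred-2b": 1.0}

def eu(plan, cred):
    return sum(cred[s] * payoff[plan][s] for s in cred)

# A plan is an equilibrium iff it maximises EU conditional on being chosen.
equilibria = [p for p in payoff
              if eu(p, state_cred(p)) >= max(eu(q, state_cred(p)) for q in payoff)]
print(equilibria)  # → ['rand & bet-inacc']
```

Conditional on any other plan being chosen, some alternative does better (e.g. conditional on 1b & bet-acc, the state is pred-1b and 2b & bet-inacc promises $175), so rand & bet-inacc comes out as the unique equilibrium.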
A curious aspect of Newcomb Insurance With A Coin is that you do better if you face the individual choices than if you choose a plan. One might have thought that more control is always better. Not so here. If, in stage 1, you had the power to "bind" your future choices, you should not make use of that power.
The reason why you do better if you face the individual choices is that you are then given better options. You can rationally intend to one-box, and if you intend to one-box then the opaque box is certain to contain $100. If you have simultaneous control over both choices, by contrast, you can't rationally intend to one-box. The only option you can rationally intend to choose is randomisation. In that case there's a 50% chance that the opaque box contains $100, but also a 50% chance that it contains nothing.
In Newcomb's original problem, the predictor gives EDTers the better options. She gives them a choice between $1M and $1M + $1K, while CDTers get a choice between $0 and $1K. In Newcomb Insurance With A Coin, the predictor gives better options not just to EDTers, but also to (best-equilibrium) CDTers who face the two choices independently. CDTers who have simultaneous control over both choices are punished with worse options.
If we hold fixed the content of the opaque box then the optimal plan (rand & bet-inacc) is no worse than the optimal implementation (1b & bet-acc). Suppose the opaque box contains $100. Then your expected payoff is $125 either way. Suppose the opaque box is empty. Then rand & bet-inacc has expected payoff $25 while 1b & bet-acc is guaranteed to result in $-75.
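The state-by-state comparison comes down to simple arithmetic. In this sketch the dictionary names and state labels are mine; the entries are expected payoffs over the coin toss, holding the box content fixed:

```python
# Compare the optimal plan (rand & bet-inacc) with the optimal
# implementation (1b & bet-acc), holding the content of the opaque
# box fixed. "box-full" = predicted 1b, "box-empty" = predicted 2b.

plan_rand_betinacc = {
    "box-full": 0.5 * (100 - 25) + 0.5 * (100 + 75),  # coin says 1b / 2b
    "box-empty": 0.5 * (0 + 75) + 0.5 * (0 - 25),
}
impl_1b_betacc = {"box-full": 100 + 25, "box-empty": 0 - 75}

for state in ("box-full", "box-empty"):
    print(state, plan_rand_betinacc[state], impl_1b_betacc[state])
# → box-full 125.0 125    (the plan is no worse)
# → box-empty 25.0 -75    (the plan is strictly better)
```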
So we have two factors that pull in opposite directions. On the one hand, having simultaneous control over both choices allows you to make more (or at least equally much) out of whatever cards you've been dealt. On the other hand, the extra control makes it likely that you've been dealt worse cards.
This kind of situation is clearly unusual. When we intuit that a rationally choosable plan should be rationally implementable, we don't suppose that having a choice between plans is associated with having worse options.
Non-equilibrium Dynamic Consistency?
The dynamic consistency principle (DC1) compares a hypothetical choice between plans with the actual choices at the individual choice points. Perhaps we should not take the merely hypothetical choice so seriously. That is, perhaps we shouldn't consider what you should do if you could actually choose between plans – with the possible consequence that you would then have been given worse options. Instead, we might simply consider which plans maximise expected utility, without considering whether you could rationally decide in their favour.
(DC2) If a plan maximises expected utility then each of its acts maximises expected utility after the earlier acts have been performed.
From a CDT perspective, however, (DC2) is highly implausible.
The problem is that a plan can maximise (causal) expected utility only because you believe that you won't choose it. In Newcomb Insurance With A Coin, for example, the plan 2b & bet-inacc maximises expected utility if you believe that you'll choose 1b in stage 1. But if you go ahead and implement 2b & bet-inacc, you can hardly remain confident that you'll choose 1b in stage 1. After having chosen 2b, you know that you have chosen 2b, and then 2b & bet-inacc no longer maximises expected utility. (Nor does bet-inacc on its own.)
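A quick calculation illustrates the point. The state labels and payoff table here are my reconstruction from the scenario, restricted to the stage-2 bets after you have taken both boxes:

```python
# Why (DC2) fails from a CDT perspective: 2b & bet-inacc maximises
# expected utility only while you are confident you'll one-box. Once
# you have chosen 2b, you learn the prediction was 2b, and bet-inacc
# no longer maximises expected utility.

# Stage-2 payoffs given that you took both boxes, by prediction state
# (the opaque box contains $100 iff you were predicted to one-box):
payoff_after_2b = {
    "bet-acc":   {"pred-1b": 100 - 75, "pred-2b": 0 + 25},
    "bet-inacc": {"pred-1b": 100 + 75, "pred-2b": 0 - 25},
}

def eu(act, cred):
    return sum(cred[s] * payoff_after_2b[act][s] for s in cred)

before = {"pred-1b": 1.0, "pred-2b": 0.0}  # confident you'll one-box
after  = {"pred-1b": 0.0, "pred-2b": 1.0}  # you know you two-boxed

print(eu("bet-inacc", before), eu("bet-acc", before))  # → 175.0 25.0
print(eu("bet-inacc", after), eu("bet-acc", after))    # → -25.0 25.0
```

Before the choice, bet-inacc looks $150 better; after updating on having two-boxed, it is $50 worse.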
Plans and continuations
The most popular formulation of dynamic consistency in the literature goes like this.
(DC3) If a plan is choosable at the start of a sequential choice problem, then its continuation is choosable at any later point after the earlier parts of the plan have been implemented.
Newcomb Insurance With A Coin is a counterexample to (DC1), but not to (DC3). Is (DC3) valid in CDT? It depends.
I'll first prove a lemma.
Lemma. If a plan P maximises (causal) expected utility conditional on P then the plan's continuation still maximises (causal) expected utility conditional on P.
Proof. I assume that causal expected utility can be expressed in terms of some kind of supposition, so that EU(A) = ∑_w V(w)·Cr(w//A), where Cr(B//A) is the agent's credence in B on the supposition that A. Cr(B//A) must be distinguished from the ordinary conditional probability Cr(B/A), which I also write as Cr_A(B). I need two assumptions about the relevant kind of supposition. Both should be fairly uncontroversial.
No-Backtracking. If A says that such-and-such acts are performed up to some point in a sequential choice situation, and B says that such-and-such acts are performed afterwards, then Cr_A(A//B) = 1.
Similarity. If Cr(A//B) = 1 then Cr(C//B) = Cr(C//A ∧ B).
Now assume some plan P = A1…An maximises (causal) expected utility conditional on A1…An, compared to any other plan. In particular,

∑_w V(w)·Cr_{A1…An}(w // A1…An) ≥ ∑_w V(w)·Cr_{A1…An}(w // A1…Ai-1 ∧ Bi…Bn)

for any acts Bi…Bn available at points i…n respectively. By No-Backtracking,

Cr_{A1…An}(A1…Ai-1 // Bi…Bn) = 1

for any Bi…Bn. By Similarity, it follows that

Cr_{A1…An}(w // Bi…Bn) = Cr_{A1…An}(w // A1…Ai-1 ∧ Bi…Bn)

for any Bi…Bn and world w. Plugging this into the first inequality (once with Ai…An in place of Bi…Bn and once with Bi…Bn itself), we get

∑_w V(w)·Cr_{A1…An}(w // Ai…An) ≥ ∑_w V(w)·Cr_{A1…An}(w // Bi…Bn)

for any Bi…Bn. QED.
Any sensible form of CDT should hold that an option is choosable only if it maximises causal expected utility conditional on being chosen. Let permissive CDT be the view that this condition is not only necessary, but also sufficient for choosability.
Observation 1. Permissive CDT validates (DC3).
Proof. Let Cr_i be the agent's credence at point i. Since Cr_i(*) = Cr_1(*/A1…Ai-1) and the value function is stable, we can replace 'Cr_{A1…An}' by 'Cr_{i,Ai…An}' in the Lemma's result:

∑_w V(w)·Cr_{i,Ai…An}(w // Ai…An) ≥ ∑_w V(w)·Cr_{i,Ai…An}(w // Bi…Bn)

for any Bi…Bn. QED.
Let best-equilibrium CDT be the view that one may only choose a best among the options that maximise causal expected utility conditional on being chosen, where the relevant measure of goodness is each candidate's expected utility conditional on being chosen.
Observation 2. Best-equilibrium CDT does not validate (DC3).
Here is a counterexample.
Two Buttons. In stage 1, you can choose whether to press a button. In stage 2, you can choose whether to press a different button. A predictor has predicted your choice in both stages. If she predicted that you'd press only the first button, she wired the buttons so that you get $15 iff you press neither button and $12 otherwise. If she predicted that you'd do anything else, she wired the buttons so that you get $10 if you press both buttons and $0 otherwise. You are certain that the predictor has foreseen your choice.
The payoff matrix for your plans looks as follows, where 'P1N2' means 'press button 1 and not button 2', and the columns specify the predicted plan.

| | Pred P1P2 | Pred P1N2 | Pred N1P2 | Pred N1N2 |
|---|---|---|---|---|
| P1P2 | $10 | $12 | $10 | $10 |
| P1N2 | $0 | $12 | $0 | $0 |
| N1P2 | $0 | $12 | $0 | $0 |
| N1N2 | $0 | $15 | $0 | $0 |
The only equilibrium in this decision problem is P1P2. After you've pushed the first button (P1) in stage 1, your decision problem in stage 2 is effectively the top left quarter of the matrix. This problem has two equilibria, P1P2 and P1N2. The second is better. Best-equilibrium CDT therefore says that in stage 2, the continuation of the only ex ante choosable plan P1P2 is no longer choosable.
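Both equilibrium claims can be verified mechanically. In this sketch, restricting the option set to the plans beginning with P1 is my way of modelling the stage-2 problem; the names are mine:

```python
# Sketch: CDT equilibria in Two Buttons, ex ante and in stage 2.
# Payoffs follow the wiring rules: if "press only button 1" (P1N2)
# was predicted, you get $15 for pressing neither button and $12
# otherwise; under any other prediction you get $10 for pressing
# both buttons and $0 otherwise.

plans = ["P1P2", "P1N2", "N1P2", "N1N2"]

def payoff(plan, predicted):
    if predicted == "P1N2":
        return 15 if plan == "N1N2" else 12
    return 10 if plan == "P1P2" else 0

def equilibria(options):
    # X is an equilibrium iff it maximises payoff in the state where
    # X itself was predicted (you're certain the prediction is right).
    return [x for x in options
            if payoff(x, x) >= max(payoff(y, x) for y in options)]

print(equilibria(plans))             # → ['P1P2']
print(equilibria(["P1P2", "P1N2"]))  # → ['P1P2', 'P1N2']
```

The stage-2 problem has the extra equilibrium P1N2, which is better ($12 conditional on being chosen, against $10 for P1P2), so best-equilibrium CDT abandons the continuation of P1P2.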
Implementing a plan
I don't understand the common focus on (DC3). The principle compares ex ante attitudes towards a plan with attitudes towards the plan during its hypothetical implementation, even if that implementation is irrational. If it would be irrational to implement a plan, why should we assume that you should have stable attitudes towards the plan during its implementation? To be sure, if we could show (DC1) – if we could show that any choosable plan is rationally implementable – then (DC3) would have some appeal. But then (DC1) is doing the real work.
Newcomb Insurance With A Coin shows that (DC1) is invalid in best-equilibrium CDT. Permissive CDT escapes the counterexample. It allows you to implement the uniquely choosable plan. Is this always true? That is, can we show the following?
(DC4) If a plan P maximises expected utility conditional on P then each of its acts maximises expected utility conditional on P after the earlier acts have been chosen.
(DC4) resembles (DC2), except that we've conditionalised on P. This ensures that we don't consider the relevant plan as a merely counterfactual alternative, which rendered (DC2) untenable.
(DC4) looks plausible to me. Oddly, I can't prove it without some non-trivial assumptions. The following two assumptions do the job.
Future Determinacy. You are never uncertain about what your future self would choose at later points in the sequence under the supposition that you make a certain choice now.
Strong Centring. If you are certain that you will choose A1 now and A2…An afterwards, then you are certain that you would choose A2…An on the supposition that you now choose A1.
Strong Centring is debatable. Future Determinacy is not plausible as a general assumption. But it is often satisfied. If you know that your future self is rational, you can often figure out what they would do if they faced a certain decision situation. The only counterexamples are situations in which you know that your future self would face a choice in which two options are both choosable, and you don't know which of the options they would pick.
Observation 3. (DC4) holds in CDT whenever Future Determinacy and Strong Centring are satisfied.

Proof. Assume P = A1…An maximises (causal) expected utility conditional on P. By the Lemma, we know that after A1…Ai-1 have been implemented, Ai…An still maximises expected utility conditional on P. That is,

(1) ∑_w V(w)·Cr_{i,P}(w // Ai…An) ≥ ∑_w V(w)·Cr_{i,P}(w // Bi…Bn)

for any Bi…Bn. We need to show that Ai alone also maximises expected utility conditional on P. Suppose for reductio that some alternative Bi has greater expected utility conditional on P. That is,

(2) ∑_w V(w)·Cr_{i,P}(w // Bi) > ∑_w V(w)·Cr_{i,P}(w // Ai).

By Future Determinacy, there are acts Bi+1…Bn such that Cr_{i,P}(Bi+1…Bn // Bi) = 1. By Similarity, it follows that Cr_{i,P}(w // Bi) = Cr_{i,P}(w // Bi ∧ Bi+1…Bn). Also, by Strong Centring and Similarity, Cr_{i,P}(w // Ai) = Cr_{i,P}(w // Ai…An). Thus (2) turns into

∑_w V(w)·Cr_{i,P}(w // Bi…Bn) > ∑_w V(w)·Cr_{i,P}(w // Ai…An),

and this contradicts (1). QED.
Why do we need Future Determinacy? Consider a two-stage dynamic decision problem, and assume the plan A1 ∧ A2 maximises expected utility conditional on A1 ∧ A2. Now suppose Future Determinacy is false: you don't know what you would choose in stage 2 if you chose B1 in stage 1. Let's say you're unsure whether you would choose A2 or B2, because both would be equally choiceworthy. We know that (conditional on A1 ∧ A2) neither B1 ∧ A2 nor B1 ∧ B2 has greater expected utility than A1 ∧ A2. Oddly, their disjunction – which is equivalent to B1 – might still have greater expected utility than A1 ∧ A2. And then A1 would not maximise expected utility (conditional on A1 ∧ A2).
Instead of Future Determinacy, we could also require that (conditional on A1…An) no disjunction of plan continuations has greater expected utility than each disjunct. Or more specifically: If some plan continuations all have expected utility x then their disjunction does not have expected utility greater than x. Let's call a scenario bizarre if it falsifies both Future Determinacy and this condition. Any counterexample to (DC4) would have to be bizarre (given our other assumptions, like Strong Centring).
Observation 4. (DC1) holds in permissive CDT for any scenario that is not bizarre.
This immediately follows from the previous observation.
Observation 5. Any non-bizarre case in which best-equilibrium CDT violates (DC1) is a case in which simultaneous control over all acts in the relevant sequence is bad news, indicating that the agent has been given worse options.
This is because one equilibrium in a decision problem counts as better than another only in the sense that it carries better news about your options. By (DC4), the optimal planning equilibrium is still an equilibrium during implementation. If there is a better equilibrium during implementation, this means that the original equilibrium – the planning equilibrium – carries worse news about the options.
It would be good to figure out what happens in "bizarre" cases.
I can't be the first to look into dynamic consistency from a CDT perspective. Any literature suggestions are welcome.