More on dynamic consistency in CDT
One might intuit that any rationally choosable plan should be rationally implementable. In the previous post, I discussed a scenario in which some forms of CDT violate that principle. In this post, I have some more thoughts on how this can happen. I also consider some nearby principles and look at the conditions under which they might hold.
Plans and implementations
Throughout this post I'll assume that we are dealing with ideally rational agents with stable basic desires. We're interested in the attitudes such agents should take towards their options in simple, finite sequential choice situations where no relevant information about the world arrives in between the choice points.
In this context, a plan is a proposition specifying an act for each choice point in the sequence. A plan is rationally choosable if it is a rational choice in a hypothetical decision problem in which the options are the possible plans. A plan is rationally implementable if at each choice point, the agent could rationally choose whatever the plan says she does at that point.
In the previous post, I considered the following principle. (This is the left-to-right direction of the principle I there called "Dynamic Consistency".)
(DC1) If a plan is rationally choosable then it is rationally implementable.
We found that an attractive form of CDT violates (DC1) in the following variant of a scenario from Ahmed (2014).
Newcomb Insurance With A Coin.
Stage 1. You face Newcomb's Problem, but with different monetary values. The transparent box is empty; the opaque box contains $100 iff you have been predicted to one-box. In addition to one-boxing and two-boxing, you have the option to toss a fair coin and let the outcome decide whether you'll one-box or two-box. The predictor can infallibly foresee your choice, but she can't foresee the outcome of the coin toss.
Stage 2. Before the content of the opaque box is revealed, you must bet on whether the predictor foresaw how many boxes you took. If you bet that the prediction was accurate, you get $25 if you're right and lose $75 if you're wrong; if you bet that the prediction was inaccurate, you get $75 if you're right and lose $25 if you're wrong.
Here is a decision matrix for the possible plans.
| Plan | Predicted 1-box | Predicted 2-box |
|---|---|---|
| 1b & bet-acc | $100+$25 = $125 | $0-$75 = $-75 |
| 1b & bet-inacc | $100-$25 = $75 | $0+$75 = $75 |
| 2b & bet-acc | $100-$75 = $25 | $0+$25 = $25 |
| 2b & bet-inacc | $100+$75 = $175 | $0-$25 = $-25 |
| rand & bet-acc | $125 if 1b else $25 = $75 | $-75 if 1b else $25 = $-25 |
| rand & bet-inacc | $75 if 1b else $175 = $125 | $75 if 1b else $-25 = $25 |
The only rationally choosable plan is rand & bet-inacc (the last row). But when you face the individual choices, you should arguably one-box in stage 1 and bet on an accurate prediction in stage 2 – because that's the best equilibrium solution. We have a counterexample to (DC1).
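The equilibrium reasoning can be checked mechanically. Here is a minimal Python sketch (the plan names, state labels, and the `state_cred` function are my own modelling choices, not part of the scenario): a plan counts as an equilibrium iff it maximises expected utility conditional on being chosen.

```python
# Sketch: find the CDT equilibria among the six plans in Newcomb
# Insurance With A Coin. States are the predictor's forecasts
# ("pred-1b", "pred-2b"); choosing "rand" makes the forecast a coin
# flip. Payoff entries follow the decision matrix above (expected
# values over the coin where relevant).

payoff = {  # payoff[plan][state] = expected monetary payoff
    "1b & bet-acc":     {"pred-1b": 125, "pred-2b": -75},
    "1b & bet-inacc":   {"pred-1b": 75,  "pred-2b": 75},
    "2b & bet-acc":     {"pred-1b": 25,  "pred-2b": 25},
    "2b & bet-inacc":   {"pred-1b": 175, "pred-2b": -25},
    "rand & bet-acc":   {"pred-1b": 75,  "pred-2b": -25},
    "rand & bet-inacc": {"pred-1b": 125, "pred-2b": 25},
}

def state_cred(plan):
    """Credence over states conditional on choosing `plan`: the
    predictor foresees 1b/2b choices, but the coin toss is 50/50."""
    if plan.startswith("rand"):
        return {"pred-1b": 0.5, "pred-2b": 0.5}
    if plan.startswith("1b"):
        return {"pred-1b": 1.0, "pred-2b": 0.0}
    return {"pred-1b": 0.0, "pred-2b": 1.0}

def eu(plan, cred):
    return sum(cred[s] * payoff[plan][s] for s in cred)

# A plan is an equilibrium iff it maximises EU conditional on being chosen.
equilibria = [p for p in payoff
              if eu(p, state_cred(p)) >= max(eu(q, state_cred(p)) for q in payoff)]
print(equilibria)  # → ['rand & bet-inacc']
```

Conditional on any other plan being chosen, some alternative does better (e.g. conditional on 1b & bet-acc, the state is pred-1b and 2b & bet-inacc promises $175), so rand & bet-inacc comes out as the unique equilibrium.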
A curious aspect of Newcomb Insurance With A Coin is that you do better if you face the individual choices than if you choose a plan. One might have thought that more control is always better. Not so here. If, in stage 1, you had the power to "bind" your future choices, you should not make use of that power.
The reason why you do better if you face the individual choices is that you are then given better options. You can rationally intend to one-box, and if you intend to one-box then the opaque box is certain to contain $100. If you have simultaneous control over both choices, by contrast, you can't rationally intend to one-box. The only option you can rationally intend to choose is randomisation. In that case there's a 50% chance that the opaque box contains $100, but also a 50% chance that it contains nothing.
In Newcomb's original problem, the predictor gives EDTers the better options. She gives them a choice between $1M and $1M + $1K, while CDTers get a choice between $0 and $1K. In Newcomb Insurance With A Coin, the predictor gives better options not just to EDTers, but also to (best-equilibrium) CDTers who face the two choices independently. CDTers who have simultaneous control over both choices are punished with worse options.
If we hold fixed the content of the opaque box then the optimal plan (rand & bet-inacc) is no worse than the optimal implementation (1b & bet-acc). Suppose the opaque box contains $100. Then your expected payoff is $125 either way. Suppose the opaque box is empty. Then rand & bet-inacc has expected payoff $25 while 1b & bet-acc is guaranteed to result in $-75.
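The state-by-state comparison comes down to simple arithmetic. In this sketch the dictionary names and state labels are mine; the entries are expected payoffs over the coin toss, holding the box content fixed:

```python
# Compare the optimal plan (rand & bet-inacc) with the optimal
# implementation (1b & bet-acc), holding the content of the opaque
# box fixed. "box-full" = predicted 1b, "box-empty" = predicted 2b.

plan_rand_betinacc = {
    "box-full": 0.5 * (100 - 25) + 0.5 * (100 + 75),  # coin says 1b / 2b
    "box-empty": 0.5 * (0 + 75) + 0.5 * (0 - 25),
}
impl_1b_betacc = {"box-full": 100 + 25, "box-empty": 0 - 75}

for state in ("box-full", "box-empty"):
    print(state, plan_rand_betinacc[state], impl_1b_betacc[state])
# → box-full 125.0 125    (the plan is no worse)
# → box-empty 25.0 -75    (the plan is strictly better)
```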
So we have two factors that pull in opposite directions. On the one hand, having simultaneous control over both choices allows you to make more (or at least equally much) out of whatever cards you've been dealt. On the other hand, the extra control makes it likely that you've been dealt worse cards.
This kind of situation is clearly unusual. When we intuit that a rationally choosable plan should be rationally implementable, we don't suppose that having a choice between plans is associated with having worse options.
Non-equilibrium Dynamic Consistency?
The dynamic consistency principle (DC1) compares a hypothetical choice between plans with the actual choices at the individual choice points. Perhaps we should not take the merely hypothetical choice so seriously. That is, perhaps we shouldn't consider what you should do if you could actually choose between plans – with the possible consequence that you would then have been given worse options. Instead, we might simply consider which plans maximise expected utility, without considering whether you could rationally decide in their favour.
(DC2) If a plan maximises expected utility then each of its acts maximises expected utility after the earlier acts have been performed.
From a CDT perspective, however, (DC2) is highly implausible.
The problem is that a plan can maximise (causal) expected utility only because you believe that you won't choose it. In Newcomb Insurance With A Coin, for example, the plan 2b & bet-inacc maximises expected utility if you believe that you'll choose 1b in stage 1. But if you go ahead and implement 2b & bet-inacc, you can hardly remain confident that you'll choose 1b in stage 1. After having chosen 2b, you know that you have chosen 2b, and then 2b & bet-inacc no longer maximises expected utility. (Nor does bet-inacc on its own.)
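A quick calculation illustrates the point. The state labels and payoff table here are my reconstruction from the scenario, restricted to the stage-2 bets after you have taken both boxes:

```python
# Why (DC2) fails from a CDT perspective: 2b & bet-inacc maximises
# expected utility only while you are confident you'll one-box. Once
# you have chosen 2b, you learn the prediction was 2b, and bet-inacc
# no longer maximises expected utility.

# Stage-2 payoffs given that you took both boxes, by prediction state
# (the opaque box contains $100 iff you were predicted to one-box):
payoff_after_2b = {
    "bet-acc":   {"pred-1b": 100 - 75, "pred-2b": 0 + 25},
    "bet-inacc": {"pred-1b": 100 + 75, "pred-2b": 0 - 25},
}

def eu(act, cred):
    return sum(cred[s] * payoff_after_2b[act][s] for s in cred)

before = {"pred-1b": 1.0, "pred-2b": 0.0}  # confident you'll one-box
after  = {"pred-1b": 0.0, "pred-2b": 1.0}  # you know you two-boxed

print(eu("bet-inacc", before), eu("bet-acc", before))  # → 175.0 25.0
print(eu("bet-inacc", after), eu("bet-acc", after))    # → -25.0 25.0
```

Before the choice, bet-inacc looks $150 better; after updating on having two-boxed, it is $50 worse.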
Plans and continuations
The most popular formulation of dynamic consistency in the literature goes like this.
(DC3) If a plan is choosable at the start of a sequential choice problem, then its continuation is choosable at any later point after the earlier parts of the plan have been implemented.
Newcomb Insurance With A Coin is a counterexample to (DC1), but not to (DC3). Is (DC3) valid in CDT? It depends.
I'll first prove a lemma.
Lemma. If a plan P maximises (causal) expected utility conditional on P then the plan's continuation still maximises (causal) expected utility conditional on P.
Proof. I assume that causal expected utility can be expressed in terms of some kind of supposition, so that EU(A) = ∑_w V(w)·Cr(w//A), where Cr(B//A) is the agent's credence in B on the supposition that A. Cr(B//A) must be distinguished from the ordinary conditional probability Cr(B/A), which I also write as Cr_A(B). I need two assumptions about the relevant kind of supposition. Both should be fairly uncontroversial.
No-Backtracking. If A says that such-and-such acts are performed up to some point in a sequential choice situation, and B says that such-and-such acts are performed afterwards, then Cr_A(A//B) = 1.
Similarity. If Cr(A//B) = 1 then Cr(C//B) = Cr(C//A ∧ B).
Now assume some plan P = A1…An maximises (causal) expected utility conditional on A1…An, compared to any other plan. In particular,

∑_w V(w)·Cr_{A1…An}(w // A1…An) ≥ ∑_w V(w)·Cr_{A1…An}(w // A1…Ai-1 ∧ Bi…Bn)

for any acts Bi…Bn available at points i…n respectively. By No-Backtracking,

Cr_{A1…An}(A1…Ai-1 // Bi…Bn) = 1

for any Bi…Bn. By Similarity, it follows that

Cr_{A1…An}(w // Bi…Bn) = Cr_{A1…An}(w // A1…Ai-1 ∧ Bi…Bn)

for any Bi…Bn and world w. Plugging this into the first inequality (once with Ai…An in place of Bi…Bn and once with Bi…Bn itself), we get

∑_w V(w)·Cr_{A1…An}(w // Ai…An) ≥ ∑_w V(w)·Cr_{A1…An}(w // Bi…Bn)

for any Bi…Bn. QED.
Any sensible form of CDT should hold that an option is choosable only if it maximises causal expected utility conditional on being chosen. Let permissive CDT be the view that this condition is not only necessary, but also sufficient for choosability.
Observation 1. Permissive CDT validates (DC3).
Proof. Let Cr_i be the agent's credence at point i. Since Cr_i(*) = Cr_1(*/A1…Ai-1) and the value function is stable, we can replace 'Cr_{A1…An}' by 'Cr_{i,Ai…An}' in the Lemma's result:

∑_w V(w)·Cr_{i,Ai…An}(w // Ai…An) ≥ ∑_w V(w)·Cr_{i,Ai…An}(w // Bi…Bn)

for any Bi…Bn. QED.
Let best-equilibrium CDT be the view that one may only choose a best among the options that maximise causal expected utility conditional on being chosen, where the relevant measure of goodness is each candidate's expected utility conditional on being chosen.
Observation 2. Best-equilibrium CDT does not validate (DC3).
Here is a counterexample.
Two Buttons. In stage 1, you can choose whether to press a button. In stage 2, you can choose whether to press a different button. A predictor has predicted your choice in both stages. If she predicted that you'd press only the first button, she wired the buttons so that you get $15 iff you press neither button and $12 otherwise. If she predicted that you'd do anything else, she wired the buttons so that you get $10 if you press both buttons and $0 otherwise. You are certain that the predictor has foreseen your choice.
The payoff matrix for your plans looks as follows, where 'P1N2' means 'press button 1 and not button 2', and the columns specify the predicted plan.

| | Pred P1P2 | Pred P1N2 | Pred N1P2 | Pred N1N2 |
|---|---|---|---|---|
| P1P2 | $10 | $12 | $10 | $10 |
| P1N2 | $0 | $12 | $0 | $0 |
| N1P2 | $0 | $12 | $0 | $0 |
| N1N2 | $0 | $15 | $0 | $0 |
The only equilibrium in this decision problem is P1P2. After you've pushed the first button (P1) in stage 1, your decision problem in stage 2 is effectively the top left quarter of the matrix. This problem has two equilibria, P1P2 and P1N2. The second is better. Best-equilibrium CDT therefore says that in stage 2, the continuation of the only ex ante choosable plan P1P2 is no longer choosable.
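Both equilibrium claims can be verified mechanically. In this sketch, restricting the option set to the plans beginning with P1 is my way of modelling the stage-2 problem; the names are mine:

```python
# Sketch: CDT equilibria in Two Buttons, ex ante and in stage 2.
# Payoffs follow the wiring rules: if "press only button 1" (P1N2)
# was predicted, you get $15 for pressing neither button and $12
# otherwise; under any other prediction you get $10 for pressing
# both buttons and $0 otherwise.

plans = ["P1P2", "P1N2", "N1P2", "N1N2"]

def payoff(plan, predicted):
    if predicted == "P1N2":
        return 15 if plan == "N1N2" else 12
    return 10 if plan == "P1P2" else 0

def equilibria(options):
    # X is an equilibrium iff it maximises payoff in the state where
    # X itself was predicted (you're certain the prediction is right).
    return [x for x in options
            if payoff(x, x) >= max(payoff(y, x) for y in options)]

print(equilibria(plans))             # → ['P1P2']
print(equilibria(["P1P2", "P1N2"]))  # → ['P1P2', 'P1N2']
```

The stage-2 problem has the extra equilibrium P1N2, which is better ($12 conditional on being chosen, against $10 for P1P2), so best-equilibrium CDT abandons the continuation of P1P2.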
Implementing a plan
I don't understand the common focus on (DC3). The principle compares ex ante attitudes towards a plan with attitudes towards the plan during its hypothetical implementation, even if that implementation is irrational. If it would be irrational to implement a plan, why should we assume that you should have stable attitudes towards the plan during its implementation? To be sure, if we could show (DC1) – if we could show that any choosable plan is rationally implementable – then (DC3) would have some appeal. But then (DC1) is doing the real work.
Newcomb Insurance With A Coin shows that (DC1) is invalid in best-equilibrium CDT. Permissive CDT escapes the counterexample. It allows you to implement the uniquely choosable plan. Is this always true? That is, can we show the following?
(DC4) If a plan P maximises expected utility conditional on P then each of its acts maximises expected utility conditional on P after the earlier acts have been chosen.
(DC4) resembles (DC2), except that we've conditionalised on P. This ensures that we don't consider the relevant plan as a merely counterfactual alternative, which rendered (DC2) untenable.
(DC4) looks plausible to me. Oddly, I can't prove it without some non-trivial assumptions. The following two assumptions do the job.
Future Determinacy. You are never uncertain about what your future self would choose at later points in the sequence under the supposition that you make a certain choice now.
Strong Centring. If you are certain that you will choose A1 now and A2…An afterwards, then you are certain that you would choose A2…An on the supposition that you now choose A1.
Strong Centring is debatable. Future Determinacy is not plausible as a general assumption. But it is often satisfied. If you know that your future self is rational, you can often figure out what they would do if they faced a certain decision situation. The only counterexamples are situations in which you know that your future self would face a choice in which two options are both choosable, and you don't know which of the options they would pick.
Observation 3. (DC4) holds in CDT whenever Future Determinacy and Strong Centring are satisfied.

Proof. Assume P = A1…An maximises (causal) expected utility conditional on P. By the Lemma, we know that after A1…Ai-1 have been implemented, Ai…An still maximises expected utility conditional on P. That is,

(1) ∑_w V(w)·Cr_{i,P}(w // Ai…An) ≥ ∑_w V(w)·Cr_{i,P}(w // Bi…Bn)

for any Bi…Bn. We need to show that Ai alone also maximises expected utility conditional on P. Suppose for reductio that some alternative Bi has greater expected utility conditional on P. That is,

(2) ∑_w V(w)·Cr_{i,P}(w // Bi) > ∑_w V(w)·Cr_{i,P}(w // Ai).

By Future Determinacy, there are acts Bi+1…Bn such that Cr_{i,P}(Bi+1…Bn // Bi) = 1. By Similarity, it follows that Cr_{i,P}(w // Bi) = Cr_{i,P}(w // Bi ∧ Bi+1…Bn). Also, by Strong Centring and Similarity, Cr_{i,P}(w // Ai) = Cr_{i,P}(w // Ai…An). Thus (2) turns into

∑_w V(w)·Cr_{i,P}(w // Bi…Bn) > ∑_w V(w)·Cr_{i,P}(w // Ai…An),

and this contradicts (1). QED.
Why do we need Future Determinacy? Consider a two-stage dynamic decision problem, and assume the plan A1 ∧ A2 maximises expected utility conditional on A1 ∧ A2. Now suppose Future Determinacy is false: you don't know what you would choose in stage 2 if you chose B1 in stage 1. Let's say you're unsure whether you would choose A2 or B2, because both would be equally choiceworthy. We know that (conditional on A1 ∧ A2) neither B1 ∧ A2 nor B1 ∧ B2 has greater expected utility than A1 ∧ A2. Oddly, their disjunction – which is equivalent to B1 – might still have greater expected utility than A1 ∧ A2. And then A1 would not maximise expected utility (conditional on A1 ∧ A2).
Instead of Future Determinacy, we could also require that (conditional on A1…An) no disjunction of plan continuations has greater expected utility than each disjunct. Or more specifically: If some plan continuations all have expected utility x then their disjunction does not have expected utility greater than x. Let's call a scenario bizarre if it falsifies both Future Determinacy and this condition. Any counterexample to (DC4) would have to be bizarre (given our other assumptions, like Strong Centring).
Observation 4. (DC1) holds in permissive CDT for any scenario that is not bizarre.
This immediately follows from the previous observation.
Observation 5. Any non-bizarre case in which best-equilibrium CDT violates (DC1) is a case in which simultaneous control over all acts in the relevant sequence is bad news, indicating that the agent has been given worse options.
This is because one equilibrium in a decision problem counts as better than another only in the sense that it carries better news about your options. By (DC4), the optimal planning equilibrium is still an equilibrium during implementation. If there is a better equilibrium during implementation, this means that the original equilibrium – the planning equilibrium – carries worse news about the options.
It would be good to figure out what happens in "bizarre" cases.
I can't be the first to look into dynamic consistency from a CDT perspective. Any literature suggestions are welcome.