More on dynamic consistency in CDT
One might intuit that any rationally choosable plan should be rationally implementable. In the previous post, I discussed a scenario in which some forms of CDT violate that principle. In this post, I have some more thoughts on how this can happen. I also consider some nearby principles and look at the conditions under which they might hold.
Plans and implementations
Throughout this post I'll assume that we are dealing with ideally rational agents with stable basic desires. We're interested in the attitudes such agents should take towards their options in simple, finite sequential choice situations where no relevant information about the world arrives in between the choice points.
In this context, a plan is a proposition specifying an act for each choice point in the sequence. A plan is rationally choosable if it is a rational choice in a hypothetical decision problem in which the options are the possible plans. A plan is rationally implementable if at each choice point, the agent could rationally choose whatever the plan says she does at that point.
In the previous post, I considered the following principle. (This is the left-to-right direction of the principle I there called "Dynamic Consistency".)
(DC1) If a plan is rationally choosable then it is rationally implementable.
We found that an attractive form of CDT violates (DC1) in the following variant of a scenario from Ahmed (2014).
Newcomb Insurance With A Coin.
Stage 1. You face Newcomb's Problem, but with different monetary values. The transparent box is empty; the opaque box contains $100 iff you have been predicted to one-box. In addition to one-boxing and two-boxing, you have the option to toss a fair coin and let the outcome decide whether you'll one-box or two-box. The predictor can infallibly foresee your choice, but she can't foresee the outcome of the coin toss.
Stage 2. Before the content of the opaque box is revealed, you must bet on whether the predictor foresaw how many boxes you took. If you bet that the prediction was accurate, you get $25 if you're right and lose $75 if you're wrong; if you bet that the prediction was inaccurate, you get $75 if you're right and lose $25 if you're wrong.
Here is a decision matrix for the possible plans.
| | pred-1b | pred-2b |
|---|---|---|
| 1b & bet-acc | $100+$25 = $125 | $0-$75 = -$75 |
| 1b & bet-inacc | $100-$25 = $75 | $0+$75 = $75 |
| 2b & bet-acc | $100-$75 = $25 | $0+$25 = $25 |
| 2b & bet-inacc | $100+$75 = $175 | $0-$25 = -$25 |
| rand & bet-acc | $125 if 1b, else $25; expected $75 | -$75 if 1b, else $25; expected -$25 |
| rand & bet-inacc | $75 if 1b, else $175; expected $125 | $75 if 1b, else -$25; expected $25 |
The only rationally choosable plan is row 6 (rand & bet-inacc). But when you face the individual choices, you should arguably one-box in stage 1 and bet on an accurate prediction in stage 2, because that's the best equilibrium solution. We have a counterexample to (DC1).
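The claim about row 6 can be checked mechanically. Here is a minimal Python sketch, assuming that a plan counts as choosable only if it maximises causal expected utility under the credence you would have conditional on choosing it. It also assumes that this credence in pred-1b is 1 for one-boxing plans, 0 for two-boxing plans, and 0.5 for randomising plans. The names `PAYOFF`, `cr_pred_1b`, `causal_eu` and `ratifiable_plans` are made up for illustration.

```python
# Payoffs in dollars for each plan, copied from the decision matrix.
# For the "rand" plans the entries are expectations over the fair coin.
PAYOFF = {
    "1b & bet-acc":     {"pred-1b": 125, "pred-2b": -75},
    "1b & bet-inacc":   {"pred-1b": 75,  "pred-2b": 75},
    "2b & bet-acc":     {"pred-1b": 25,  "pred-2b": 25},
    "2b & bet-inacc":   {"pred-1b": 175, "pred-2b": -25},
    "rand & bet-acc":   {"pred-1b": 75,  "pred-2b": -25},
    "rand & bet-inacc": {"pred-1b": 125, "pred-2b": 25},
}


def cr_pred_1b(plan):
    """Assumed credence in pred-1b conditional on choosing the plan."""
    stage_1_act = plan.split(" & ")[0]
    return {"1b": 1.0, "2b": 0.0, "rand": 0.5}[stage_1_act]


def causal_eu(plan, p_pred_1b):
    """Causal expected utility of a plan, holding the credence in pred-1b fixed."""
    row = PAYOFF[plan]
    return p_pred_1b * row["pred-1b"] + (1 - p_pred_1b) * row["pred-2b"]


def ratifiable_plans():
    """Plans that maximise causal expected utility conditional on being chosen."""
    result = []
    for plan in PAYOFF:
        p = cr_pred_1b(plan)
        best = max(causal_eu(other, p) for other in PAYOFF)
        if causal_eu(plan, p) >= best:
            result.append(plan)
    return result


print(ratifiable_plans())  # ['rand & bet-inacc']
```

The sketch only covers the hypothetical choice between plans. It says nothing about the stage-by-stage choices, where the available equilibria are different.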
Better options
A curious aspect of Newcomb Insurance With A Coin is that you do better if you face the individual choices than if you choose a plan. One might have thought that more control is always better. Not so here. If in stage 1, you had the power to "bind" your future choices, you should not make use of that power.
The reason why you do better if you face the individual choices is that you are then given better options. You can rationally intend to one-box, and if you intend to one-box then the opaque box is certain to contain $100. If you have simultaneous control over both choices, by contrast, you can't rationally intend to one-box. The only option you can rationally intend to choose is randomisation. In that case there's a 50% chance that the opaque box contains $100, but also a 50% chance that it contains nothing.
In Newcomb's original problem, the predictor gives EDTers the better options. She gives them a choice between $1M and $1M + $1K, while CDTers get a choice between $0 and $1K. In Newcomb Insurance With A Coin, the predictor gives better options not just to EDTers, but also to (best-equilibrium) CDTers who face the two choices independently. CDTers who have simultaneous control over both choices are punished with worse options.
If we hold fixed the content of the opaque box then the optimal plan (rand & bet-inacc) is no worse than the optimal implementation (1b & bet-acc). Suppose the opaque box contains $100. Then your expected payoff is $125 either way. Suppose the opaque box is empty. Then rand & bet-inacc has expected payoff $25 while 1b & bet-acc is guaranteed to result in -$75.
So we have two factors that pull in opposite directions. On the one hand, having simultaneous control over both choices allows you to make more (or at least equally much) out of whatever cards you've been dealt. On the other hand, the extra control makes it likely that you've been dealt worse cards.
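The two opposing factors can be put into numbers. The following Python sketch uses the payoffs from the plan matrix. It assumes that the chance of a full opaque box is 0.5 if you intend to randomise and 1 if you intend to one-box. The names `PLAN`, `IMPLEMENTATION`, `P_FULL` and `overall` are illustrative.

```python
# Expected payoffs with the content of the opaque box held fixed.
PLAN = {"full": 125, "empty": 25}             # rand & bet-inacc
IMPLEMENTATION = {"full": 125, "empty": -75}  # 1b & bet-acc

# Assumed chance that the opaque box is full, given what you rationally intend.
P_FULL = {"plan": 0.5, "implementation": 1.0}


def overall(payoffs, p_full):
    """Expected payoff once the chance of a full box is factored in."""
    return p_full * payoffs["full"] + (1 - p_full) * payoffs["empty"]


# First factor: for each box content, the plan does at least as well.
statewise_no_worse = all(PLAN[s] >= IMPLEMENTATION[s] for s in PLAN)

# Second factor: the plan makes a full box less likely.
plan_value = overall(PLAN, P_FULL["plan"])
implementation_value = overall(IMPLEMENTATION, P_FULL["implementation"])

print(statewise_no_worse, plan_value, implementation_value)  # True 75.0 125.0
```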
This kind of situation is clearly unusual. When we intuit that a rationally choosable plan should be rationally implementable, we don't suppose that having a choice between plans is associated with having worse options.
Non-equilibrium Dynamic Consistency?
The dynamic consistency principle (DC1) compares a hypothetical choice between plans with the actual choices at the individual choice points. Perhaps we should not take the merely hypothetical choice so seriously. That is, perhaps we shouldn't consider what you should do if you could actually choose between plans – with the possible consequence that you would then have been given worse options. Instead, we might simply consider which plans maximise expected utility, without considering whether you could rationally decide in their favour.
(DC2) If a plan maximises expected utility then each of its acts maximises expected utility after the earlier acts have been performed.
From a CDT perspective, however, (DC2) is highly implausible.
The problem is that a plan may maximise (causal) expected utility only as long as you believe that you won't choose it. In Newcomb Insurance With A Coin, for example, the plan 2b & bet-inacc maximises expected utility if you believe that you'll choose 1b in stage 1. But if you go ahead and implement 2b & bet-inacc, you can hardly remain confident that you will choose 1b in stage 1. After having chosen 2b, you know that you have chosen 2b, and then 2b & bet-inacc no longer maximises expected utility. (Nor does bet-inacc on its own.)
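To illustrate with the numbers from Newcomb Insurance With A Coin, here is a small Python sketch. It assumes that believing you'll one-box amounts to certainty in pred-1b, and that knowing you have two-boxed amounts to certainty in pred-2b. The helper `eu_maximisers` and its `available` parameter are hypothetical names.

```python
# Causal payoffs of the four non-random plans, copied from the plan matrix.
PAYOFF = {
    "1b & bet-acc":   {"pred-1b": 125, "pred-2b": -75},
    "1b & bet-inacc": {"pred-1b": 75,  "pred-2b": 75},
    "2b & bet-acc":   {"pred-1b": 25,  "pred-2b": 25},
    "2b & bet-inacc": {"pred-1b": 175, "pred-2b": -25},
}


def eu_maximisers(p_pred_1b, available=None):
    """Plans with maximal causal expected utility, given a credence in pred-1b."""
    plans = list(PAYOFF) if available is None else available
    eu = {
        plan: p_pred_1b * PAYOFF[plan]["pred-1b"]
        + (1 - p_pred_1b) * PAYOFF[plan]["pred-2b"]
        for plan in plans
    }
    best = max(eu.values())
    return [plan for plan in plans if eu[plan] == best]


# Before acting: you believe you'll one-box, so pred-1b is certain.
before = eu_maximisers(1.0)

# After two-boxing: you know you chose 2b, so pred-2b is certain.
after = eu_maximisers(0.0)

# Stage 2 after two-boxing: only the continuations of 2b are still available.
stage_2 = eu_maximisers(0.0, available=["2b & bet-acc", "2b & bet-inacc"])

print(before)   # ['2b & bet-inacc']
print(after)    # ['1b & bet-inacc']
print(stage_2)  # ['2b & bet-acc']
```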
Plans and continuations
The most popular formulation of dynamic consistency in the literature goes like this.
(DC3) If a plan is choosable at the start of a sequential choice problem, then its continuation is choosable at any later point after the earlier parts of the plan have been implemented.
Newcomb Insurance With A Coin is a counterexample to (DC1), but not to (DC3). Is (DC3) valid in CDT? It depends.
I'll first prove a lemma.
Lemma. If a plan P maximises (causal) expected utility conditional on P then the plan's continuation still maximises (causal) expected utility conditional on P.
Proof. I assume that causal expected utility can be expressed in terms of some kind of supposition, so that EU(A) = ∑_{w} V(w)Cr(w//A), where Cr(B//A) is the agent's credence in B on the supposition that A. Cr(B//A) must be distinguished from the ordinary conditional probability Cr(B/A), which I also write as Cr^{A}(B). I need two assumptions about the relevant kind of supposition. Both should be fairly uncontroversial.
No-Backtracking. If A says that such-and-such acts are performed up to some point in a sequential choice situation, and B says that such-and-such acts are performed afterwards, then Cr^{A}(A//B) = 1.
Similarity. If Cr(A//B) = 1 then Cr(C//B) = Cr(C//A ∧ B).
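Now assume some plan P = A_{1}…A_{n} maximises (causal) expected utility conditional on A_{1}…A_{n}, compared to any other plan. In particular,
∑_{w} V(w) Cr^{A1…An}(w//A_{1}…A_{i-1}A_{i}…A_{n}) ≥ ∑_{w} V(w) Cr^{A1…An}(w//A_{1}…A_{i-1}B_{i}…B_{n})
for any acts B_{i}…B_{n} available at points i…n respectively. By (No-Backtracking),
Cr^{A1…An}(A_{1}…A_{i-1}//B_{i}…B_{n}) = 1
for any B_{i}…B_{n}. By Similarity, it follows that
Cr^{A1…An}(w//B_{i}…B_{n}) = Cr^{A1…An}(w//A_{1}…A_{i-1}B_{i}…B_{n})
for any B_{i}…B_{n} and world w. Plugging this into the first inequality (once with A_{i}…A_{n} as B_{i}…B_{n} and once with B_{i}…B_{n} as B_{i}…B_{n}), we get
∑_{w} V(w) Cr^{A1…An}(w//A_{i}…A_{n}) ≥ ∑_{w} V(w) Cr^{A1…An}(w//B_{i}…B_{n})
for any B_{i}…B_{n}. QED.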
Any sensible form of CDT should hold that an option is choosable only if it maximises causal expected utility conditional on being chosen. Let permissive CDT be the view that this condition is not only necessary, but also sufficient for choosability.
Observation 1. Permissive CDT validates (DC3).
Proof. Let Cr_{i} be the agent's credence at point i. Since Cr_{i}(*) = Cr_{1}(*/A_{1}…A_{i-1}) and the value function is stable, we can replace 'Cr^{A1…An}' by 'Cr_{i}^{Ai…An}' in the Lemma's result:
∑_{w} V(w) Cr_{i}^{Ai…An}(w//A_{i}…A_{n}) ≥ ∑_{w} V(w) Cr_{i}^{Ai…An}(w//B_{i}…B_{n})
for any B_{i}…B_{n}. QED.
Let best-equilibrium CDT be the view that one may only choose a best among the options that maximise causal expected utility conditional on being chosen, where the relevant measure of goodness is each candidate's expected utility conditional on being chosen.
Observation 2. Best-equilibrium CDT does not validate (DC3).
Here is a counterexample.
Two Buttons. In stage 1, you can choose whether to press a button. In stage 2, you can choose whether to press a different button. A predictor has predicted your choice in both stages. If she predicted that you'd press only the first button, she wired the buttons so that you get $15 iff you press neither button and $12 otherwise. If she predicted that you'd do anything else, she wired the buttons so that you get $10 if you press both buttons and $0 otherwise. You are certain that the predictor has foreseen your choice.
The payoff matrix for your plans looks as follows, where 'P1N2' means 'press button 1 and not button 2'.
| | Pred-P1P2 | Pred-P1N2 | Pred-N1P2 | Pred-N1N2 |
|---|---|---|---|---|
| P1P2 | $10 | $12 | $10 | $10 |
| P1N2 | $0 | $12 | $0 | $0 |
| N1P2 | $0 | $12 | $0 | $0 |
| N1N2 | $0 | $15 | $0 | $0 |
The only equilibrium in this decision problem is P1P2. After you've pushed the first button (P1) in stage 1, your decision problem in stage 2 is effectively the top left quarter of the matrix. This problem has two equilibria, P1P2 and P1N2. The second is better. Best-equilibrium CDT therefore says that in stage 2, the continuation of the only ex ante choosable plan P1P2 is no longer choosable.
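The equilibrium claims for Two Buttons can be verified with a short Python sketch. It assumes, as the scenario stipulates, that conditional on choosing an option you are certain the predictor foresaw exactly that option. The function names `payoff`, `equilibria` and `best_equilibria` are made up for illustration.

```python
PLANS = ["P1P2", "P1N2", "N1P2", "N1N2"]


def payoff(plan, predicted):
    """Payoff in dollars, following the wiring described in Two Buttons."""
    if predicted == "P1N2":
        return 15 if plan == "N1N2" else 12
    return 10 if plan == "P1P2" else 0


def equilibria(options):
    """Options that maximise causal expected utility conditional on being chosen."""
    return [
        p for p in options
        if payoff(p, p) >= max(payoff(q, p) for q in options)
    ]


def best_equilibria(options):
    """Equilibria with maximal expected utility conditional on being chosen."""
    candidates = equilibria(options)
    top = max(payoff(p, p) for p in candidates)
    return [p for p in candidates if payoff(p, p) == top]


# Ex ante, all four plans are available.
print(equilibria(PLANS))         # ['P1P2']

# In stage 2, after pressing the first button, only two continuations remain.
STAGE_2 = ["P1P2", "P1N2"]
print(equilibria(STAGE_2))       # ['P1P2', 'P1N2']
print(best_equilibria(STAGE_2))  # ['P1N2']
```

On this sketch, permissive CDT corresponds to `equilibria` and best-equilibrium CDT to `best_equilibria`. Only the latter rejects the continuation of P1P2 in stage 2.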
Implementing a plan
I don't understand the common focus on (DC3). The principle compares ex ante attitudes towards a plan with attitudes towards the plan during its hypothetical implementation, even if that implementation is irrational. If it would be irrational to implement a plan, why should we assume that you should have stable attitudes towards the plan during its implementation? To be sure, if we could show (DC1) – if we could show that any choosable plan is rationally implementable – then (DC3) would have some appeal. But then (DC1) is doing the real work.
Newcomb Insurance With A Coin shows that (DC1) is invalid in best-equilibrium CDT. Permissive CDT escapes the counterexample. It allows you to implement the uniquely choosable plan. Is this always true? That is, can we show the following?
(DC4) If a plan P maximises expected utility conditional on P then each of its acts maximises expected utility conditional on P after the earlier acts have been chosen.
(DC4) resembles (DC2), except that we've conditionalised on P. This ensures that we don't consider the relevant plan as a merely counterfactual alternative, which rendered (DC2) untenable.
(DC4) looks plausible to me. Oddly, I can't prove it without some non-trivial assumptions. The following two assumptions do the job.
Future Determinacy. You are never uncertain about what your future self would choose at later points in the sequence under the supposition that you make a certain choice now.
Strong Centring. If you are certain that you will choose A_{1} now and A_{2}…A_{n} afterwards then you are certain that you would choose A_{2}…A_{n} on the supposition that you now choose A_{1}.
Strong Centring is debatable. Future Determinacy is not plausible as a general assumption. But it is often satisfied. If you know that your future self is rational, you can often figure out what they would do if they faced a certain decision situation. The only counterexamples are situations in which you know that your future self would face a choice in which two options are both choosable, and you don't know which of the options they would pick.
Observation 2. (DC4) holds in CDT whenever Future Determinacy and Strong Centring are satisfied.
Proof. Assume P = A_{1}…A_{n} maximises (causal) expected utility conditional on P. By the Lemma, we know that after A_{1}…A_{i-1} have been implemented, A_{i}…A_{n} still maximises expected utility conditional on P. That is,
(1) ∑_{w} V(w) Cr_{i}^{A1…An}(w//A_{i}…A_{n}) ≥ ∑_{w} V(w) Cr_{i}^{A1…An}(w//B_{i}…B_{n})
for any B_{i}…B_{n}. We need to show that A_{i} alone also maximises expected utility conditional on P. Suppose for reductio that some alternative B_{i} has greater expected utility conditional on P. That is,
(2) ∑_{w} V(w) Cr_{i}^{A1…An}(w//B_{i}) > ∑_{w} V(w) Cr_{i}^{A1…An}(w//A_{i}).
By Future Determinacy, there are acts B_{i+1}…B_{n} such that Cr_{i}^{A1…An}(B_{i+1}…B_{n}//B_{i}) = 1. By Similarity, it follows that Cr_{i}^{A1…An}(w//B_{i}) = Cr_{i}^{A1…An}(w//B_{i}B_{i+1}…B_{n}). Also, by Strong Centring and Similarity, Cr_{i}^{A1…An}(w//A_{i}) = Cr_{i}^{A1…An}(w//A_{i}…A_{n}). Thus (2) turns into
∑_{w} V(w) Cr_{i}^{A1…An}(w//B_{i}B_{i+1}…B_{n}) > ∑_{w} V(w) Cr_{i}^{A1…An}(w//A_{i}…A_{n}).
And this contradicts (1). QED.
Why do we need Future Determinacy? Consider a two-stage dynamic decision problem, and assume the plan A_{1} ∧ A_{2} maximises expected utility conditional on A_{1} ∧ A_{2}. Now suppose Future Determinacy is false: you don't know what you would choose in stage 2 if you chose B_{1} in stage 1. Let's say you're unsure whether you would choose A_{2} or B_{2}, because both would be equally choiceworthy. We know that (conditional on A_{1} ∧ A_{2}) neither B_{1} ∧ A_{2} nor B_{1} ∧ B_{2} has greater expected utility than A_{1} ∧ A_{2}. Oddly, their disjunction, which is equivalent to B_{1}, might still have greater expected utility than A_{1} ∧ A_{2}. And then A_{1} would not maximise expected utility (conditional on A_{1} ∧ A_{2}).
Instead of Future Determinacy, we could also require that (conditional on A_{1}…A_{n}) no disjunction of plan continuations has greater expected utility than each disjunct. Or more specifically: If some plan continuations all have expected utility x then their disjunction does not have expected utility greater than x. Let's call a scenario bizarre if it falsifies both Future Determinacy and this condition. Any counterexample to (DC4) would have to be bizarre (given our other assumptions, like Strong Centring).
Observation 3. (DC1) holds in permissive CDT for any scenario that is not bizarre.
This immediately follows from the preceding observation about (DC4).
Observation 4. Any non-bizarre case in which best-equilibrium CDT violates (DC1) is a case in which simultaneous control over all acts in the relevant sequence is bad news, indicating that the agent has been given worse options.
The reason is that when one equilibrium in a decision problem is better than another, this is always because it carries better news about your options. By (DC4), the optimal planning equilibrium is still an equilibrium during implementation. If there is a better equilibrium during implementation, this means that the original equilibrium (the planning equilibrium) carries worse news about the options.
It would be good to figure out what happens in "bizarre" cases.
I can't be the first to look into dynamic consistency from a CDT perspective. Any literature suggestions are welcome.
Thanks for these posts. I've found them very useful, but I'm getting myself a bit confused with this one. I've tried to explain my confusion below; any help would be appreciated.
-------
I'm not sure whether you think that, in cases of decision instability, where no pure option is causally ratifiable, any option given positive probability in equilibrium is choosable, or whether you think that in these cases, no option is choosable, or whether you think that the 'mixed' act is an option which is choosable. I'm going to suppose that it's the first one, where any 'live' option in equilibrium is rationally permissible (but maybe that's where I'm going wrong).
If that's right, then I'd have thought that both (DC3) and (DC4) are violated by CDT in the sequential decision I talk about in section 3.3 of this paper (https://philpapers.org/rec/GALETC-3).
Here's the decision: at stage one, you choose between three boxes, call them A, B, and C. At stage two, you're given the option of keeping the box you chose at stage one, or else paying $60 to exchange it for its 'successor' in the sequence (A, B, C). So, if you chose A at stage one, then at stage two you may either keep A for free or pay $60 to exchange A for B. If you chose B at stage one, then at stage two you may either keep B for free or pay $60 to exchange B for C. If you chose C at stage one, then at stage two you may either keep C for free or pay $60 to exchange C for A.
If a reliable predictor predicted that you would end up with box A after stage 2, then they put nothing in A, $100 in B, and a bill for $100 in C. If they predicted that you would end up with box B after stage 2, then they put nothing in B, $100 in C, and a bill for $100 in A. If they predicted that you would end up with box C after stage 2, then they put nothing in C, $100 in A, and a bill for $100 in B. To keep things simple, suppose that, conditional on you ending up with box X, you are 100% sure that you were predicted to end up with box X.
There are six possible plans: you could either choose A and stick with it (AA), choose A and switch to B (AB), choose B and stick with B (BB), choose B and switch to C (BC), choose C and stick with it (CC), or choose C and switch to A (CA).
I'm supposing that it's permissible to choose *some* plan at the start of this sequential choice problem. Whichever plan is permissible to choose, it's definitely not AB, BC, or CA. These plans are causally dominated by BB, CC, and AA, respectively. So at least one of the 'not switching' plans is permissible, or perhaps it's permissible to choose each of them with a 1/3rd probability. But only the 'switching' plans are implementable. It doesn't matter which box you choose at stage one. At stage two, if "x" is your credence that you'll stick to your first choice, then the utility of sticking with your first choice will be 100x - 100, and the utility of switching will be 100x - 60. Since the latter is higher, no matter the value of "x", CDT will say that you are required to pay the $60 to switch at stage two.
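For what it's worth, the stage-two utilities can be recomputed from the setup. This is a rough Python sketch, assuming the prediction always matches the box you end up with and that "x" is the credence that you stick with the box you hold. The names `SUCCESSOR`, `contents` and `stage_two_utilities` are just for illustration.

```python
SUCCESSOR = {"A": "B", "B": "C", "C": "A"}
SWITCH_FEE = 60


def contents(predicted_final):
    """Box contents in dollars, given the box you were predicted to end up with."""
    successor = SUCCESSOR[predicted_final]
    third = SUCCESSOR[successor]
    # Nothing in the predicted box, $100 in its successor, a $100 bill in the other.
    return {predicted_final: 0, successor: 100, third: -100}


def stage_two_utilities(held, x):
    """Causal expected utilities of sticking and of switching at stage two.

    `held` is the box chosen at stage one; `x` is the credence that you stick,
    which (by assumption) equals the credence that `held` was predicted.
    """
    other = SUCCESSOR[held]
    stick = x * contents(held)[held] + (1 - x) * contents(other)[held]
    switch = (
        x * contents(held)[other]
        + (1 - x) * contents(other)[other]
        - SWITCH_FEE
    )
    return stick, switch


for x in (0.0, 0.25, 0.5, 1.0):
    stick, switch = stage_two_utilities("A", x)
    print(x, stick, switch, switch - stick)  # the difference is 40.0 every time
```

By the symmetry of the setup, the same numbers come out for boxes B and C.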
But then, it looks like CDT violates (DC3), since some 'non-switching' plan (or some mixture thereof?) is permissible at the start of this sequential choice problem, but the continuation of those plans is not permissible at stage two. Moreover, it looks to me like the 'mixed' plan which, at stage one, selects each box with 1/3rd probability and then doesn't switch at stage two maximises expected utility conditional on it being chosen. However, at stage two, not switching doesn't maximise expected utility conditional on any 'not switching' plan being chosen. So it looks to me like this is a decision in which CDT violates (DC4) as well.
I think that the decision also shows that CDT doesn't satisfy the following, even weaker principle:
(DC5) There is some plan such that: 1) it is permissible to choose that plan; and 2) it is permissible to implement that plan, after the earlier acts have been chosen.
But, like I said, I think I'm missing something here.