On Functional Decision Theory

I recently refereed Eliezer Yudkowsky and Nate Soares's "Functional Decision Theory" for a philosophy journal. My recommendation was to accept resubmission with major revisions, but since the article had already undergone a previous round of revisions and still had serious problems, the editors (understandably) decided to reject it. I normally don't publish my referee reports, but this time I'll make an exception because the authors are well-known figures from outside academia, and I want to explain why their account has a hard time gaining traction in academic philosophy. I also want to explain why I think their account is wrong, which is a separate point.

I actually don't know Nate Soares, but Eliezer Yudkowsky is a celebrity in the "rationalist" community. Many of his posts on the Less Wrong blog are gems. I also enjoyed his latest book, Inadequate Equilibria. Yudkowsky seems to be interested in almost everything, but he regards decision theory as his main area of research. I also work in decision theory, but I've always struggled with Yudkowsky's writings on this topic.

Before I explain what I found wrong with the paper, let me review the main idea and motivation behind the theory it defends.

Standard lore in decision theory is that there are situations in which it would be better to be irrational. Three examples.

Blackmail. Donald has committed an indiscretion. Stormy has found out and considers blackmailing Donald. If Donald refuses and blows Stormy's gaff, she is revealed as a blackmailer and his indiscretion becomes public; both suffer. It is better for Donald to pay hush money to Stormy. Knowing this, it is in Stormy's interest to blackmail Donald. If Donald were irrational, he would blow Stormy's gaff even though that would hurt him more than paying the hush money; knowing this, Stormy would not blackmail Donald. So Donald would be better off if he were (known to be) irrational.
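Blackmail can be sketched as a two-stage game solved by backward induction. The following minimal Python sketch is not from the paper. Its payoff numbers are hypothetical, and only their ordering matters: Donald prefers not being blackmailed, then paying, then mutual exposure; Stormy prefers being paid, then staying quiet, then being exposed. It also assumes that Stormy correctly anticipates Donald's reply.

```python
# Hypothetical payoffs (donald, stormy) for each way the game can end.
# Only the ordering matters; the numbers are made up for illustration.
PAYOFFS = {
    "no_blackmail": (0, 0),
    "pay": (-10, 10),        # Stormy blackmails, Donald pays hush money
    "refuse": (-100, -50),   # Stormy blackmails, Donald blows the gaff: both suffer
}

def donald_best_response():
    """Donald's utility-maximising reply once he has been blackmailed."""
    return max(("pay", "refuse"), key=lambda reply: PAYOFFS[reply][0])

def stormy_choice(predicted_reply):
    """Stormy blackmails iff that beats staying quiet, given Donald's predicted reply."""
    if PAYOFFS[predicted_reply][1] > PAYOFFS["no_blackmail"][1]:
        return "blackmail"
    return "no_blackmail"

def donald_outcome(reply):
    """Donald's payoff if he is disposed to `reply` and Stormy anticipates it correctly."""
    if stormy_choice(reply) == "no_blackmail":
        return PAYOFFS["no_blackmail"][0]
    return PAYOFFS[reply][0]

best = donald_best_response()
print(best, donald_outcome(best))          # pay -10
print("refuse", donald_outcome("refuse"))  # refuse 0
```

Once Donald has been blackmailed, his best reply is to pay, and that makes blackmail profitable for Stormy. A Donald who is known to refuse gets the better payoff because he is never blackmailed in the first place.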

Prisoner's Dilemma with a Twin. Twinky and her clone have been arrested. If they both confess, each gets a 5-year prison sentence. If both remain silent, they can't be convicted and only get a 1-year sentence for obstructing justice. If one confesses and the other remains silent, the one who confesses is set free and the other gets a 10-year sentence. Neither cares about what happens to the other. Here, confessing is the dominant act and the unique Nash equilibrium. So if Twinky and her clone are rational, they'll each spend 5 years in prison. If they were irrational and remained silent, they would get away with 1 year.
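The dominance and equilibrium claims can be checked mechanically. Here is a minimal Python sketch. It uses the sentences given above and assumes that each prisoner wants only to minimise her own years in prison.

```python
from itertools import product

ACTIONS = ("confess", "silent")

# Prison years for (twinky, clone), indexed by (twinky_action, clone_action); fewer is better.
YEARS = {
    ("confess", "confess"): (5, 5),
    ("silent", "silent"): (1, 1),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
}

def strictly_dominates(a, b):
    """True if action a gives Twinky strictly fewer years than b whatever the clone does."""
    return all(YEARS[(a, other)][0] < YEARS[(b, other)][0] for other in ACTIONS)

def is_nash(profile):
    """True if neither prisoner can shorten her own sentence by deviating unilaterally."""
    t, c = profile
    t_ok = all(YEARS[(t, c)][0] <= YEARS[(alt, c)][0] for alt in ACTIONS)
    c_ok = all(YEARS[(t, c)][1] <= YEARS[(t, alt)][1] for alt in ACTIONS)
    return t_ok and c_ok

print(strictly_dominates("confess", "silent"))                # True
print([p for p in product(ACTIONS, repeat=2) if is_nash(p)])  # [('confess', 'confess')]
```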

Newcomb's Problem with Transparent Boxes. A demon invites people to an experiment. Participants are placed in front of two transparent boxes. The box on the left contains a thousand dollars. The box on the right contains either a million or nothing. The participants can choose between taking both boxes (two-boxing) and taking just the box on the right (one-boxing). If the demon has predicted that a participant one-boxes, she has put a million dollars into the box on the right. If she has predicted that a participant two-boxes, she has put nothing into the box. The demon is very good at predicting, and the participants know this. Each participant is only interested in getting as much money as possible. Here, the rational choice is to take both boxes, because you are then guaranteed to get $1000 more than if you one-box. But almost all of those who irrationally take just one box end up with a million dollars, while most of those who rationally take both boxes leave with $1000.
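Both halves of this paragraph can be shown in a minimal Python sketch: the dominance argument, and the average winnings of one-boxers and two-boxers. It makes two simplifying assumptions. First, each participant follows a fixed policy regardless of what she sees. Second, the demon predicts that policy correctly with a hypothetical probability of 0.99 (the text only says she is "very good"). `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

THOUSAND, MILLION = 1_000, 1_000_000

def payoff(action, right_box):
    """Money received, given the action and the already-fixed contents of the right box."""
    return right_box + (THOUSAND if action == "two-box" else 0)

# Dominance: whatever the right box holds, two-boxing pays exactly $1000 more.
print([payoff("two-box", c) - payoff("one-box", c) for c in (0, MILLION)])  # [1000, 1000]

def average_winnings(policy, accuracy):
    """Expected winnings of a participant who always follows `policy`, if the demon
    predicts that policy correctly with probability `accuracy` (a hypothetical value)."""
    p_full = accuracy if policy == "one-box" else 1 - accuracy  # P(million in right box)
    return p_full * payoff(policy, MILLION) + (1 - p_full) * payoff(policy, 0)

accuracy = Fraction(99, 100)
print(average_winnings("one-box", accuracy))  # 990000
print(average_winnings("two-box", accuracy))  # 11000
```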

The driving intuition behind Yudkowsky and Soares's paper is that decision theorists have been wrong about these (and other) cases: in each case, the supposedly irrational choice is actually rational. Whether a pattern of behaviour is rational, they argue, should be measured by how good it is for the agent. In Newcomb's Problem with Transparent Boxes, one-boxers fare better than two-boxers. So we should regard one-boxing as rational. Similarly for the other examples. Standard decision theories therefore get these cases wrong. We need a new theory.

Functional Decision Theory (FDT) is meant to be that theory. FDT recommends blowing the gaff in Blackmail, remaining silent in Prisoner's Dilemma with a Twin, and one-boxing in Newcomb's Problem with Transparent Boxes.

Here's how FDT works, and how it differs from the most popular form of decision theory, Causal Decision Theory (CDT). Suppose an agent faces a choice between two options A and B. According to CDT, the agent should evaluate these options in terms of their possible consequences (broadly understood). That is, the agent should consider what might happen if she were to choose A or B, and weigh the possible outcomes by their probability. In FDT, the agent should not consider what would happen if she were to choose A or B. Instead, she ought to consider what would happen if the right choice according to FDT were A or B.

Take Newcomb's Problem with Transparent Boxes. Without loss of generality, suppose you see $1000 in the left box and a million in the right box. If you were to take both boxes, you would get a million and a thousand. If you were to take just the right box, you would get a million. So Causal Decision Theory says you should take both boxes. But let's suppose you follow FDT, and you are certain that you do. You should then consider what would be the case if FDT recommended one-boxing or two-boxing. These hypotheses are not hypotheses just about your present choice. If FDT recommended two-boxing, then any FDT agent throughout history would two-box. And, crucially, the demon would (probably) have foreseen that you would two-box, so she would have put nothing into the box on the right. As a result, if FDT recommended two-boxing, you would probably end up with $1000. To be sure, you know that there's a million in the box on the right. You can see it. But according to FDT, this is irrelevant. What matters is what would be in the box relative to different assumptions about what FDT recommends.
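The contrast between the two ways of evaluating options can be sketched in Python. This is an illustration under assumptions, not Yudkowsky and Soares's formalism, which leaves the counterpossible supposition unspecified. The demon's reliability of 0.99 is hypothetical. "Supposing that FDT recommends an action" is modelled simply as the demon having predicted that action with that reliability, while the observed contents are ignored.

```python
from fractions import Fraction

THOUSAND, MILLION = 1_000, 1_000_000
ACCURACY = Fraction(99, 100)  # hypothetical reliability of the demon

def money(action, right_box):
    """Money received, given the action and the contents of the right-hand box."""
    return right_box + (THOUSAND if action == "two-box" else 0)

def cdt_value(action, seen_right_box):
    """CDT: the contents are already fixed, so hold the observed contents constant."""
    return money(action, seen_right_box)

def fdt_value(action, seen_right_box):
    """FDT as presented above: suppose FDT recommends `action`. Here the demon is assumed
    to have predicted that action with probability ACCURACY; the observed contents
    (`seen_right_box`) are deliberately ignored."""
    p_full = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_full * money(action, MILLION) + (1 - p_full) * money(action, 0)

def recommend(value, seen_right_box):
    """Pick the action with the highest value under the given evaluation rule."""
    return max(("one-box", "two-box"), key=lambda a: value(a, seen_right_box))

print(recommend(cdt_value, MILLION))  # two-box
print(recommend(fdt_value, MILLION))  # one-box
```

The verdicts stay the same if the right box is seen to be empty: CDT still two-boxes and FDT still one-boxes, because the FDT calculation never consults what is seen.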

To spell out the details, one would now need to specify how to compute the probability of various outcomes under the subjunctive supposition that FDT recommended a certain action. Yudkowsky and Soares are explicit that the supposition is to be understood as counterpossible: we need to suppose that a certain mathematical function, which in fact outputs A for input X, instead were to output B. They do not explain how to compute the probability of outcomes under such a counterpossible supposition. So we don't get any details spelled out. This is flagged as the main open question for FDT.

It is not obvious to me why Yudkowsky and Soares choose to model the relevant supposition as a mathematical falsehood. For example, why not let the supposition be: I am the kind of agent who chooses A in the present decision problem? That is an ordinary contingent (centred) proposition, since there are possible agents who do choose option A in the relevant problem. These agents may not follow FDT, but I don't see why that would matter. For some reason, Yudkowsky and Soares assume that an FDT agent is certain that she follows FDT, and this knowledge is held fixed under all counterfactual suppositions. I guess there is a reason for this assumption, but they don't tell us.

Anyway. That's the theory. What's not to like about it?

For a start, I'd say the theory gives insane recommendations in cases like Blackmail, Prisoner's Dilemma with a Twin, and Newcomb's Problem with Transparent Boxes. Suppose you have committed an indiscretion that would ruin you if it should become public. You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay! FDT says you should not pay because, if you were the kind of person who doesn't pay, you likely wouldn't have been blackmailed. How is that even relevant? You are being blackmailed. Not being blackmailed isn't on the table. It's not something you can choose.

[Clarifications: (a) I assume that the blackmailer is not infallible, but extremely reliable, at predicting how you will react. Even if you follow FDT you might therefore find yourself in this situation. (b) You are rationally certain that you will never find yourself again in a similar situation, so that your act is useless as a signal to potential future blackmailers.]

Admittedly, that's not much of an objection. I say you'd be insane not to pay the $1, Yudkowsky and Soares say you'd be irrational to pay. Neither of us can prove that their judgement is right from neutral premises.

What about the fact that FDT agents do better than (say) CDT agents? I admit that if this were a fact, it would be somewhat interesting. But it's not clear if it is true.

First, it depends on how success is measured. If you face the choice between submitting to blackmail and refusing to submit (in the kind of case we've discussed), you fare dramatically better if you follow CDT than if you follow FDT. If you are in Newcomb's Problem with Transparent Boxes and see a million in the right-hand box, you again fare better if you follow CDT. Likewise if you see nothing in the right-hand box.

So there's an obvious sense in which CDT agents fare better than FDT agents in the cases we've considered. But there's also a sense in which FDT agents fare better. Here we don't just compare the utilities scored in particular decision problems; we also take into account that FDT agents might face different kinds of decision problems than CDT agents. For example, FDT agents who are known as FDT agents have a lower chance of getting blackmailed and thus of facing a choice between submitting and not submitting. I agree that it makes sense to take these effects into account, at least as long as they are consequences of the agent's own decision-making dispositions. In effect, we would then ask what decision rule should be chosen by an engineer who wants to build an agent scoring the most utility across its lifetime. Even then, however, there is no guarantee that FDT would come out better. What if someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options? In such an environment, the engineer would be wise not to build an FDT agent.
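The engineer's question can be sketched with a toy model. Every ingredient here is a hypothetical assumption made for illustration:

- there is a single blackmail opportunity;
- the blackmailer predicts the agent's reply with reliability 0.99 and blackmails only if she predicts payment;
- CDT agents pay and FDT agents refuse;
- the costs of paying and of exposure are made up;
- an optional penalty stands in for someone who punishes FDT agents.

```python
from fractions import Fraction

def expected_loss(disposition, reliability, hush_money, exposure, fdt_penalty=0):
    """Expected loss from a single blackmail opportunity in a toy, hypothetical model.

    The blackmailer predicts the agent's reply with probability `reliability` and
    blackmails only if she predicts payment. CDT agents pay; FDT agents refuse.
    `fdt_penalty` stands in for an environment that punishes FDT agents.
    """
    if disposition == "CDT":
        # Blackmailed whenever correctly predicted to pay; then pays the hush money.
        return reliability * hush_money
    # Blackmailed only when mispredicted; then refuses and suffers exposure.
    return (1 - reliability) * exposure + fdt_penalty

r = Fraction(99, 100)
ordinary = {d: expected_loss(d, r, hush_money=10, exposure=100) for d in ("CDT", "FDT")}
punishing = {d: expected_loss(d, r, hush_money=10, exposure=100, fdt_penalty=50)
             for d in ("CDT", "FDT")}
print(min(ordinary, key=ordinary.get))    # FDT
print(min(punishing, key=punishing.get))  # CDT
```

With these made-up numbers, the FDT agent's expected loss (1) is lower than the CDT agent's (9.9) in the ordinary environment. A penalty of 50 for FDT agents reverses the ranking. Which rule the engineer should pick depends entirely on the assumed environment.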

Moreover, FDT does not in fact consider only consequences of the agent's own dispositions. The supposition that is used to evaluate acts is that FDT in general recommends that act, not just that the agent herself is disposed to choose the act. This leads to even stranger results.

Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there's a significant probability that I wouldn't exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)
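Procreation can also be sketched numerically. The utilities and the credence that my father followed FDT are hypothetical, chosen only to respect the ordering in the text: a miserable existence is far better than not existing. The FDT calculation treats "FDT recommends X" as fixing my father's choice if he followed FDT, while leaving his actual choice (to procreate) in place if he didn't.

```python
from fractions import Fraction

# Hypothetical utilities respecting the text's ordering: a miserable existence is far
# better than not existing, and an unburdened existence is better still.
NOT_EXISTING, MISERABLE, CONTENT = -1000, 0, 10
P_FATHER_FDT = Fraction(1, 2)  # hypothetical credence that my father followed FDT

def fdt_value(action):
    """Suppose FDT recommends `action`. If my father followed FDT, he did the same;
    otherwise his actual choice (to procreate) stays in place."""
    if action == "procreate":
        return MISERABLE  # I exist on either hypothesis about my father, but miserably
    return P_FATHER_FDT * NOT_EXISTING + (1 - P_FATHER_FDT) * CONTENT

def cdt_value(action):
    """CDT: my father's choice is already made, so I exist whatever I do now."""
    return MISERABLE if action == "procreate" else CONTENT

for value in (fdt_value, cdt_value):
    print(value.__name__, max(("procreate", "abstain"), key=value))
# fdt_value procreate
# cdt_value abstain
```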

In Procreation, FDT agents have a much worse life than CDT agents.

[Edit: Another potential case in which it's hard to see any sense in which FDT agents do better than CDT agents is suggested in this comment below.]

All that said, I agree that there's an apparent advantage of the "irrational" choice in cases like Blackmail or Prisoner's Dilemma with a Twin, and that this raises an important issue. The examples are artificial, but structurally similar cases arguably come up a lot, and they have come up a lot in our evolutionary history. Shouldn't evolution have favoured the "irrational" choices?

Not necessarily. There is another way to design agents who refuse to submit to blackmail and who cooperate in Prisoner's Dilemmas. The trick is to tweak the agents' utility function. If Twinky cares about her clone's prison sentence as much as about her own, remaining silent becomes the (weakly) dominant option in Prisoner's Dilemma with a Twin. If Donald develops a strong sense of pride and would rather take Stormy down with him than submit to her blackmail, refusing to pay becomes the rational choice in Blackmail.
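The utility tweak for Twinky can be checked in a minimal Python sketch. Following the text, it assumes that Twinky's disutility is her own sentence plus some weight times her clone's sentence; the `weight` parameter is introduced here for illustration. With weight 0, confessing dominates. With weight 1, silence dominates, but only weakly: if the clone confesses, both options cost 10 years in total.

```python
ACTIONS = ("confess", "silent")

# Prison years for (twinky, clone), indexed by (twinky_action, clone_action).
YEARS = {
    ("confess", "confess"): (5, 5),
    ("silent", "silent"): (1, 1),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
}

def cost(twinky, clone, weight):
    """Twinky's disutility: her own years plus `weight` times her clone's years."""
    own, other = YEARS[(twinky, clone)]
    return own + weight * other

def weakly_dominates(a, b, weight):
    """a is never worse than b for Twinky, and strictly better for some clone action."""
    pairs = [(cost(a, c, weight), cost(b, c, weight)) for c in ACTIONS]
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)

print(weakly_dominates("confess", "silent", weight=0))  # True: selfish Twinky
print(weakly_dominates("silent", "confess", weight=1))  # True: equal concern
```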

FDT agents rarely find themselves in Blackmail scenarios. Neither do CDT agents with a vengeful streak. If I wanted to design a successful agent for a world like ours, I would build a CDT agent who cares what happens to others. My CDT agent would still two-box in Newcomb's Problem with Transparent Boxes (or in the original Newcomb Problem). But this kind of situation practically never arises in worlds like ours.

The story I'm hinting at has been well told by others. I'd especially recommend Brian Skyrms's Evolution of the Social Contract and chapter 6 of Simon Blackburn's Ruling Passions.

So here's the upshot. Whether FDT agents fare better than CDT agents depends on the environment, on how "faring better" is measured, and on what the agents care about. Across their lifetime, purely selfish agents might do better, in a world like ours, if they followed FDT. But that doesn't persuade me that the insane recommendations of FDT are correct.

So far, I have explained why I'm not convinced by the case for FDT. I haven't explained why I didn't recommend the paper for publication. That I'm not convinced is not a reason. I'm rarely convinced by arguments I read in published papers.

The standards for deserving publication in academic philosophy are relatively simple and self-explanatory. A paper should make a significant point, it should be clearly written, it should correctly position itself in the existing literature, and it should support its main claims by coherent arguments. The paper I read sadly fell short on all these points, except the first. (It does make a significant point.)

Here, then, are some of the complaints from my referee report, lightly edited for ease of exposition. I've omitted several other complaints concerning more specific passages or notation from the paper.

  1. A popular formulation of CDT assumes that to evaluate an option A we should consider the probability of various outcomes on the subjunctive supposition that A were chosen. That is, we should ask how probable such-and-such an outcome would be if option A were chosen. The expected utility of the option is then defined as the probability-weighted average of the utility of these outcomes. In much of their paper, Yudkowsky and Soares appear to suggest that this is exactly how expected utility is defined in FDT. The disagreement between CDT and FDT would then boil down to a disagreement about what is likely to be the case under the subjunctive supposition that an option is chosen.

    For example, consider Newcomb's Problem with Transparent Boxes. Suppose (without loss of generality) that the right-hand box is empty. CDT says you should take both boxes because if you were to take only the right-hand box you would get nothing whereas if you were to take both boxes, you would get $1000. According to FDT (as I presented it above, and as it is presented in parts of the paper), we should ask a different question. We should ask what would be the case if FDT recommended one-boxing, and what would be the case if FDT recommended two-boxing. For much of the paper, however, Yudkowsky and Soares seem to assume that these questions coincide. That is, they suggest that you should one-box because if you were to one-box, you would get a million. The claim that you would get nothing if you were to one-box is said to be a reflection of CDT.

    If that's really what Yudkowsky and Soares want to say, they should, first, clarify that FDT is a special case of CDT as conceived for example by Stalnaker, Gibbard & Harper, Sobel, and Joyce, rather than an alternative. All these parties would agree that the expected utility of an act is a matter of what would be the case if the act were chosen. (Yudkowsky and Soares might then also point out that "Causal Decision Theory" is not a good label, given that they don't think the relevant conditionals track causal dependence. John Collins has made essentially the same point.)

    Second, and more importantly, I would like to see some arguments for the crucial claim about subjunctive conditionals. Return once more to the Newcomb case. Here's the right-hand box. It's empty. It's a normal box. Nothing you can do has any effect on what's in the box. The demon has tried to predict what you will do, but she could be wrong. (She has been wrong before.) Now, what would happen if you were to take that box, without taking the other one? The natural answer, by the normal rules of English, is: you would get an empty box. Yudkowsky and Soares instead maintain that the correct answer is: you would find a million in the box. Note that this is a claim about the truth-conditions of a certain sentence in English, so facts about the long-run performance of agents in decision problems don't seem relevant. (If the predictor is highly reliable, I think a "backtracking" reading can become available on which it's true that you would get a million, as Terry Horgan has pointed out. But there's still the other reading, and it's much more salient if the predictor is less reliable.)

    Third, later in the paper it transpires that FDT can't possibly be understood as a special case of CDT along the lines just suggested, because in some cases FDT requires assessing the expected utility of an act by looking exclusively at scenarios in which that act is not performed. For example, in Blackmail, not succumbing is supposed to be better because it decreases the chance of being blackmailed. But any conditional of the form 'if the agent were to do A, then the agent would do A' is trivially true in English, so scenarios in which A is not performed can't be what would be the case if A were performed.

    Fourth, in other parts of the paper it is made clear that FDT does not instruct agents to suppose that a certain act were performed, but rather to suppose that FDT always were to give a certain output for a certain input.

    I would recommend dropping all claims about subjunctive conditionals involving the relevant acts. The proposal should be that the expected utility of act A in decision problem P is to be evaluated by subjunctively supposing not A, but the proposition that FDT outputs A in problem P. (That's how I presented the theory above.) The proposal then wouldn't rely on implausible and unsubstantiated claims about English conditionals.

    [I then listed several passages that would need to be changed if the suggestion is adopted.]

  2. I'm worried that so little is said about how subjunctive probabilities are supposed to be revised when supposing that FDT gives a certain output for a certain decision problem. Yudkowsky and Soares insist that this is a matter of subjunctively supposing a proposition that's mathematically impossible. But as far as I know, we have no good models for supposing impossible propositions.

    Here are three more specific worries.

    First, mathematicians are familiar with reductio arguments, which appear to involve impossible suppositions. "Suppose there were a largest prime. Then there would be a product x of all these primes. And then x+1 would be prime. And so there would be a prime greater than all primes." What's noteworthy about these arguments is that whenever B is mathematically derivable from A, then mathematicians are prepared to accept 'if A were the case then B would be the case', even if B is an explicit contradiction. (In fact, that's where the proof usually ends: "If A were the case then a contradiction would be the case; so A is not the case.")

    If that is how subjunctive supposition works, FDT is doomed. For if A is a mathematically false proposition, then anything whatsoever mathematically follows from A. (I'm ignoring the subtle difference between mathematical truth and provability, which won't help.) So then anything whatsoever would be the case on a counterpossible supposition that FDT produces a certain output for a certain decision problem. We would get: 'If FDT recommended two-boxing in Newcomb's Problem, then the second box would be empty', but also 'If FDT recommended two-boxing in Newcomb's Problem, then the second box would contain a million', and 'If FDT recommended two-boxing in Newcomb's Problem, the second box would contain a round square'.

    A second worry. Is a probability function revised by a counterpossible supposition, as employed by FDT, still a probability function? Arguably not. For presumably the revised function is still certain of elementary mathematical facts such as the Peano axioms. (If, when evaluating a relevant scenario, the agent is no longer sure whether 0=1, all bets are off.) But some such elementary facts will logically entail the negation of the supposed hypothesis. So in the revised probability function, probability 1 is not preserved under logical entailment; and then the revised function is no longer a classical probability function. (This matters, for example, because Yudkowsky and Soares claim that the representation theorem from Joyce's Foundations of Causal Decision Theory can be adapted to FDT, but Joyce's theorem assumes that the supposition preserves probabilistic coherence.)

    Another worry. Subjunctive supposition is relatively well-understood for propositions about specific events at specific times. But the hypothesis that FDT yields a certain output for a certain input is explicitly not spatially and temporally limited in this way. We have no good models for how supposing such general propositions works, even for possible propositions.

    The details matter. For example, assume FDT actually outputs B for problem P, and B' for a different problem P'. Under the counterpossible supposition that FDT outputs A for P, can we hold fixed that it outputs B' for P'? If not, FDT will sometimes recommend choosing a particular act because of the advantages of choosing a different act in a different kind of decision problem.

  3. Standard decision theories are not just based on brute intuitions about particular cases, as Yudkowsky and Soares would have us believe, but also on general arguments. The most famous of these are so-called representation theorems which show that the norm of maximising expected utility can be derived from more basic constraints on rational preference (possibly together with basic constraints on rational belief). It would be nice to see which of the preference norms of CDT Yudkowsky and Soares reject. It would also be nice if they could offer a representation theorem for FDT. All that is optional and wouldn't matter too much, in my view, except that Yudkowsky and Soares claim (as I mentioned above) that the representation theorem in Joyce's Foundations of Causal Decision Theory can be adapted straightforwardly to FDT. But I doubt that it can. The claim seems to rest on the idea that FDT can be formalised just like CDT, assuming that subjunctively supposing A is equivalent to supposing that FDT recommends A. But as I've argued above, the latter supposition arguably makes an agent's subjective probability function incoherent. More obviously, in cases like Blackmail, A is plausibly false on the supposition that FDT recommends A. These two aspects already contradict the very first two points in the statement of Joyce's representation theorem, on p.229 of The Foundations of Causal Decision Theory, under 7.1.a.

  4. Yudkowsky and Soares constantly talk about how FDT "outperforms" CDT, how FDT agents "achieve more utility", how they "win", etc. As we saw above, it is not at all obvious that this is true. It depends, in part, on how performance is measured. In one place, Yudkowsky and Soares are more specific. Here they say that "in all dilemmas where the agent's beliefs are accurate [??] and the outcome depends only on the agent's actual and counterfactual behavior in the dilemma at hand – reasonable constraints on what we should consider "fair" dilemmas – FDT performs at least as well as CDT and EDT (and often better)". OK. But how should we understand "depends on … the dilemma at hand"? First, are we talking about subjunctive or evidential dependence? If we're talking about evidential dependence, EDT will often outperform FDT. And EDTers will say that's the right standard. CDTers will agree with FDTers that subjunctive dependence is relevant, but they'll insist that the standard Newcomb Problem isn't "fair" because here the outcome (of both one-boxing and two-boxing) depends not only on the agent's behavior in the present dilemma, but also on what's in the opaque box, which is entirely outside her control. Similarly for all the other cases where FDT supposedly outperforms CDT. Now, I can vaguely see a reading of "depends on … the dilemma at hand" on which FDT agents really do achieve higher long-run utility than CDT/EDT agents in many "fair" problems (although not in all). But this is a very special and peculiar reading, tailored to FDT. We don't have any independent, non-question-begging criterion by which FDT always "outperforms" EDT and CDT across "fair" decision problems.

  5. FDT closely resembles Justin Fisher's "Disposition-Based Decision Theory" and the proposal in David Gauthier's Morals by Agreement, both of which are motivated by cases like Blackmail and Prisoner's Dilemma with a Twin. Neither is mentioned. It would be good to explain how FDT relates to these earlier proposals.

  6. The paper goes to great lengths criticising the rivals CDT and EDT. The apparent aim is to establish that both CDT and EDT sometimes make recommendations that are clearly wrong. Unfortunately, these criticisms are largely unoriginal, superficial, or mistaken.

    For example, Yudkowsky and Soares fault EDT for giving the wrong verdicts in simple medical Newcomb problems. But defenders of EDT such as Arif Ahmed and Huw Price have convincingly argued that the relevant decision problems would have to be highly unusual. Similarly, Yudkowsky and Soares cite a number of well-known cases in which CDT supposedly gives the wrong verdict, such as Arif's Dicing with Death. But again, most CDTers would not agree that CDT gets these cases wrong. (See this blog post for my response to Dicing with Death.) In general, I am not aware of any case in which I'd agree that CDT – properly spelled out – gives a problematic verdict. Likewise, I suspect Arif does not think there are any cases in which EDT goes wrong. It just isn't true that both CDT and EDT are commonly agreed to be faulty. If Yudkowsky and Soares want to argue that they are, they need to do more than revisit well-known scenarios and make bold assertions about what CDT and EDT say about them.

    The criticism of CDT and EDT also contains several mistakes. For example, Yudkowsky and Soares repeatedly claim that if an EDT agent is certain that she will perform an act A, then EDT says she must perform A. I don't understand why. I guess the idea is that (1) if P(B)=0, then the evidential expected utility of B is undefined, and (2) any number is greater than undefined. But lots of people, from Kolmogoroff to Hajek, have argued against (1), and I don't know why anyone would find (2) plausible.

    For another example, Yudkowsky and Soares claim that CDT (like FDT) involves evaluating logically impossible scenarios. For example, "[CDTers] are asking us to imagine the agent's physical action changing while holding fixed the behavior of the agent's decision function". Who says that? I would have thought that when we consider what would happen if you took one box in Newcomb's Problem, the scenario we're considering is one in which your decision function outputs one-boxing. We're not considering an impossible scenario in which your decision function outputs two-boxing, you have complete control over your behaviour, and yet you choose to one-box. There are many detailed formulations of CDT. Yudkowsky and Soares ignore almost all of them and only mention the comparatively sketchy theory of Pearl. But even Pearl's theory plausibly doesn't appeal to impossible propositions to evaluate ordinary options. Lewis's or Joyce's or Skyrms's certainly doesn't.
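The explosion worry in point 2 of the report rests on a standard logical fact: a refutable mathematical claim entails every proposition. Below is a minimal Lean 4 sketch. It uses `2 + 2 = 5` as a stand-in for a mathematically false hypothesis such as "FDT outputs A for P" when FDT in fact outputs something else. `Q` is an arbitrary proposition, playing the role of what would be the case. The sketch models the mathematicians' reading of 'if A were the case, B would be the case' as derivability, which is the reading the worry targets.

```lean
-- Ex falso quodlibet: from a refutable arithmetic claim, any proposition follows.
-- `Q` is arbitrary, e.g. "the second box would contain a million".
example (Q : Prop) (h : 2 + 2 = 5) : Q := absurd h (by decide)

-- Its negation follows just as easily, so "if A then Q" and "if A then not Q" both hold.
example (Q : Prop) (h : 2 + 2 = 5) : ¬Q := absurd h (by decide)
```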

I still think the paper could probably have been published after a few rounds of major revisions. But I also understand that the editors decided to reject it. Highly ranked philosophy journals have acceptance rates of under 5%. So almost everything gets rejected. This one got rejected not because Yudkowsky and Soares are outsiders or because the paper fails to conform to obscure standards of academic philosophy, but mainly because the presentation is not nearly as clear and accurate as it could be.


# on 19 January 2019, 03:36

Notice that all the situations (one excepted) described in this post where FDT does worse than CDT are logically impossible, while the situations where FDT does better are not.

Or maybe we’re using the versions of the problems where the blackmailer is not entirely predictable and might still blackmail the functional decision theorist (but be more likely to blackmail the causal decision theorist), or where the Newcomb predictor is not a perfect predictor but only very likely to predict correctly, or where the other prisoner twin might be hit by a cosmic ray with low probability and not make the same decision as you. If so, situations where CDT does better than FDT are less likely than situations where FDT does better, so FDT still comes out ahead.

Let’s assume that we’re using the deterministic version of each of these problems, rather than the probabilistic version: the blackmailer is guaranteed to know what decision theory you use and to act accordingly, the Newcomb predictor is guaranteed to predict correctly, your twin is guaranteed to make the same decision as you, and your father is guaranteed to procreate if and only if you do.

Now let’s consider the blackmail problem. The post says, “If you face the choice between submitting to blackmail and refusing to submit (in the kind of case we’ve discussed), you fare dramatically better if you follow CDT than if you follow FDT.” This is true. The problem is that, if you are being blackmailed, this means that you are not going to follow FDT. If you were going to follow FDT, the blackmailer would not have blackmailed you. The fact that you have been blackmailed means you can be 100% certain that you will not follow FDT. In itself, being 100% certain that you will not follow FDT does not prevent you from following FDT. But it does make the situation where you follow FDT and come off worse impossible, which is relevant to our determination of which decision theory is better.

Let’s consider the Newcomb problem. If the Newcomb predictor is guaranteed to predict your choice correctly, it is impossible for an agent using CDT to see a million in the right-hand box.

It never does any good to dismiss a logical inconsistency and to consider what happens anyway.

What happens if we ignore this and suppose that the CDT agent does see a thousand in the left-hand box and a million in the right-hand box? Then using this supposition we can prove that they will get both amounts if they two-box. But since they are a CDT agent, we know that they will two-box, therefore there is nothing in the right-hand box, so we can prove that they will only get a thousand if they two-box. But suppose that they one-box instead. Since they are a CDT agent, we know that they will two-box, so we know that there is nothing in the right-hand box, so we can prove that if they one-box they will get nothing. However, we know that they see a million in the right-hand box, so we can prove that if they one-box, they will get a million. So we can prove that they should one-box, and we can prove that they should two-box. At this point we can conclude that a million and nothing are the same thing, and that a thousand is equal to a million plus a thousand. As the French say, with enough ifs you could put Paris in a bottle.

The procreation example is harder to prove inconsistent because it relies on infinite regress.

Here’s a first way to resolve it. Should I procreate? If I do, my life will be miserable. But my father followed the same decision theory I do, so if I choose not to procreate, that means my father will have chosen not to procreate. So I will not exist. So I can prove that, if I end up choosing not to procreate, that means I do not exist. However, I do exist. That’s a contradiction. I guess that means I will not choose not to procreate. Knowing that I will not make that choice does not in itself prevent me from making the choice though. Should I choose not to procreate anyway? Well, I can prove that if I do not procreate, then I will not exist, and that if I do, then my life will be miserable. A miserable life is better than not existing, so I should procreate. However, I know that I exist, and that is the consequent of the implication “if I do not procreate, then I [will] exist”, so the implication is true, whereas if I choose to procreate I still exist but my life is miserable. A miserable life is worse than a non-miserable life, so I should not procreate. Oops, I can prove that I should procreate and that I should not procreate? That’s a contradiction, and this one doesn’t rely on the supposition that I made any particular choice. The world I am living in must be inconsistent.

We can also solve it by directly addressing the infinite regress.

Should I procreate? If I do, my life will be miserable. But my father followed the same thought process I did and would have made the same decisions, so if I choose not to procreate, that means my father will have chosen not to procreate. Then I would not exist, and a miserable life is better than not existing, so I should procreate.

Why did my father procreate, though, if that made his life miserable?

Oh, right. My grandfather followed the same thought process that my father did, so if he chose not to procreate, that means his father would have chosen not to procreate, and so he would not exist either. Since he too considered a miserable life better than not existing, he chose to procreate.

Why did my grandfather procreate, though, if that made his life miserable? What about my great-grandfather? What about—

The recursive buck stops *here*.

My {The Recursive Buck Stops Here}-great-. . .-great-grandfather did not choose to procreate because that would have made his life miserable. Therefore I do not exist. That’s a contradiction. The assumption that each generation of ancestry uses FDT and only exists if the previous one chose to procreate is inconsistent with the assumption that any of them exist. No FDT agent can ever face this problem, and no designer can ever have to pick a decision theory for an agent that could have to face this problem. And if we only assume that it is unlikely that the father made a different decision from you, and not that it is certain that he did not, then FDT makes it less likely that you will not exist, and so it again comes out ahead of CDT.

There is one category of situations (the one exception I mentioned) where FDT can leave you worse off than CDT, and that is what happens when “someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options”. FDT can change your decisions to make them optimal, but it can’t change the initial decision theory you used to make the decisions. It can only pick decisions identical to those of another decision theory. That doesn’t prevent an environment from knowing what your initial decision theory was and punishing you on that basis. This is unsolvable by any decision theory. Therefore it can hardly be taken as a point against FDT.

I said that it never does any good to dismiss a logical inconsistency. I want to clarify that this is not the same as saying that we should dismiss thought experiments because their premises are unlikely. “Extremism In Thought Experiment Is No Vice”. Appealing to our intuitions about extreme cases is informative. But logical impossibility is informative too, and is what we care about when comparing decision theories. Nate Soares has claimed “that *all* decision-making power comes from the ability to induce contradictions: the whole reason to write an algorithm that loops over actions, constructs models of outcomes that would follow from those actions, and outputs the action corresponding to the highest-ranked outcome is so that it is contradictory for the algorithm to output a suboptimal action.”

# on 21 January 2019, 12:52

@artifax: all the situations described in the post were meant to be "non-deterministic" in your sense, so there's nothing impossible about CDT agents outperforming FDT agents in these cases.

I see that on a population-level statistical average, purely selfish FDT agents often do better than purely selfish CDT agents. I said as much in the post, so I don't think we disagree here. Except that I don't think average population-level success among selfish agents is an adequate test for the right decision theory. A somewhat more adequate test, I think, is to look at which theory gives better results across a wide range of decision problems, no matter how these problems came about. On that measure, selfish CDT agents generally do better than selfish FDT agents. But of course I can't prove to you that my test is more adequate.

# on 21 January 2019, 17:25

PS: In situations like Procreation (the "non-deterministic" case I described), I don't even see a meaningful statistical sense in which FDT agents do better than CDT agents.

# on 07 March 2019, 18:34

Thanks for sharing this. I've always had a lot of admiration for the LW community. I think your complaints are good ones, though.

I wanted to call attention to a relevant and underappreciated paper by John Leslie: "Ensuring Two Bird Deaths with One Throw" (Mind, 1991) <jstor.org/stable/2254984>. If you have a perfect clone... then by killing a bird with a stone, you ensure that your clone does likewise.

Leslie calls this phenomenon "quasi-causation" and applies it to Newcomb's Problem, among other issues.

# on 13 March 2020, 17:07

Good job, Wolfgang.
I agreed with almost everything you wrote.
But I have a different interpretation of the three paradoxes.
"Standard lore in decision theory is that there are situations in which it would be better to be irrational." I think this should be changed to "...to be considered (by other agents) to be irrational".
That's an important difference, for the agent can be rational but still be considered/believed by other agents to be irrational.
This clearly applies to the blackmail paradox, but also to Newcomb's problem.
In the latter, it would be good (for the agent faced with the choice) if the predictor/demon considered the agent to be irrational, and thus predicted that it will choose one box. But once the boxes are set up, the agent, rationally, should choose both boxes, as its decision cannot influence the demon's past decision about what to put inside the boxes.
This is similar to how, in real life, it's sometimes convenient for a person to be considered a fool by others (and even to pretend to be a fool), while in reality being much smarter.

But for Prisoner's Dilemma with a Twin:
First of all, shouldn't the relevant question/conclusion be what ONE particular agent should decide, as opposed to two of them? I suggest the analysis should compare Twinky being rational with Twinky being irrational (and then split each case according to what her twin is like).
Second, why do you think that being irrational means making a particular choice (here, to remain silent)? I'd say being rational means analyzing and taking a decision; being irrational means not being capable of that, which means taking a random decision (which can even happen to be the same as the one a rational agent takes).
With this modification, my analysis shows that, for the Blackmail problem, being rational is better.

# on 13 March 2020, 17:37

Thanks Vic.

You're certainly right that in the examples it is better to be considered irrational. But I think there's also a sense in which it would be better for the relevant agents if they actually were (and had always been) irrational. For example, the predictor would then very likely have put a million into the first box. I agree, of course, that the agent should choose both boxes. And yes, in the Twin Dilemma, the right question to ask is what Twinky should do -- and the answer is that she should confess. Still, when we ask how different kinds of people fare in this scenario, those who remain silent fare better, and by hypothesis they make the same choice as their twin.
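The two claims in this reply, that confessing is the right choice but that silent twins fare better, can be checked mechanically. The sketch below encodes the sentences from the post's Twin Dilemma (years in prison, so lower is better). It confirms that confessing is better for each prisoner whatever the other does, and that under the stipulation that twins choose alike, a silent pair serves 1 year each while a confessing pair serves 5. The function names are illustrative only.

```python
# Years in prison for (my_choice, twin_choice), from the post's Twin Dilemma.
YEARS = {
    ("confess", "confess"): 5,
    ("silent", "silent"): 1,
    ("confess", "silent"): 0,
    ("silent", "confess"): 10,
}
CHOICES = ("confess", "silent")


def dominant_choice():
    """Return the choice that is strictly better whatever the twin does, if any."""
    for mine in CHOICES:
        other = next(c for c in CHOICES if c != mine)
        if all(YEARS[(mine, t)] < YEARS[(other, t)] for t in CHOICES):
            return mine
    return None


def years_if_twins_match(choice):
    """Sentence each twin serves, assuming (as stipulated) that both choose alike."""
    return YEARS[(choice, choice)]


print(dominant_choice())  # confess
print({c: years_if_twins_match(c) for c in CHOICES})  # {'confess': 5, 'silent': 1}
```

The first comparison holds the twin's choice fixed; the second lets it vary with one's own. Which comparison is relevant to rational choice is exactly what separates the reply from FDT.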

I did assume being rational in the examples I gave involves making a particular choice (but not remaining silent! The rational choice is to confess). I agree rationality involves more. But I don't agree the alternative to rationality is to act randomly. For example, an agent who always minimizes expected utility would act irrationally, but not randomly.

# on 14 March 2020, 05:45

Thanks for replying.
By the way, I should have mentioned that I am not here to defend functional decision theory: I haven't even read that; nor have I ever communicated with its authors.

I see what you mean about rational agents faring worse than irrational ones in Newcomb's problem. For now, I tend to agree.
In the twin dilemma: from this sentence in the article, "If one confesses and the other remains silent[...]", I thought you meant that they can choose differently; but now I see that you mean they actually cannot.
And I take back my idea that being irrational is to act randomly.

I'll get back to these examples later; but for now, could we clarify/agree on some definitions?

rational choice/decision = choice taken by agent that results in BEST probable outcome for this agent; let's say, maximizes expected utility

irrational choice = choice taken by agent that results in WORST probable outcome for this agent; let's say, minimizes expected utility

rational agent = agent that in every situation makes the rational choice

irrational agent = agent that in every situation makes the irrational choice

partially (ir)rational agent = agent that in a certain fraction of situations (>0 and <1) makes the (ir)rational choice

If so, then an agent either is rational or is not rational (the latter meaning irrational or partially rational): it cannot CHOOSE to be rational (it can only choose which kind of choice to make).

Agent A may or may not know the rationality type of agent B; in general, it may know it only with a certain degree of accuracy (or better said, may believe it, and the belief has a certain degree of accuracy), say measured using probability.

Similarly, in a given situation, agent A may or may not know, or believe with some accuracy, what choice agent B will make.

What agent A believes about agent B's rationality, or about the type of choice B will make in a given situation, is a proposition independent of what B's rationality actually is, or, respectively, of the choice B will actually make.

# on 14 March 2020, 06:06

It seems to me that, using clear definitions and systematic methods, we could shine good light on these and other paradoxes, and perhaps even remove the "paradox" or "unsolved" status of many of them.

# on 14 March 2020, 11:46

Hi Vic, I agree that clear definitions are useful here, and can help to make the apparent paradoxes less puzzling.

# on 14 March 2020, 18:39

After a little thought, here is a generalization of one of the definitions:

partially (ir)rational agent = agent that is neither rational nor irrational.

Thus, in some decision problems, or in some instances of some decision problems, it makes the rational choice, and in others the irrational one.

Do you find all these definitions/assumptions correct or wrong? Or do you use different operational definitions?
I'm sure there are whole theories of rationality and decision out there that you know and I don't; I just want to keep it simple.

# on 15 March 2020, 13:40

@Vic: I generally assume that rationality involves more than just maximizing expected utility; for example, it also requires adequately responding to evidence. And I would call any agent who isn't rational 'irrational'. But your definitions are also fine with me. It all depends on what we go on to do with them.

# on 12 May 2020, 10:20

Some people from the LW community have tried to respond to your objections to the theory. I am tempted to post one of the criticisms here, but I am not allowed to post HTML links, and it is quite long, so I will just ask you to go to the "Open Thread January 2019" thread on Lesswrong and look at user dxu's post, which is quite famous there. It is quite confusing, but the people there seem to hold it in high regard. I will post a few words from it here, but this is hardly a summary.

He/she says that your statement

"FDT says you should not pay because, if you were the kind of person who doesn't pay, you likely wouldn't have been blackmailed. How is that even relevant? You are being blackmailed."

is wrong, and based on a 'naive intuition'. He/she continues by saying "it's not immediately obvious what's wrong with this assumption" and tries to explain what he/she means. Quoting:

"In certain decision problems, your counterfactual behavior matters as much--if not more--than your actual behavior. That is to say, there exists a class of decision problems where the outcome depends on something that never actually happens."

"Every single one of those thought experiments could have been written from the perspective, not of the real you, but a hypothetical, counterfactual version of yourself.

When "you're" being blackmailed, Schwarz makes the extremely natural assumption that "you" are you. But there's no reason to suppose this is the case. The scenario never stipulates why you're being blackmailed, only that you're being blackmailed. So the person being blackmailed could be either the real you or a hypothetical. And the thing that determines whether it's the real you or a mere hypothetical is...

...your decision whether or not to pay up, of course."

# on 12 May 2020, 15:57

Thanks Abir. Here's a link to the comment.

This dxu person seems to have misunderstood my post. All I claimed in the post is that (1) FDT is counter-intuitive, (2) I find the arguments in favour of FDT unpersuasive, and (3) the theory was formally underdeveloped and badly presented in the paper I read.

Yes, I do think that the circumstances that led to a situation are generally irrelevant to which choice is right. I understand that FDT disagrees, but as dxu admits the assumption is intuitively plausible and I could not find any convincing argument against it either in the paper I read or in this comment.

dxu says: "the thing that determines whether it's the real you or a mere hypothetical is your decision whether or not to pay up, of course. If you cave into the blackmail and pay up, then you're almost certainly the real deal. On the other hand, if you refuse to give in, it's very likely that you're simply a counterfactual version of yourself living in an extremely low-probability (if not outright inconsistent) world." I don't think this is a clear way of putting the point. After all, I am sure that I am the real version of myself, and I am sure that there is nothing I can do that would make me merely hypothetical. But I see what is probably meant. I just don't agree that it is relevant to rational choice. And I don't see any *argument* in the comment why it should be.

# on 15 May 2020, 06:29

Regarding transparent box "Newcomb's paradox" and "blackmail", it may be useful to put a different context around Newcomb's paradox...

Fred was asked by Alice to walk her dog for $20, walked her dog, and got paid $20.

One year later, Alice tells Fred that she wants him to give her back $9. She can't take $9 from Fred, or do any other harm to Fred. Fred doesn't want to give $9 back. But Alice convinces Fred that the following happened:

Alice had read the FDT paper. The fact that FDT one-boxes in the transparent Newcomb's Paradox made Alice happy and less irate, so she didn't stiff Fred out of half his payment right then (which is what she generally does).

Now what does Fred do if he runs FDT?

Note that the above is Newcomb's Paradox with $11 in the first box and $9 in the second box, but without the greed for $1,000,000 or moralization about being punished for excessive greed.

Presumably the answer should stay the same, if there is indeed an answer at all. Because $11 is still bigger than $9, and it isn't relevant that Fred walked the dog and has money in his wallet already.

An iterated variation is possible, where after being paid $20 Fred decided that Alice was trustworthy and continued to keep walking her dog for $20 a time during that year.

I think FDT is just too ill defined and vague, because they never specify how evaluation of "what if this function returns A" is supposed to be combined with facts about the real world. Should Fred hallucinate $11 in his pocket, instead of $20, when he's evaluating the possibility of not giving Alice $9?

# on 15 May 2020, 09:26

@D: I agree with these worries. But I don't quite follow the example. How is Fred's choice analogous to the choice in Newcomb's Problem (with transparent boxes)?

# on 15 May 2020, 11:54

Maybe I see the idea.

On Monday Alice gives Fred $20. On Friday she demands back $9. She convinces Fred of the following. When she gave him the $20 on Monday, she had just read the FDT paper, and considered what FDT would recommend Fred to do on Friday if she gave him $20 on Monday and then asked him to pay back $9 on Friday. She figured out that FDT says that Fred should return the $9. This amused her so much that she was happy to give him the $20; if she had figured out that FDT recommends not returning the $9, she would only have given Fred $10. Now, on Friday, Fred wonders whether to return the $9. If FDT says yes, it is likely that he got $20 from Alice on Monday (in the strange sense of the conditional relevant to FDT). If FDT says no, it is likely that he got $10. So according to FDT, it's better to return the $9. (Which is what Alice anticipated, and what amused her so much.)
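Tabulating Fred's final dollars shows where the two theories come apart in this reconstruction. In the sketch below, `policy_level_value` is only a rough, hypothetical stand-in for FDT's conditional: it lets Monday's payment vary with the Friday verdict, as the reconstruction describes. `friday_value` instead holds the $20 already received fixed, as CDT does. Neither function comes from the FDT paper.

```python
PAID_IF_FDT_RETURNS = 20  # Alice's Monday payment if FDT says "return the $9"
PAID_IF_FDT_KEEPS = 10    # her Monday payment if FDT says "keep it"
DEMAND = 9                # what Alice asks back on Friday


def policy_level_value(returns_money: bool) -> int:
    """Fred's final dollars if Monday's payment co-varies with Friday's verdict."""
    if returns_money:
        return PAID_IF_FDT_RETURNS - DEMAND
    return PAID_IF_FDT_KEEPS


def friday_value(returns_money: bool, already_paid: int = PAID_IF_FDT_RETURNS) -> int:
    """Fred's final dollars if Monday's payment is held fixed at what he received."""
    return already_paid - DEMAND if returns_money else already_paid


for choice in (True, False):
    print("return" if choice else "keep",
          policy_level_value(choice), friday_value(choice))
```

Under the first evaluation returning wins, $11 to $10; under the second, keeping wins, $20 to $11. This mirrors the description of the case as Newcomb's Paradox with $11 in one box and $9 in the other.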

Nice. Is that roughly what you had in mind?

# on 15 May 2020, 13:20

Yes, that works. Although note that for FDT I don't think it matters for what reason Alice's actions were influenced by the FDT paper. E.g. Alice need not have a pre-existing plan to take back $9. I'm not assuming that Alice knows Fred follows FDT.

With Newcomb's paradox, I feel that the issue is that it is specifically worded to align all the irrelevancies to make one-boxing sound right. It is a large gift, and if you commit the sin of being too greedy, you don't get the large gift. You probably read numerous books in your childhood that told you stories along these lines.

It is a set up for a logical fallacy.

Hence I'm changing it to $11 and $9, and making the money already belong to Fred by a contract, and placing it in Fred's wallet. The dog walking bit is completely irrelevant to the logical problem but it is relevant to the fallacy.

I guess it could be argued that Alice isn't fair, but what she does isn't dependent on what decision theory Fred employs, so by any normal definition it is fair. She likes to talk people into giving back the money, and of course that is not good to the people who can be talked into giving back the money.

# on 16 May 2020, 02:34

I also wonder if FDT can combine with inductive inference, given that they claim alternatives override actual physical sensory input in transparent Newcomb's paradox.

The agent has been looking at a raw feed from a high-res camera for maybe an hour, until its input contains 1 gigabyte of fundamentally incompressible photon shot noise (overlaid over what it actually saw). Every hypothesis the agent has left had a prior less than 2^(-8×10^9), since 1 gigabyte is 8×10^9 bits and every hypothesis which didn't predict exactly the right noise got discarded. edit: to be clear, I'm assuming Solomonoff induction and variants, where all remaining hypotheses after receiving sensory input print a string beginning with that sensory input.

But suppose that there is a hypothesis which describes each bit of sensory input as a result of an FDT calculation. Like the transparent box Newcomb's paradox describes what is seen in the boxes, or the contents of Fred's wallet.

How is that hypothesis to be penalized as new bits come in? Instead of having to actually "store" new bits (in an increasingly complex hypothesis), it just "decides" what those bits should be, and in the alternative where it decides something that doesn't match sensory input, it acts as if that were the sensory input.

edit: to be clear the problem is that this hypothesis will eventually become dominant, no matter how low of a prior it may have had.
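The arithmetic behind this worry can be sketched in log2 space, under the comment's own assumptions. A hypothesis that must reproduce incompressible noise pays one bit of prior per observed bit. A hypothesis that "decides" its bits, as described above, is assumed never to pay that cost. The description lengths `k_store` and `k_decide` below are made-up placeholders, not values from any real induction scheme.

```python
def log2_weight_storing(k_store: int, n_bits: int) -> int:
    """Log2 prior of a hypothesis that must encode every observed noise bit."""
    return -(k_store + n_bits)


def log2_weight_deciding(k_decide: int) -> int:
    """Log2 prior of a hypothesis that (per the comment) 'decides' its bits for free."""
    return -k_decide


def crossover_bits(k_store: int, k_decide: int) -> int:
    """Smallest number of observed bits after which the deciding hypothesis
    strictly outweighs the storing one."""
    return max(0, k_decide - k_store + 1)


# Hypothetical sizes: the deciding hypothesis starts 49,000 bits more complex.
print(crossover_bits(1_000, 50_000))
# One gigabyte of noise is 8 * 10**9 bits of penalty for the storing hypothesis.
print(log2_weight_storing(1_000, 8 * 10**9))
```

However large `k_decide` is, the storing hypothesis falls behind once more than `k_decide - k_store` noise bits have arrived, which is why the comment expects the deciding hypothesis to become dominant eventually.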

# on 18 May 2020, 10:34

To be honest, I am still confused about the Transparent Newcomb's Paradox. If the predictor is not completely accurate, but just a 'good predictor', then how can we say whether CDT or FDT agents perform better? Can someone briefly explain to me why Dr. Schwarz says this?

Is one-boxing better when the predictor is just good at predicting, because it means you have a good chance of getting a million, or maybe two-boxing is better because it is better to get either a million and a thousand or just a thousand?
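One way to make the question precise, under stated assumptions: the predictor is right with probability `p`, independently of anything else, and fills the right-hand box iff it predicts the agent would one-box on seeing it full. A "one-box" agent takes only the full right-hand box and takes both boxes if it is empty. That last detail is an assumption, since the thread does not fix what one-boxers do when they see an empty box. The sketch computes each policy's average winnings and, separately, the gain from two-boxing once the contents are visible.

```python
MILLION = 1_000_000
THOUSAND = 1_000


def average_payoff(policy: str, p: float) -> float:
    """Average winnings of agents with a fixed policy, for predictor accuracy p.

    Assumed policies: 'one-box' takes only the right-hand box when it is full
    and both boxes when it is empty; 'two-box' always takes both boxes.
    """
    if policy == "one-box":
        # Correct prediction: box is full, agent takes the million.
        # Wrong prediction: box is empty, agent falls back on the thousand.
        return p * MILLION + (1 - p) * THOUSAND
    if policy == "two-box":
        # Correct prediction: box is empty. Wrong prediction: box is full.
        return p * THOUSAND + (1 - p) * (MILLION + THOUSAND)
    raise ValueError(f"unknown policy: {policy}")


def gain_from_two_boxing(right_box: int) -> int:
    """Extra money from taking both boxes, given the visible right-hand contents."""
    return (right_box + THOUSAND) - right_box


for p in (0.5, 0.9, 0.99):
    print(p, average_payoff("one-box", p), average_payoff("two-box", p))
```

On this model one-boxers do better on average once `p` exceeds about 0.50025, while for any fixed contents two-boxing yields exactly $1,000 more. The first comparison is a population-level average; the second holds the boxes fixed. As the reply below notes, FDT's own verdict may also depend on *how* the predictor errs, which this black-box model ignores.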

# on 18 May 2020, 17:27

With "FDT" it depends on how it thinks the predictor is making the prediction. It is only supposed to diverge from CDT if it believes the predictor actually evaluates FDT.

So for FDT, you can't actually specify the predictor as a black box, you have to specify why and how it is being not entirely accurate (while for CDT the black box works).

For what it's worth, I personally think if you pop the hood off yourself, and pop the hood off another entity, and examine internals one by one, and find them to coincide such that you can very confidently conclude that the difference between the two is very reliably 0, you should be able to use that fact in your decision making.

Ditto for e.g. popping the hood off a copying machine and determining that it has made an exact replica of you (which you end up playing Prisoner's Dilemma with).

That all should be a part of the hypothesis space though. After all, this chain of causal events in your future light cone is an empirical fact you learned. Even if we are born with it, it has evolved and was "learned" by evolution. None of that belongs in a decision theory, even if it's tempting to put it there so that the decision theory can better handle some underspecified hypothetical.

# on 13 January 2022, 13:24

Your Procreation example, though interesting, isn't very suitable for comparing FDT with CDT, because the payoff structure of this 'problem' changes with the decision theory of the agent playing the problem. A CDT agent playing faces two choices: to procreate or not to procreate. Not procreating leads to a good life, procreating leads to a bad one. An FDT agent has the same choices, but now not procreating leads to no life at all and procreating leads to a bad one. Therefore, to compare CDT with FDT on Procreation is to give CDT a problem with relatively good possible payoffs and FDT one with relatively bad possible payoffs. Claiming CDT does better on Procreation, then, is false.

# on 13 October 2022, 01:15

This is no longer all that timely, but I landed here from a search starting from some people talking about FDT on Twitter. There are some major mistakes in this post and I feel the need to correct the record.

When you say:

> Suppose you have committed an indiscretion that would ruin you if it should become public. You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay!
> ... Clarifications: (a) I assume that the blackmailer is not infallible at predicting how you will react. Even if you follow FDT you might therefore find yourself in this situation.

It seems to me that this is leaning on an intuition that payoff ratios should matter; i.e., if the blackmailer threatens you with something that hurts you by $100k and demands $99k then you probably shouldn't pay, whereas if the blackmailer threatens you with something that hurts you by $100k and demands $1 then you should probably pay.

You then claim that FDT doesn't pay even in the $1-for-$100k case, even if the blackmailer is fallible. But this is incorrect. If the blackmailer is fallible--that is, if with some probability they blackmailed you unconditionally rather than conditioning based on anything about you--then FDT will pay if the demand is small enough. Not only that, given a quantitative description of exactly how fallible the blackmailer is, FDT provides a quantitative answer to how small is small enough!
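A minimal model of the claim that fallibility yields a quantitative threshold, under assumptions not in the original. With probability `q` the blackmailer threatens you no matter what; otherwise it perfectly predicts your policy and threatens only those who would pay. Reading FDT's recommendation as "the policy with the lower expected loss before the blackmailer moves" is itself an assumption, an informal stand-in rather than a formalization from the paper. On this model, paying is recommended iff the demand is below `q * harm`. A decision made after the threat, with the blackmailer's move held fixed, pays iff the demand is below `harm`.

```python
def policy_cost(pays: bool, demand: float, harm: float, q: float) -> float:
    """Expected loss of committing to a policy before the blackmailer moves.

    Hypothetical model: with probability q the blackmailer threatens you
    regardless; otherwise it threatens you only if it predicts you will pay.
    """
    if pays:
        return demand      # threatened in every case, always pays
    return q * harm        # threatened only unconditionally, then exposed


def pays_at_policy_level(demand: float, harm: float, q: float) -> bool:
    return policy_cost(True, demand, harm, q) < policy_cost(False, demand, harm, q)


def pays_once_threatened(demand: float, harm: float) -> bool:
    """Comparison made after the threat, holding the blackmailer's move fixed."""
    return demand < harm


print(pays_at_policy_level(1, 100_000, 0.01))   # threshold is $1,000
print(pays_at_policy_level(1, 100_000, 1e-6))   # threshold is $0.10
print(pays_once_threatened(1, 100_000))
```

With $100,000 at stake, a $1 demand is paid when `q = 0.01` but refused when `q = 1e-6`. The same model thus supports both the claim that FDT pays small enough demands and the point, made in the reply below, that a highly but not perfectly reliable blackmailer can make FDT refuse even $1.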

Your next example, Procreation, seems to misunderstand what it means for two agents/decisions to be entangled. You write:

> Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there's a significant probability that I wouldn't exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate.

This situation is an impossibility because your father, if the situation is accurately described, would have chosen not to procreate. The scenario provides an argument why *you* should go against your incentives and choose to procreate, but it fails to provide a symmetric argument for why *he* would make that choice; his own existence does not depend on the choice the way yours does.

So the situation is already a paradox even before we choose our decision theory; something about the decision your father faced must have been different. There are various possibilities about what the difference could have been, but all of them remove the entanglement between the two decisions.

These are technical points. I think they are important in diagnosing the disagreement, and having gotten these technical points wrong is *very concerning*, but the technicalities don't quite provide the information necessary to locate the core disagreement. I think the actual core of the disagreement is something pointed at by this paragraph:

> FDT agents rarely find themselves in Blackmail scenarios. Neither do CDT agents with a vengeful streak. If I wanted to design a successful agent for a world like ours, I would build a CDT agent who cares what happens to others. My CDT agent would still two-box in Newcomb's Problem with Transparent Boxes (or in the original Newcomb Problem). But this kind of situation practically never arises in worlds like ours.

As I see it, decision-theory thought experiments are like unit tests. Take the decision theory, formalize it until you have a program where the input is a decision-theory thought experiment and the output is a strategy. Take the thought experiment, formalize it until you have a program where the input is a strategy and the output is a score. If the DT given a DTTE outputs a strategy which gets the highest score, it passes; if it outputs a strategy which gets a lower score, it fails.
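The unit-test picture can be sketched as a small harness. Everything here is hypothetical: `BlackmailTest` is a made-up formalization of a blackmail case, and the two decision procedures are toy stand-ins, not formal versions of CDT or FDT. The harness includes two candidate scoring programs. One scores a strategy by its expected loss before any threat is made; the other scores it by its loss given that a threat has been made. Which scoring program the thought experiment "really" is has to be fixed before anything can pass or fail.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BlackmailTest:
    """Hypothetical formalized thought experiment (all numbers are made up)."""
    demand: float  # what the blackmailer asks for
    harm: float    # loss if the indiscretion is exposed
    q: float       # chance the blackmailer threatens regardless of prediction

    def score_policy(self, strategy: str) -> float:
        """Negated expected loss of a strategy fixed before any threat is made."""
        return -(self.demand if strategy == "pay" else self.q * self.harm)

    def score_threatened(self, strategy: str) -> float:
        """Negated loss of a strategy, given that a threat has been made."""
        return -(self.demand if strategy == "pay" else self.harm)


def passes(decide: Callable[[BlackmailTest], str], test: BlackmailTest,
           score: Callable[[BlackmailTest, str], float]) -> bool:
    """A decision procedure passes if its output gets the best available score."""
    chosen = decide(test)
    return score(test, chosen) == max(score(test, s) for s in ("pay", "refuse"))


# Two toy decision procedures (hypothetical stand-ins, not formal CDT or FDT).
def decide_after_threat(t: BlackmailTest) -> str:
    return "pay" if t.demand < t.harm else "refuse"


def decide_by_policy(t: BlackmailTest) -> str:
    return "pay" if t.demand < t.q * t.harm else "refuse"


example = BlackmailTest(demand=1, harm=100_000, q=1e-6)
for decide in (decide_after_threat, decide_by_policy):
    print(decide.__name__,
          passes(decide, example, BlackmailTest.score_policy),
          passes(decide, example, BlackmailTest.score_threatened))
```

Each toy procedure passes under one scoring program and fails under the other. The harness alone does not settle which decision theory is correct; the choice of scoring program carries the weight.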

(If you don't keep sight of the formalize-all-the-way-to-code criterion, it's easy to omit necessary details. E.g., if the blackmailer isn't a perfect predictor, what do they do instead of correctly predicting, and how often do they do that? If the evaluation depends on the unspecified details of a hypothetical blackmailer, a real agent would fill in its best guess, but we no longer have a unit test we can evaluate.)

Given the unit-test interpretation, it no longer matters whether the thought experiments are contrived and unrealistic; if we can find a unique decision theory that passes every unit test, then we can infer that that decision theory will also do better in less-contrived scenarios.

Sure, you can bypass some of the problems in an incorrect decision theory by adding things to the utility function. But you don't need to do that. Doing that would mean abandoning the distinction between goals and strategies. It would mean abandoning having a standard by which to evaluate decision theories. And, most importantly, it would mean abandoning the goal of ever creating a provably-correct AGI.

I claim that FDT passes all unit tests, and other decision theories don't.

# on 13 October 2022, 08:16

@James: Thanks.

On Blackmail: Yes, depending on exactly how fallible the blackmailer is, FDT may recommend paying $1. But there is a possible scenario in which the blackmailer is highly but not perfectly reliable and where FDT says you shouldn't pay $1. Do you disagree with this? If not, where is the "very concerning" and "major" mistake?

On Procreation: You assume that your father did in fact face the same choice. That's not what I wrote. I wrote that you have reason to believe that your father faced the same choice. The situation is straightforward to analyse in CDT and EDT, without running into paradox. I still think what I say about FDT and Procreation is right, but I'm willing to be corrected. I'd have to see the proper application of FDT though, not some hand-wavy remarks about entanglement.

I don't see how all your talk about unit tests etc. is supposed to help FDT. For one thing, FDT is still only a sketch. We have fully precise formulations of CDT and EDT, but not of FDT. More importantly, someone needs to decide what counts as success or "passing". In my view, FDT fails all of the tests described in the post.

PS: Anecdotally, another reason why FDT is not gaining traction in academic philosophy is the hubris with which it is usually defended. Your comment is a good example. We academic philosophers appreciate discussions in which each side takes the other seriously and actually engages with their arguments, rather than declaring the other side stupid and misinterpreting their arguments so as to make them look weak.

# on 25 November 2022, 17:30

Someone linked this to me again, so I wanted to share a curious hypothetical, to add to Newcomb's paradox and the like.

Consider a redundant avionics system with 3 autopilots (identical to one another), feeding into a majority vote circuit. If there is a disagreement between all 3, the ideal action is to divert to the nearest airport.

The system is deliberately designed to eliminate causal effects of any one autopilot, because cosmic rays can cause bit flips in autopilots, which can cause that autopilot to output incorrect commands.
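A minimal sketch of the voting circuit described above, assuming each autopilot emits one discrete command per cycle. The command names `SERVO_ON`, `SERVO_OFF`, and `HOLD` are hypothetical. The voter passes through any command shared by at least two units and diverts when all three disagree, so a single unit corrupted by a bit flip has no effect on the output.

```python
from collections import Counter
from typing import Sequence

DIVERT = "DIVERT"


def majority_vote(commands: Sequence[str]) -> str:
    """2-out-of-3 voter: pass through any command shared by at least two units.

    If all three units disagree there is no majority, and the system diverts.
    """
    if len(commands) != 3:
        raise ValueError("expected exactly three autopilot commands")
    command, count = Counter(commands).most_common(1)[0]
    return command if count >= 2 else DIVERT


print(majority_vote(["SERVO_ON", "SERVO_ON", "SERVO_ON"]))   # all agree
print(majority_vote(["SERVO_ON", "SERVO_OFF", "SERVO_ON"]))  # one unit flipped
print(majority_vote(["SERVO_ON", "SERVO_OFF", "HOLD"]))      # no majority
```

The second call shows the property the design is meant to secure: whatever the dissenting unit outputs, the result is the same as if it had agreed.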

With CDT, the problem is obvious: you can prove that the other two autopilots are going to output identical values, you can prove that those values are not influenced by your choice, and it doesn't matter what value you output (other than in the event that they failed in which case you should divert).

You need some principled way to apply the hypothetical consequences of the action to outputs of other autopilots, but only as long as none of them are struck by a cosmic ray.

It can be argued that CDT simply can't get the plane to the destination. (The utilities are not exactly equal, though: there is a small probability that the two other autopilots failed.)

FDT would claim to solve the problem: when supposing that FDT outputs a command to turn on the aileron servo, it would also suppose that the 2 other autopilots do the same. However, it is not temporally restricted in any way. If it supposes that all autopilots in all similar past circumstances had issued a command to turn on the servo, and that is not what actually happened in the real world, then the supposition would contradict the flight information obtained from the flight instruments, the position feedback data from the servo, etc.

(Note that the problem is very iterated here; repeating many times a second, across nearly identical circumstances).

So I think CDT would divert to the nearest airport, while FDT would simply crash.

My thinking is that CDT, taken as strictly causal consequences, would "hard code" certain assumptions about the environment. Surprisingly enough, it is possible to build an electronic circuit - in our universe - that violates such assumptions. Not only is it possible, variations of such circuits are literally flying over our heads.

# on 26 November 2022, 12:07

Hi D!

Interesting. As usual, you're going a little too fast for me.

If the only relevant goal is to get to the destination, then why is the ideal action in case of a disagreement to divert to the nearest airport? At any rate, if this really is the ideal action in case of disagreement, then my choice makes a difference, even if the others output identical values: if the others agree not to deviate, and I say 'deviate', then the resulting action is not ideal (as per your stipulation); if, on the other hand, I also say 'go ahead', then the resulting action is ideal.

Here is one way of developing your case. In case of disagreement, it is best to deviate because a broken autopilot will do damage if it arrives at the destination. For CDT, we now need to evaluate what would be the case if the others said 'go' and I said 'deviate'. Would at least one of us be broken? Arguably yes. If so, CDT (with deliberation dynamics) recommends saying 'go'.

But there's another version of the story where CDT recommends saying 'deviate'. Here, what would cause damage at the destination are the cosmic rays: if cosmic rays have hit an autopilot, it becomes radioactive, and we don't want radioactivity at the destination. Plausibly, you wouldn't be radioactive if you were to say 'go' while the others say 'deviate'. In this version of the story, CDT says you should say 'deviate', because it dominates saying 'go'. The system of CDT autopilots will always deviate.

I'm not sure this is a problem. The system seems poorly designed. A better design would let each pilot make a decision on the pretense that the others aren't there. The system should then follow their individual recommendations if all three pilots agree, and deviate if they don't.

But let's change the scenario again so that it involves three real pilots, who happen to be in a plane that's operated by majority vote. They needn't be clones, but they reach their conclusion based on the same data, and all this is common knowledge. Intuitively, they should vote 'go', especially if the chance of cosmic infection is tiny, and/or the damage from cosmic infection would be low. EDT says they should say 'go', but CDT seems to say they should vote 'deviate'. This is, at first glance, perplexing. It also seems to be a Newcomb-type problem that is relatively easy to set up. Hmm.

I don't see why FDT would "crash". Suppose I'm confident that I and the other pilots obey FDT, unless hit by cosmic rays. I am certain that we all said 'go' in the last few rounds. We now need to evaluate what would be the case if FDT advised to deviate. This is a hypothesis of which I'm almost (or even entirely) sure that it's false, but we can evaluate counterfactuals with antecedents that are known to be false. In this case, presumably what would be the case if FDT advised to deviate is that we would all have said 'deviate' in the past.

It does seem crazy to me that one would evaluate whether or not to say 'deviate' by (only) looking at the consequences of this act in a counterfactual situation of which one is sure that it doesn't obtain. But that's the craziness of FDT.

Not sure what you mean by CDT hard coding assumptions that might be false. Could you elaborate?

# on 26 November 2022, 19:16

Sorry, I should have been clearer about what the outputs of the "decider" are and about the reason for the divert decision (although you seem to have reconstructed my main points from a different, perhaps complementary or better, perspective).

The decisions in question are meant to be not "go" and "divert" but servo commands for the aircraft: turn the aileron 5 degrees in this direction. Or, even lower level, "positive polarity to servo motor #13" or "positive step command to stepper motor driver #3". ("Divert" is short for a particular sequence of commands that land the plane at the nearest airport.)

The idea with crashing is that it is, basically, difficult to fly an airplane, and you have to pay attention to the instruments, and if you don't do that, you crash. And you have to apply counterfactuals to the model the right way, or else you crash.

Only a very, very, very small fraction of all decisions made by the autopilot avoid a crash, and the autopilot has to find such a decision in an enormous space of decisions that lead to a crash.

So it is Newcomb-like, but the probability of getting it right by accident, à la a stopped clock being right twice a day, is negligible.

Suppose that we compute servo commands by simulating the flight and maximizing some sort of utility function based on comfort, risk, cost, and wasted passenger time: we try different servo commands and pick the one with maximum "utility". (Of course, in practice you would take shortcuts such as gradient-descent optimization.)
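A toy version of this "try candidate commands and keep the best" loop, assuming a one-step heading model and a utility that trades heading error against a comfort penalty. The dynamics constant, the comfort weight, and all function names are hypothetical, and exhaustive search over a small candidate set stands in for the gradient-based shortcut.

```python
TURN_RATE_PER_DEG = 0.5  # hypothetical: degrees of heading change per degree of aileron, per step


def simulate_heading(heading: float, aileron_deg: float) -> float:
    """Toy one-step dynamics: turn rate proportional to aileron deflection."""
    return heading + TURN_RATE_PER_DEG * aileron_deg


def utility(heading: float, target: float, aileron_deg: float, comfort_weight: float = 0.1) -> float:
    """Penalise distance from the target heading plus a small discomfort cost for large deflections."""
    return -abs(target - heading) - comfort_weight * abs(aileron_deg)


def choose_command(heading: float, target: float, candidates: list[float]) -> float:
    """Simulate each candidate servo command and return the one with maximum utility."""
    return max(candidates, key=lambda cmd: utility(simulate_heading(heading, cmd), target, cmd))


print(choose_command(90.0, 95.0, [-10, -5, 0, 5, 10]))  # 10
```

This sketch models a single autopilot that fully controls the servo; it deliberately leaves out the voter, which is where the difficulty comes from.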

With the divert decision, my premise is that the risk of flying across the Atlantic with all 3 functioning autopilots is acceptable, but the risk of relying on 1 autopilot is unacceptable. (If two autopilots are in agreement, then the plane continues to the destination and gets serviced there).

When we try to simulate the effects of issuing a servo command, e.g. to turn an aileron, the position of the physical servo is not affected by an individual autopilot's output, because that output gets outvoted. So in the overwhelmingly probable case where the other 2 autopilots operate just fine, we are unable to compute correct servo commands, because the utilities are identical for all servo commands.
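The flat-utility problem can be reproduced in a few lines under the same kind of hypothetical toy dynamics: one autopilot varies only its own output while the other two outputs are held fixed, as CDT would have it, since they are not caused by this unit's choice. The names, the dynamics constant, and the choice to score a divert as minus infinity on a routine flight are all assumptions for illustration.

```python
from collections import Counter
from typing import Optional, Sequence

TURN_RATE_PER_DEG = 0.5  # hypothetical toy dynamics: heading change per degree of aileron


def majority_vote(a: float, b: float, c: float) -> Optional[float]:
    """2-of-3 voter; None stands in for the divert case (all three differ)."""
    cmd, count = Counter([a, b, c]).most_common(1)[0]
    return cmd if count >= 2 else None


def voted_utility(heading: float, target: float, voted: Optional[float]) -> float:
    """Utility of the command that actually reaches the servo after voting."""
    if voted is None:
        return float("-inf")  # hypothetical: a divert is the worst outcome on a routine flight
    return -abs(target - (heading + TURN_RATE_PER_DEG * voted))


def cdt_utilities(heading: float, target: float, candidates: Sequence[float],
                  others: tuple[float, float]) -> dict[float, float]:
    """Vary only this autopilot's output; the other two are causally independent of it."""
    return {cmd: voted_utility(heading, target, majority_vote(cmd, *others)) for cmd in candidates}


utils = cdt_utilities(90.0, 95.0, [-10, -5, 0, 5, 10], others=(10, 10))
print(len(set(utils.values())))  # 1: every candidate command yields the same utility
```

In this toy, the utilities only come apart when the two fixed outputs disagree with each other, i.e. in the low-probability world where one of the other units has failed.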

There is still the low-probability possible world where the other autopilots have failed electrically, but in that world I postulate that the optimal action is different (divert), so the servo commands computed there would amount to diverting the flight.

(Of course, if we have no concern about flying without redundancy, then the autopilot can get to the destination thanks to the slight utility difference that comes from the probability that the other two have failed.)

You're completely right that we can just implement a simplified model of physics, without the remaining 2 autopilots, and make it work. But we also consider it not safe enough to fly far on one autopilot, so if we build an incorrect world model in which there's just 1 autopilot, we also need a workaround for that decision. Our decision to fly on or divert depends on whether there is 1 autopilot or 3. (With 2 autopilots, any disagreement with the one remaining autopilot means you need to divert.)

It seems to me though that if we build such an autopilot with those shortcuts, it is an approximation to some yet-to-be-described decision theory. And it could benefit from some theoretical understanding of what it is that we are approximating.

Another thing is that if we e.g. use machine learning to make our autopilot, it can learn all of that without having to be ignorant of the existence of 2 other autopilots and the majority vote circuit.

What if we want to optimize for some combination of cost, risk, wasted person-hours, and discomfort, with flying on 1 autopilot being much riskier than flying on 2?

My general idea here is to be a bit more descriptivist than prescriptivist, by trying to come up with a higher-level principle that encompasses sensible autopilots.

The reason I think FDT would crash (at least the version that would one-box in transparent Newcomb's) lies in the setting where FDT has to actually calculate correct servo angles by reading the flight instruments and applying a correct model of what's going to happen next.

If it does the same thing it does in Newcomb's, namely applies those counterfactual commands to the past and hallucinates god knows what to square this counterfactual with the present instrument readouts... granted, we can't prove that it will crash, but that seems like trying to prove that someone who has never tried to play a violin can't play like Paganini.

> Not sure what you mean by CDT hard coding assumptions that might be false. Could you elaborate?

The model of causality as dominoes toppling other dominoes (as used when applying hypothetical actions: we push a domino, and it influences the other dominoes it hits). To me it seems to be a belief about physics that can be correct or incorrect (or almost correct but subtly incorrect), and that was originally arrived at, either through human experience or through evolution, from purely correlational (EDT-like) evidence.
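The contrast between the domino picture and purely correlational reasoning can be made concrete in a toy model, assuming one common cause (a sensor reading) that feeds three copies of the same deterministic policy. Intervening on one unit's output, in the style of a `do()` operation, leaves the other two tied to the sensor; conditioning on that output, EDT-style, also tells you what the sensor said and hence what the twins did. The sensor states, policy, and function names are hypothetical, and this is only one simple way of modelling intervention, not a full causal-modelling framework.

```python
from collections import Counter

SENSOR_STATES = ("drifting_left", "drifting_right")  # hypothetical common cause


def autopilot(sensor: str) -> str:
    """The identical deterministic policy run by all three units."""
    return "bank_right" if sensor == "drifting_left" else "bank_left"


def majority_vote(a: str, b: str, c: str) -> str:
    """2-of-3 voter; 'divert' if all three differ."""
    command, count = Counter([a, b, c]).most_common(1)[0]
    return command if count >= 2 else "divert"


def intervene(my_output: str) -> dict[str, str]:
    """do(A = my_output): A is cut off from the sensor; B and C still read it."""
    return {s: majority_vote(my_output, autopilot(s), autopilot(s)) for s in SENSOR_STATES}


def condition(my_output: str) -> dict[str, str]:
    """Observe A == my_output: keep only sensor states consistent with that observation."""
    return {s: majority_vote(autopilot(s), autopilot(s), autopilot(s))
            for s in SENSOR_STATES if autopilot(s) == my_output}


print(intervene("bank_right") == intervene("bank_left"))  # True: my output never moves the vote
print(condition("bank_right"))  # {'drifting_left': 'bank_right'}
```

Under intervention the voted command depends only on the sensor; under conditioning it always equals the observed output, which is the "pretend you control all three" verdict.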

edit: another clarification.

My hypothetical is primarily about what happens during a routine flight where all 3 autopilots operate correctly.

If needed, we can specify exactly what happens for every combination of failures. An autopilot can either produce an output or fail its self test.

E.g. 2 autopilots agree and the third disagrees or fails its self test: majority vote; the risk is low enough to continue across the Atlantic.

2 autopilots fail their internal self test and the third passes: divert, using the remaining autopilot.

All 3 autopilots disagree: a flight attendant has to fly the plane. Alternatively, some heuristic is used to choose one autopilot.

All 3 autopilots fail their internal self test: a flight attendant has to fly the plane.
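The failure cases above can be written as a small dispatch function, assuming each unit reports either a command or `None` for a failed self test. One combination is not covered by the list (one unit fails and the other two disagree), so the sketch returns `"unspecified"` there rather than guessing; all names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def system_action(outputs: Sequence[Optional[str]]) -> str:
    """Map the three autopilot reports to the system-level response listed above.

    Each entry is a servo command, or None if that autopilot failed its self test.
    """
    working = [o for o in outputs if o is not None]
    if not working:
        return "manual"       # all 3 fail the self test: a human flies the plane
    if len(working) == 1:
        return "divert"       # 2 fail: divert using the remaining autopilot
    _, count = Counter(working).most_common(1)[0]
    if count >= 2:
        return "continue"     # at least 2 agree: majority vote, continue across the Atlantic
    if len(working) == 3:
        return "manual"       # all 3 disagree (or some heuristic picks one unit)
    return "unspecified"      # 1 fails and the other 2 disagree: not covered by the list


print(system_action(["bank_left", "bank_left", None]))  # continue
```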

# on 26 November 2022, 21:09

Also, to try to clarify, here is a TL;DR.

The first question in the hypothetical is how we calculate utilities across possible actions (possible servo commands) during a routine flight with no electrical faults of any kind.

With CDT, consider one individual autopilot. It knows that the other two autopilots will likely output identical values (i.e. identical to each other). The calculated utilities in this scenario are then equal for all servo commands, so it cannot choose the right servo commands for flying the plane to its original destination.

edit: to be clear, it may make a binary decision to proceed if that option were available. The alternatives of turning left or turning right, however, yield the same turn (the one commanded by the other two autopilots) and thus the same utility.

Suppose that in every failure scenario where the autopilot in question is in command of the airplane (not being outvoted by the other two), there is an unacceptably high probability that the autopilot in command will also fail, so it must minimize flight time in that condition, by landing ASAP.

Intuitively, the right thing to do is to pretend that your action also controls commands by other two autopilots, but also not to extend this logic where it is inappropriate (e.g. to past timesteps).
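One way to make this intuition concrete, under the same hypothetical toy dynamics used earlier in the thread's examples: score each candidate as if both healthy twins emit the same command at the current time step only, while the measured heading (the past) is held fixed rather than re-imagined. This is a sketch of the intuition as stated, not an implementation of FDT or of any published decision theory; names and constants are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence

TURN_RATE_PER_DEG = 0.5  # hypothetical toy dynamics


def majority_vote(a: float, b: float, c: float) -> Optional[float]:
    """2-of-3 voter; None stands in for the divert case."""
    cmd, count = Counter([a, b, c]).most_common(1)[0]
    return cmd if count >= 2 else None


def voted_utility(heading: float, target: float, voted: Optional[float]) -> float:
    if voted is None:
        return float("-inf")  # hypothetical: divert scored as worst on a routine flight
    return -abs(target - (heading + TURN_RATE_PER_DEG * voted))


def mirrored_utilities(heading: float, target: float, candidates: Sequence[float]) -> dict[float, float]:
    """Suppose both healthy twins emit whatever this unit emits, at this time step only.

    The measured heading stays fixed: the supposition is not extended to past time steps.
    """
    return {cmd: voted_utility(heading, target, majority_vote(cmd, cmd, cmd)) for cmd in candidates}


def choose(heading: float, target: float, candidates: Sequence[float]) -> float:
    utils = mirrored_utilities(heading, target, candidates)
    return max(utils, key=utils.get)


print(choose(90.0, 95.0, [-10, -5, 0, 5, 10]))  # 10
```

Unlike the CDT-style evaluation with the other outputs held fixed, the candidates now get different utilities, so the correct turn can be picked; restricting the supposition to the current step is exactly the "not extending this logic to past timesteps" caveat.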

# on 27 November 2022, 20:12


I agree that your case, like my version of it, is one where there's some intuitive pull to pretend that your action controls the plane alone (or that it controls the other autopilots). EDT yields this verdict, CDT does not.

I see now why you think FDT probably does even worse. The worry, in general, is that FDT evaluates the available acts by looking at their payoff in situations with a different past and present. If the actual situation calls for acts that are finely tuned to the actual past and present, this is likely to go wrong. It would be nice to have a simple toy scenario where this hunch can be confirmed.

I still think that what CDT says requires some more input about the utilities and the causal structure than what you have specified, but this probably doesn't affect the main point.

I'm not sure if arbitrary AI systems should be programmed to follow CDT -- even setting aside that CDT is computationally intractable. If a system is known to interact with others in a certain way, then it might be better regarded as a subsystem of a larger agent, and we should focus on optimising the larger agent. All sorts of voting paradoxes suggest that it would be a bad idea to design the larger agent by aggregating the preferences of autonomous sub-agents. But there is still a problem about what an autonomous agent should do if it finds itself taking part in such a badly designed larger system.

I probably agree that the causal assumptions in CDT need to be questioned. Fortunately, there are ways of spelling out CDT that are compatible with skepticism about causation as a primitive external force. Skyrms did this in his two books from the 1980s. Lewis also had a very deflationist account of causation and of whatever plays its role in CDT. So I don't think we're necessarily hard-coding assumptions that may well be false.
