## On Functional Decision Theory

I recently refereed Eliezer Yudkowsky and Nate Soares's "Functional Decision Theory" for a philosophy journal. My recommendation was to accept resubmission with major revisions, but since the article had already undergone a previous round of revisions and still had serious problems, the editors (understandably) decided to reject it. I normally don't publish my referee reports, but this time I'll make an exception because the authors are well-known figures from outside academia, and I want to explain why their account has a hard time gaining traction in academic philosophy. I also want to explain why I think their account is wrong, which is a separate point.

I actually don't know Nate Soares, but Eliezer Yudkowsky is a celebrity in the "rationalist" community. Many of his posts on the Less Wrong blog are gems. I also enjoyed his latest book, Inadequate Equilibria. Yudkowsky seems to be interested in almost everything, but he regards decision theory as his main area of research. I also work in decision theory, but I've always struggled with Yudkowsky's writings on this topic.

Before I explain what I found wrong with the paper, let me review the main idea and motivation behind the theory it defends.

Standard lore in decision theory is that there are situations in which it would be better to be irrational. Three examples.

Blackmail. Donald has committed an indiscretion. Stormy has found out and considers blackmailing Donald. If Donald refuses and blows Stormy's gaff, she is revealed as a blackmailer and his indiscretion becomes public; both suffer. It is better for Donald to pay hush money to Stormy. Knowing this, it is in Stormy's interest to blackmail Donald. If Donald were irrational, he would blow Stormy's gaff even though that would hurt him more than paying the hush money; knowing this, Stormy would not blackmail Donald. So Donald would be better off if he were (known to be) irrational.
Prisoner's Dilemma with a Twin. Twinky and her clone have been arrested. If they both confess, each gets a 5-year prison sentence. If both remain silent, they can't be convicted and only get a 1-year sentence for obstructing justice. If one confesses and the other remains silent, the one who confesses is set free and the other gets a 10-year sentence. Neither cares about what happens to the other. Here, confessing is the dominant act and the unique Nash equilibrium. So if Twinky and her clone are rational, they'll each spend 5 years in prison. If they were irrational and remained silent, they would get away with 1 year.
Newcomb's Problem with Transparent Boxes. A demon invites people to an experiment. Participants are placed in front of two transparent boxes. The box on the left contains a thousand dollars. The box on the right contains either a million or nothing. The participants can choose between taking both boxes (two-boxing) and taking just the box on the right (one-boxing). If the demon has predicted that a participant one-boxes, she put a million dollars into the box on the right. If she has predicted that a participant two-boxes, she put nothing into the box. The demon is very good at predicting, and the participants know this. Each participant is only interested in getting as much money as possible. Here, the rational choice is to take both boxes, because you are then guaranteed to get $1000 more than if you one-box. But almost all of those who irrationally take just one box end up with a million dollars, while most of those who rationally take both boxes leave with $1000.
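The statistical claim in the last example can be made concrete with a small sketch. This is my own illustration, not from the paper: assume a hypothetical predictor that is right with probability p and compare average winnings.

```python
# Average winnings in Newcomb's Problem with Transparent Boxes, assuming
# (hypothetically) a demon whose prediction is correct with probability p.
def expected_payoff(strategy, p=0.99):
    if strategy == "one-box":
        # With probability p the demon foresaw one-boxing and filled the
        # right box with $1,000,000; otherwise it is empty.
        return p * 1_000_000 + (1 - p) * 0
    else:
        # With probability p the demon foresaw two-boxing and left the
        # right box empty; otherwise both boxes pay out.
        return p * 1_000 + (1 - p) * (1_000_000 + 1_000)

print(expected_payoff("one-box"))   # roughly 990000
print(expected_payoff("two-box"))   # roughly 11000
```

So one-boxers walk away with far more on average, even though in any fixed state of the boxes two-boxing is guaranteed to yield $1000 more than one-boxing.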

The driving intuition behind Yudkowsky and Soares's paper is that decision theorists have been wrong about these (and other) cases: in each case, the supposedly irrational choice is actually rational. Whether a pattern of behaviour is rational, they argue, should be measured by how good it is for the agent. In Newcomb's Problem with Transparent Boxes, one-boxers fare better than two-boxers. So we should regard one-boxing as rational. Similarly for the other examples. Standard decision theories therefore get these cases wrong. We need a new theory.

Functional Decision Theory (FDT) is meant to be that theory. FDT recommends blowing the gaff in Blackmail, remaining silent in Prisoner's Dilemma with a Twin, and one-boxing in Newcomb's Problem with Transparent Boxes.

Here's how FDT works, and how it differs from the most popular form of decision theory, Causal Decision Theory (CDT). Suppose an agent faces a choice between two options A and B. According to CDT, the agent should evaluate these options in terms of their possible consequences (broadly understood). That is, the agent should consider what might happen if she were to choose A or B, and weigh the possible outcomes by their probability. In FDT, the agent should not consider what would happen if she were to choose A or B. Instead, she ought to consider what would happen if the right choice according to FDT were A or B.

Take Newcomb's Problem with Transparent Boxes. Without loss of generality, suppose you see $1000 in the left box and a million in the right box. If you were to take both boxes, you would get a million and a thousand. If you were to take just the right box, you would get a million. So Causal Decision Theory says you should take both boxes. But let's suppose you follow FDT, and you are certain that you do. You should then consider what would be the case if FDT recommended one-boxing or two-boxing. These hypotheses are not hypotheses just about your present choice. If FDT recommended two-boxing, then any FDT agent throughout history would two-box. And, crucially, the demon would (probably) have foreseen that you would two-box, so she would have put nothing into the box on the right. As a result, if FDT recommended two-boxing, you would probably end up with $1000. To be sure, you know that there's a million in the box on the right. You can see it. But according to FDT, this is irrelevant. What matters is what would be in the box relative to different assumptions about what FDT recommends.
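The contrast just described can be put in toy form. This is my own rendering, not notation from the paper: CDT holds the observed contents of the boxes fixed, while FDT asks what the boxes would contain under different hypotheses about FDT's own output (with a hypothetical predictor accuracy p).

```python
# Toy contrast between CDT-style and FDT-style evaluation in Newcomb's
# Problem with Transparent Boxes, when you SEE a full right-hand box.
def cdt_value(act, right_box=1_000_000):
    # CDT: the boxes are what they are; your act only adds the left box or not.
    return right_box + (1_000 if act == "two-box" else 0)

def fdt_value(act, p=0.99):
    # FDT: suppose FDT itself recommended `act`; then the demon would
    # (with probability p) have predicted it before filling the right box.
    if act == "one-box":
        return p * 1_000_000
    return p * 1_000 + (1 - p) * (1_000_000 + 1_000)

print(cdt_value("two-box"), cdt_value("one-box"))  # CDT favours two-boxing
print(fdt_value("two-box"), fdt_value("one-box"))  # FDT favours one-boxing
```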

To spell out the details, one would now need to specify how to compute the probability of various outcomes under the subjunctive supposition that FDT recommended a certain action. Yudkowsky and Soares are explicit that the supposition is to be understood as counterpossible: we need to suppose that a certain mathematical function, which in fact outputs A for input X, instead were to output B. They do not explain how to compute the probability of outcomes under such a counterpossible supposition, so the details are left unspecified. This is flagged as the main open question for FDT.

It is not obvious to me why Yudkowsky and Soares choose to model the relevant supposition as a mathematical falsehood. For example, why not let the supposition be: I am the kind of agent who chooses A in the present decision problem? That is an ordinary contingent (centred) proposition, since there are possible agents who do choose option A in the relevant problem. These agents may not follow FDT, but I don't see why that would matter. For some reason, Yudkowsky and Soares assume that an FDT agent is certain that she follows FDT, and this knowledge is held fixed under all counterfactual suppositions. I guess there is a reason for this assumption, but they don't tell us.

Anyway. That's the theory. What's not to like about it?

For a start, I'd say the theory gives insane recommendations in cases like Blackmail, Prisoner's Dilemma with a Twin, and Newcomb's Problem with Transparent Boxes. Take Blackmail. Suppose you have committed an indiscretion that would ruin you if it should become public. You can escape the ruin by paying $1 once to a blackmailer. Of course you should pay! FDT says you should not pay because, if you were the kind of person who doesn't pay, you likely wouldn't have been blackmailed. How is that even relevant? You are being blackmailed. Not being blackmailed isn't on the table. It's not something you can choose. Admittedly, that's not much of an objection. I say you'd be insane not to pay the $1, Yudkowsky and Soares say you'd be irrational to pay. Neither of us can prove, from neutral premises, that our judgement is the right one.

What about the fact that FDT agents do better than (say) CDT agents? I admit that if this were a fact, it would be somewhat interesting. But it's not clear if it is true.

First, it depends on how success is measured. If you face the choice between submitting to blackmail and refusing to submit (in the kind of case we've discussed), you fare dramatically better if you follow CDT than if you follow FDT. If you are in Newcomb's Problem with Transparent Boxes and see a million in the right-hand box, you again fare better if you follow CDT. Likewise if you see nothing in the right-hand box.

So there's an obvious sense in which CDT agents fare better than FDT agents in the cases we've considered. But there's also a sense in which FDT agents fare better. Here we don't just compare the utilities scored in particular decision problems, but also take into account that FDT agents might face different kinds of decision problems than CDT agents. For example, FDT agents who are known as FDT agents have a lower chance of getting blackmailed and thus of facing a choice between submitting and not submitting. I agree that it makes sense to take these effects into account, at least as long as they are consequences of the agent's own decision-making dispositions. In effect, we would then ask what decision rule should be chosen by an engineer who wants to build an agent scoring the most utility across its lifetime. Even then, however, there is no guarantee that FDT would come out better. What if someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options? In such an environment, the engineer would be wise not to build an FDT agent.
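The engineer's perspective can be sketched in toy form. The environments and utility numbers below are my own stipulations, meant only to show that which rule wins depends on the environment:

```python
# Toy model: lifetime utility of a decision rule depends on how the
# environment responds to agents known to follow that rule.
def lifetime_utility(rule, environment):
    if environment == "blackmail-prone":
        # Known FDT agents rarely get blackmailed; CDT agents get
        # blackmailed and (rationally) pay up.
        return 0 if rule == "FDT" else -1
    if environment == "punishes-FDT":
        # Someone gives FDT agents only bad options, CDT agents great ones.
        return -100 if rule == "FDT" else 100
    raise ValueError(f"unknown environment: {environment}")

for env in ("blackmail-prone", "punishes-FDT"):
    best = max(("FDT", "CDT"), key=lambda rule: lifetime_utility(rule, env))
    print(env, "->", best)
```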

Moreover, FDT does not in fact consider only consequences of the agent's own dispositions. The supposition that is used to evaluate acts is that FDT in general recommends that act, not just that the agent herself is disposed to choose the act. This leads to even stranger results.

Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there's a significant probability that I wouldn't exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)

In Procreation, FDT agents have a much worse life than CDT agents.

All that said, I agree that there's an apparent advantage of the "irrational" choice in cases like Blackmail or Prisoner's Dilemma with a Twin, and that this raises an important issue. The examples are artificial, but structurally similar cases arguably come up a lot, and they have come up a lot in our evolutionary history. Shouldn't evolution have favoured the "irrational" choices?

Not necessarily. There is another way to design agents who refuse to submit to blackmail and who cooperate in Prisoner's Dilemmas. The trick is to tweak the agents' utility function. If Twinky cares about her clone's prison sentence as much as about her own, remaining silent becomes the dominant option in Prisoner's Dilemma with a Twin. If Donald develops a strong sense of pride and would rather take Stormy down with him than submit to her blackmail, refusing to pay becomes the rational choice in Blackmail.
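The utility tweak for Twinky can be checked directly. A minimal sketch, using the prison years from the example above and a hypothetical altruism weight:

```python
# Years in prison (mine, twin's) for each pair of choices, as in the
# Prisoner's Dilemma with a Twin above.
years = {("confess", "confess"): (5, 5), ("confess", "silent"): (0, 10),
         ("silent", "confess"): (10, 0), ("silent", "silent"): (1, 1)}

def utility(my_act, twin_act, altruism=1.0):
    # If Twinky weighs her clone's sentence as heavily as her own
    # (altruism=1), her utility is minus the total years served.
    mine, twins = years[(my_act, twin_act)]
    return -(mine + altruism * twins)

# With altruism=1, remaining silent is never worse and sometimes better:
for twin_act in ("confess", "silent"):
    assert utility("silent", twin_act) >= utility("confess", twin_act)
assert utility("silent", "silent") > utility("confess", "silent")
```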

FDT agents rarely find themselves in Blackmail scenarios. Neither do CDT agents with a vengeful streak. If I wanted to design a successful agent for a world like ours, I would build a CDT agent who cares what happens to others. My CDT agent would still two-box in Newcomb's Problem with Transparent Boxes (or in the original Newcomb Problem). But this kind of situation practically never arises in worlds like ours.

The story I'm hinting at has been well told by others. I'd especially recommend Brian Skyrms's Evolution of the Social Contract and chapter 6 of Simon Blackburn's Ruling Passions.

So here's the upshot. Whether FDT agents fare better than CDT agents depends on the environment, on how "faring better" is measured, and on what the agents care about. Across their lifetime, purely selfish agents might do better, in a world like ours, if they followed FDT. But that doesn't persuade me that the insane recommendations of FDT are correct.

So far, I have explained why I'm not convinced by the case for FDT. I haven't explained why I didn't recommend the paper for publication. That I'm not convinced is not a reason. I'm rarely convinced by arguments I read in published papers.

The standards for deserving publication in academic philosophy are relatively simple and self-explanatory. A paper should make a significant point, it should be clearly written, it should correctly position itself in the existing literature, and it should support its main claims by coherent arguments. The paper I read sadly fell short on all these points, except the first. (It does make a significant point.)

Here, then, are some of the complaints from my referee report, lightly edited for ease of exposition. I've omitted several other complaints concerning more specific passages or notation from the paper.

1. A popular formulation of CDT assumes that to evaluate an option A we should consider the probability of various outcomes on the subjunctive supposition that A were chosen. That is, we should ask how probable such-and-such an outcome would be if option A were chosen. The expected utility of the option is then defined as the probability-weighted average of the utility of these outcomes. In much of their paper, Yudkowsky and Soares appear to suggest that this is exactly how expected utility is defined in FDT. The disagreement between CDT and FDT would then boil down to a disagreement about what is likely to be the case under the subjunctive supposition that an option is chosen.
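The probability-weighted average just described fits in a couple of lines. A minimal sketch; `credence_if` stands for whatever subjunctive probability function a given theory supplies, and all the names here are mine, not the paper's:

```python
# Subjunctive expected utility: the probability-weighted average of the
# utilities of the possible outcomes, where credence_if(outcome, option)
# is the credence in the outcome on the subjunctive supposition that the
# option were chosen.
def expected_utility(option, outcomes, credence_if, utility):
    return sum(credence_if(o, option) * utility(o) for o in outcomes)

# Tiny example: a fair-coin bet paying $10 on heads, nothing on tails.
eu = expected_utility("bet", ["heads", "tails"],
                      lambda o, a: 0.5,
                      {"heads": 10, "tails": 0}.get)
print(eu)  # 5.0
```

On this schema, CDT and FDT differ only in which supposition feeds `credence_if`: the act itself, or a hypothesis about FDT's output.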

For example, consider Newcomb's Problem with Transparent Boxes. Suppose (without loss of generality) that the right-hand box is empty. CDT says you should take both boxes because if you were to take only the right-hand box you would get nothing whereas if you were to take both boxes, you would get $1000. According to FDT (as I presented it above, and as it is presented in parts of the paper), we should ask a different question. We should ask what would be the case if FDT recommended one-boxing, and what would be the case if FDT recommended two-boxing. For much of the paper, however, Yudkowsky and Soares seem to assume that these questions coincide. That is, they suggest that you should one-box because if you were to one-box, you would get a million. The claim that you would get nothing if you were to one-box is said to be a reflection of CDT. If that's really what Yudkowsky and Soares want to say, they should, first, clarify that FDT is a special case of CDT as conceived for example by Stalnaker, Gibbard & Harper, Sobel, and Joyce, rather than an alternative. All these parties would agree that the expected utility of an act is a matter of what would be the case if the act were chosen. (Yudkowsky and Soares might then also point out that "Causal Decision Theory" is not a good label, given that they don't think the relevant conditionals track causal dependence. John Collins has made essentially the same point.)

Second, and more importantly, I would like to see some arguments for the crucial claim about subjunctive conditionals. Return once more to the Newcomb case. Here's the right-hand box. It's empty. It's a normal box. Nothing you can do has any effect on what's in the box. The demon has tried to predict what you will do, but she could be wrong. (She has been wrong before.) Now, what would happen if you were to take that box, without taking the other one? The natural answer, by the normal rules of English, is: you would get an empty box.
Yudkowsky and Soares instead maintain that the correct answer is: you would find a million in the box. Note that this is a claim about the truth-conditions of a certain sentence in English, so facts about the long-run performance of agents in decision problems don't seem relevant. (If the predictor is highly reliable, I think a "backtracking" reading can become available on which it's true that you would get a million, as Terry Horgan has pointed out. But there's still the other reading, and it's much more salient if the predictor is less reliable.)

Third, later in the paper it transpires that FDT can't possibly be understood as a special case of CDT along the lines just suggested, because in some cases FDT requires assessing the expected utility of an act by looking exclusively at scenarios in which that act is not performed. For example, in Blackmail, not succumbing is supposed to be better because it decreases the chance of being blackmailed. But any conditional of the form 'if the agent were to do A, then the agent would do A' is trivially true in English.

Fourth, in other parts of the paper it is made clear that FDT does not instruct agents to suppose that a certain act were performed, but rather to suppose that FDT always were to give a certain output for a certain input. I would recommend dropping all claims about subjunctive conditionals involving the relevant acts. The proposal should be that the expected utility of act A in decision problem P is to be evaluated by subjunctively supposing not A, but the proposition that FDT outputs A in problem P. (That's how I presented the theory above.) The proposal then wouldn't rely on implausible and unsubstantiated claims about English conditionals. [I then listed several passages that would need to be changed if the suggestion is adopted.]

2. I'm worried that so little is said about how subjunctive probabilities are supposed to be revised when supposing that FDT gives a certain output for a certain decision problem.
Yudkowsky and Soares insist that this is a matter of subjunctively supposing a proposition that's mathematically impossible. But as far as I know, we have no good models for supposing impossible propositions. Here are three more specific worries.

First, mathematicians are familiar with reductio arguments, which appear to involve impossible suppositions. "Suppose there were a largest prime. Then there would be a product x of all these primes. And then x+1 would be divisible by no prime, although every number greater than 1 has a prime factor. And so there would be a prime greater than all primes." What's noteworthy about these arguments is that whenever B is mathematically derivable from A, then mathematicians are prepared to accept 'if A were the case then B would be the case', even if B is an explicit contradiction. (In fact, that's where the proof usually ends: "If A were the case then a contradiction would be the case; so A is not the case.") If that is how subjunctive supposition works, FDT is doomed. For if A is a mathematically false proposition, then anything whatsoever mathematically follows from A. (I'm ignoring the subtle difference between mathematical truth and provability, which won't help.) So then anything whatsoever would be the case on a counterpossible supposition that FDT produces a certain output for a certain decision problem. We would get: 'If FDT recommended two-boxing in Newcomb's Problem, then the second box would be empty', but also 'If FDT recommended two-boxing in Newcomb's Problem, then the second box would contain a million', and 'If FDT recommended two-boxing in Newcomb's Problem, the second box would contain a round square'.

A second worry. Is a probability function revised by a counterpossible supposition, as employed by FDT, still a probability function? Arguably not. For presumably the revised function is still certain of elementary mathematical facts such as the Peano axioms. (If, when evaluating a relevant scenario, the agent is no longer sure whether 0=1, all bets are off.)
But some such elementary facts will logically entail the negation of the supposed hypothesis. So in the revised probability function, probability 1 is not preserved under logical entailment; and then the revised function is no longer a classical probability function. (This matters, for example, because Yudkowsky and Soares claim that the representation theorem from Joyce's Foundations of Causal Decision Theory can be adapted to FDT, but Joyce's theorem assumes that the supposition preserves probabilistic coherence.)

Another worry. Subjunctive supposition is relatively well-understood for propositions about specific events at specific times. But the hypothesis that FDT yields a certain output for a certain input is explicitly not spatially and temporally limited in this way. We have no good models for how supposing such general propositions works, even for possible propositions. The details matter. For example, assume FDT actually outputs B for problem P, and B' for a different problem P'. Under the counterpossible supposition that FDT outputs A for P, can we hold fixed that it outputs B' for P'? If not, FDT will sometimes recommend choosing a particular act because of the advantages of choosing a different act in a different kind of decision problem.

3. Standard decision theories are not just based on brute intuitions about particular cases, as Yudkowsky and Soares would have us believe, but also on general arguments. The most famous of these are so-called representation theorems which show that the norm of maximising expected utility can be derived from more basic constraints on rational preference (possibly together with basic constraints on rational belief). It would be nice to see which of the preference norms of CDT Yudkowsky and Soares reject. It would also be nice if they could offer a representation theorem for FDT.
All that is optional and wouldn't matter too much, in my view, except that Yudkowsky and Soares claim (as I mentioned above) that the representation theorem in Joyce's Foundations of Causal Decision Theory can be adapted straightforwardly to FDT. But I doubt that it can. The claim seems to rest on the idea that FDT can be formalised just like CDT, assuming that subjunctively supposing A is equivalent to supposing that FDT recommends A. But as I've argued above, the latter supposition arguably makes an agent's subjective probability function incoherent. More obviously, in cases like Blackmail, A is plausibly false on the supposition that FDT recommends A. These two aspects already contradict the very first two points in the statement of Joyce's representation theorem, on p.229 of The Foundations of Causal Decision Theory, under 7.1.a.

4. Yudkowsky and Soares constantly talk about how FDT "outperforms" CDT, how FDT agents "achieve more utility", how they "win", etc. As we saw above, it is not at all obvious that this is true. It depends, in part, on how performance is measured. At one place, Yudkowsky and Soares are more specific. Here they say that "in all dilemmas where the agent's beliefs are accurate [??] and the outcome depends only on the agent's actual and counterfactual behavior in the dilemma at hand -- reasonable constraints on what we should consider "fair" dilemmas -- FDT performs at least as well as CDT and EDT (and often better)". OK. But how should we understand "depends on ... the dilemma at hand"? First, are we talking about subjunctive or evidential dependence? If we're talking about evidential dependence, EDT will often outperform FDT. And EDTers will say that's the right standard.
CDTers will agree with FDTers that subjunctive dependence is relevant, but they'll insist that the standard Newcomb Problem isn't "fair" because here the outcome (of both one-boxing and two-boxing) depends not only on the agent's behavior in the present dilemma, but also on what's in the opaque box, which is entirely outside her control. Similarly for all the other cases where FDT supposedly outperforms CDT. Now, I can vaguely see a reading of "depends on ... the dilemma at hand" on which FDT agents really do achieve higher long-run utility than CDT/EDT agents in many "fair" problems (although not in all). But this is a very special and peculiar reading, tailored to FDT. We don't have any independent, non-question-begging criterion by which FDT always "outperforms" EDT and CDT across "fair" decision problems.

5. FDT closely resembles Justin Fisher's "Disposition-Based Decision Theory" and the proposal in David Gauthier's Morals by Agreement, both of which are motivated by cases like Blackmail and Prisoner's Dilemma with a Twin. Neither is mentioned. It would be good to explain how FDT relates to these earlier proposals.

6. The paper goes to great lengths criticising the rivals CDT and EDT. The apparent aim is to establish that both CDT and EDT sometimes make recommendations that are clearly wrong. Unfortunately, these criticisms are largely unoriginal, superficial, or mistaken. For example, Yudkowsky and Soares fault EDT for giving the wrong verdicts in simple medical Newcomb problems. But defenders of EDT such as Arif Ahmed and Huw Price have convincingly argued that the relevant decision problems would have to be highly unusual. Similarly, Yudkowsky and Soares cite a number of well-known cases in which CDT supposedly gives the wrong verdict, such as Arif's Dicing with Death. But again, most CDTers would not agree that CDT gets these cases wrong. (See this blog post for my response to Dicing with Death.)
In general, I am not aware of any case in which I'd agree that CDT -- properly spelled out -- gives a problematic verdict. Likewise, I suspect Arif does not think there are any cases in which EDT goes wrong. It just isn't true that both CDT and EDT are commonly agreed to be faulty. If Yudkowsky and Soares want to argue that they are, they need to do more than revisit well-known scenarios and make bold assertions about what CDT and EDT say about them.

The criticism of CDT and EDT also contains several mistakes. For example, Yudkowsky and Soares repeatedly claim that if an EDT agent is certain that she will perform an act A, then EDT says she must perform A. I don't understand why. I guess the idea is that (1) if P(B)=0, then the evidential expected utility of B is undefined, and (2) any number is greater than undefined. But lots of people, from Kolmogorov to Hájek, have argued against (1), and I don't know why anyone would find (2) plausible.

For another example, Yudkowsky and Soares claim that CDT (like FDT) involves evaluating logically impossible scenarios. For example, "[CDTers] are asking us to imagine the agent's physical action changing while holding fixed the behavior of the agent's decision function". Who says that? I would have thought that when we consider what would happen if you took one box in Newcomb's Problem, the scenario we're considering is one in which your decision function outputs one-boxing. We're not considering an impossible scenario in which your decision function outputs two-boxing, you have complete control over your behaviour, and yet you choose to one-box. There are many detailed formulations of CDT. Yudkowsky and Soares ignore almost all of them and only mention the comparatively sketchy theory of Pearl. But even Pearl's theory plausibly doesn't appeal to impossible propositions to evaluate ordinary options. Lewis's or Joyce's or Skyrms's certainly doesn't.
I still think the paper could probably have been published after a few rounds of major revisions. But I also understand that the editors decided to reject it. Highly ranked philosophy journals have acceptance rates of under 5%. So almost everything gets rejected. This one got rejected not because Yudkowsky and Soares are outsiders or because the paper fails to conform to obscure standards of academic philosophy, but mainly because the presentation is not nearly as clear and accurate as it could be.

### Comments

# on 19 January 2019, 03:36

Notice that all the situations (one excepted) described in this post where FDT does worse than CDT are logically impossible, while the situations where FDT does better are not. Or maybe we're using the versions of the problems where the blackmailer is not entirely predictable and might still blackmail the functional decision theorist (but be more likely to blackmail the causal decision theorist), or where the Newcomb predictor is not a perfect predictor but only very likely to predict correctly, or where the other prisoner twin might be hit by a cosmic ray with low probability and not make the same decision as you. If so, situations where CDT does better than FDT are less likely than situations where FDT does better, so FDT still comes out ahead.

Let's assume that we're using the deterministic version of each of these problems, rather than the probabilistic version: the blackmailer is guaranteed to know what decision theory you use and to act accordingly, the Newcomb predictor is guaranteed to predict correctly, your twin is guaranteed to make the same decision as you, your father is guaranteed to procreate if and only if you do.

Now let's consider the blackmail problem. The post says, "If you face the choice between submitting to blackmail and refusing to submit (in the kind of case we've discussed), you fare dramatically better if you follow CDT than if you follow FDT." This is true.
The problem is that, if you are being blackmailed, this means that you are not going to follow FDT. If you were going to follow FDT, the blackmailer would not have blackmailed you. The fact that you have been blackmailed means you can be 100% certain that you will not follow FDT. In itself, being 100% certain that you will not follow FDT does not prevent you from following FDT. But it does make the situation where you follow FDT and come off worse impossible, which is relevant to our determination of which decision theory is better.

Let's consider the Newcomb problem. If the Newcomb predictor is guaranteed to predict your choice correctly, it is impossible for an agent using CDT to see a million in the right-hand box. It never does any good to dismiss a logical inconsistency and to consider what happens anyway. What happens if we ignore this and suppose that the CDT agent does see a thousand in the left-hand box and a million in the right-hand box? Then using this supposition we can prove that they will get both amounts if they two-box. But since they are a CDT agent, we know that they will two-box, therefore there is nothing in the right-hand box, so we can prove that they will only get a thousand if they two-box. But suppose that they one-box instead. Since they are a CDT agent, we know that they will two-box, so we know that there is nothing in the right-hand box, so we can prove that if they one-box they will get nothing. However, we know that they see a million in the right-hand box, so we can prove that if they one-box, they will get a million. So we can prove that they should one-box, and we can prove that they should two-box. At this point we can conclude that a million and nothing are the same thing, and that a thousand is equal to a million plus a thousand. With enough "if"s, you could put Paris in a bottle, as the French proverb goes.

The procreation example is harder to prove inconsistent because it relies on infinite regress. Here's a first way to resolve it. Should I procreate?
If I do, my life will be miserable. But my father followed the same decision theory I do, so if I choose not to procreate, that means my father will have chosen not to procreate. So I will not exist. So I can prove that, if I end up choosing not to procreate, that means I do not exist. However, I do exist. That's a contradiction. I guess that means I will not choose not to procreate. Knowing that I will not make that choice does not in itself prevent me from making the choice, though. Should I choose not to procreate anyway? Well, I can prove that if I do not procreate, then I will not exist, and that if I do, then my life will be miserable. A miserable life is better than not existing, so I should procreate. However, I know that I exist, and that is the consequent of the implication "if I do not procreate, then I [will] exist", so the implication is true, whereas if I choose to procreate I still exist but my life is miserable. A miserable life is worse than a non-miserable life, so I should not procreate. Oops, I can prove that I should procreate and that I should not procreate? That's a contradiction, and this one doesn't rely on the supposition that I made any particular choice. The world I am living in must be inconsistent.

We can also solve it by directly addressing the infinite regress. Should I procreate? If I do, my life will be miserable. But my father followed the same thought process I did, and would have made the same decisions, so if I choose not to procreate, that means my father will have chosen not to procreate. Then I would not exist, and a miserable life is better than not existing, so I should procreate. Why did my father procreate, though, if that made his life miserable? Oh, right. My grandfather followed the same thought process that my father did, so if my father had chosen not to procreate, that means his father would have chosen not to procreate, and so he would not exist either.
Since he too considered a miserable life better than not existing, he chose to procreate. Why did my grandfather procreate, though, if that made his life miserable? What about my great-grandfather? What about— The recursive buck stops *here*. My great-. . .-great-grandfather did not choose to procreate, because that would have made his life miserable. Therefore I do not exist. That’s a contradiction. The assumption that each generation of ancestry uses FDT and only exists if the previous one chose to procreate is inconsistent with the assumption that any of them exist. No FDT agent can ever face this problem, and no designer can ever have to pick a decision theory for an agent that could have to face this problem. And if we only assume that it is unlikely that the father made a different decision from you, and not that it is certain that he did not, then FDT makes it less likely that you will not exist, and so it again comes out ahead of CDT.

There is one category of situations (the one exception I mentioned) where FDT can leave you worse off than CDT, and that is what happens when "someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options". FDT can change your decisions to make them optimal, but it can’t change the initial decision theory you used to make the decisions. It can only pick decisions identical to those of another decision theory. That doesn’t prevent an environment from knowing what your initial decision theory was and punishing you on that basis. This is unsolvable by any decision theory. Therefore it can hardly be taken as a point against FDT.

I said that it never does any good to dismiss a logical inconsistency. I want to clarify that this is not the same as saying that we should dismiss thought experiments because their premises are unlikely. "Extremism In Thought Experiment Is No Vice".
Appealing to our intuitions about extreme cases is informative. But logical impossibility is informative too, and is what we care about when comparing decision theories. Nate Soares has claimed "that *all* decision-making power comes from the ability to induce contradictions: the whole reason to write an algorithm that loops over actions, constructs models of outcomes that would follow from those actions, and outputs the action corresponding to the highest-ranked outcome is so that it is contradictory for the algorithm to output a suboptimal action."

# on 21 January 2019, 12:52

@artifax: all the situations described in the post were meant to be "non-deterministic" in your sense, so there's nothing impossible about CDT agents outperforming FDT agents in these cases.

I see that on a population-level statistical average, purely selfish FDT agents often do better than purely selfish CDT agents. I said as much in the post, so I don't think we disagree here. Except that I don't think average population-level success among selfish agents is an adequate test for the right decision theory. A somewhat more adequate test, I think, is to look at which theory gives better results across a wide range of decision problems, no matter how these problems came about. On that measure, selfish CDT agents generally do better than selfish FDT agents. But of course I can't prove to you that my test is more adequate.

# on 21 January 2019, 17:25

PS: In situations like Procreation (the "non-deterministic" case I described), I don't even see a meaningful statistical sense in which FDT agents do better than CDT agents.

# on 07 March 2019, 18:34

Thanks for sharing this. I've always had a lot of admiration for the LW community. I think your complaints are good ones, though. I wanted to call attention to a relevant and underappreciated paper by John Leslie: "Ensuring Two Bird Deaths with One Throw" (Mind, 1991) <jstor.org/stable/2254984>. If you have a perfect clone,
then by killing a bird with a stone, you ensure that your clone does likewise. Leslie calls this phenomenon "quasi-causation" and applies it to Newcomb's Problem, among other issues.

# on 13 March 2020, 17:07

Good job, Wolfgang. I agreed with almost everything you wrote. But I have a different interpretation of the 3 paradoxes.

"Standard lore in decision theory is that there are situations in which it would be better to be irrational." -- I think this should be changed to "...to be considered (by other agents) to be irrational". That's an important difference, for the agent can be rational, but still be considered/believed by other agents to be irrational. This clearly applies to the blackmail paradox, but also to Newcomb's problem. In the latter, it would be good (for the agent faced with the choice) if the predictor/demon considers the agent to be irrational, and thus predicts that it will choose one box. But after the boxes are already set up, the agent, rationally, should choose both boxes, as his decision cannot influence the past decision of the demon about what to put inside the boxes. This is similar to how in real life it's sometimes convenient for a person to be considered a fool by others (and even to pretend to be a fool), while in reality being much smarter.

But for the Prisoner's Dilemma with a Twin: First of all, shouldn't the relevant question/conclusion be what ONE particular agent should decide, as opposed to the two of them? I suggest the analysis should compare Twinky being rational with Twinky being irrational (and then split each situation based on how her twin is). Second, why do you think that being irrational means making a particular choice (here, to remain silent)? I'd say being rational means analyzing and taking a decision; being irrational means not being capable of that, which means taking a random decision (which can even happen to be the same as what a rational agent takes).
With this modification, my analysis gives, for the blackmail problem, that being rational is better here.

# on 13 March 2020, 17:37

Thanks Vic. You're certainly right that in the examples it is better to be considered irrational. But I think there's also a sense in which it would be better for the relevant agents if they actually were (and had always been) irrational. For example, the predictor would then very likely have put a million into the first box. I agree, of course, that the agent should choose both boxes.

And yes, in the Twin Dilemma, the right question to ask is what Twinky should do -- and the answer is that she should confess. Still, when we ask how different kinds of people fare in this scenario, those who remain silent fare better, and by hypothesis they make the same choice as their twin.

I did assume that being rational in the examples I gave involves making a particular choice (but not remaining silent! The rational choice is to confess). I agree rationality involves more. But I don't agree that the alternative to rationality is to act randomly. For example, an agent who always minimizes expected utility would act irrationally, but not randomly.

# on 14 March 2020, 05:45

Thanks for replying. By the way, I should have mentioned that I am not here to defend functional decision theory: I haven't even read it; nor have I ever communicated with its authors.

I see what you mean about rational agents faring worse than irrational ones in Newcomb's problem. For now, I tend to agree.

In the twin dilemma: from this sentence in the article, "If one confesses and the other remains silent[...]", I thought you meant that they can choose differently; but now I see that you mean they actually cannot. And I take back my idea that being irrational is to act randomly. I'll get back to these examples later; but for now, could we clarify and agree on some definitions?
- rational choice/decision = choice taken by an agent that results in the BEST probable outcome for this agent; let's say, maximizes expected utility
- irrational choice = choice taken by an agent that results in the WORST probable outcome for this agent; let's say, minimizes expected utility
- rational agent = agent that in every situation makes the rational choice
- irrational agent = agent that in every situation makes the irrational choice
- partially (ir)rational agent = agent that in a certain fraction of situations (>0 and <1) makes the (ir)rational choice

If so, then an agent either is rational or is not rational (the latter meaning irrational or partially rational): it cannot CHOOSE to be rational (instead, it can only choose the kind of choice to make). Agent A may or may not know the rationality type of agent B; in general, it may know it only with a certain degree of accuracy (or, better said, may believe it, and the belief has a certain degree of accuracy), say measured using probability. Similarly, in a given situation, agent A may or may not know, or believe with some accuracy, what choice agent B will take. What agent A believes about agent B's rationality, or about the type of choice B will take in a given situation, is a proposition independent of what B's rationality actually is or, respectively, what choice B will take.

# on 14 March 2020, 06:06

It seems to me that, using clear definitions and systematic methods, we could shine a good light on these and other paradoxes, and perhaps even remove the "paradox" or "unsolved" status of many of them.

# on 14 March 2020, 11:46

Hi Vic, I agree that clear definitions are useful here, and can help to make the apparent paradoxes less puzzling.

# on 14 March 2020, 18:39

After a little thought, a generalization of the definition: partially (ir)rational agent = agent that is neither rational nor irrational. Thus, in some decision problems, or in some instances of some decision problems, it makes the rational choice, and in others the irrational one.
All these definitions/assumptions -- do you find them correct or wrong? Or do you use different operational definitions? I'm sure there are whole theories of rationality and decision out there that you know and I don't... I just want to keep it simple.

# on 15 March 2020, 13:40

@Vic: I generally assume that rationality involves more than just maximizing expected utility; for example, it also requires adequately responding to evidence. And I would call any agent who isn't rational 'irrational'. But your definitions are also fine with me. It all depends on what we go on to do with them.

# on 12 May 2020, 10:20

Some people from the LW community have tried to respond to your objections to the theory. I am tempted to post one of the criticisms here, but I am not allowed to post HTML links, and it is quite long, so I will just ask you to go to the "Open Thread January 2019" thread on Lesswrong and look at user dxu's post, which is quite famous there. It is quite confusing, but the people there seem to hold it in high regard. I will post a few words from it here, but this is hardly a summary.

He/she says that your statement "FDT says you should not pay because, if you were the kind of person who doesn't pay, you likely wouldn't have been blackmailed. How is that even relevant? You are being blackmailed." is wrong, and based on a 'naive intuition'. He/she continues by saying "it's not immediately obvious what's wrong with this assumption" and tries to explain what is meant. Quoting:

"In certain decision problems, your counterfactual behavior matters as much--if not more--than your actual behavior. That is to say, there exists a class of decision problems where the outcome depends on something that never actually happens."

"Every single one of those thought experiments could have been written from the perspective, not of the real you, but a hypothetical, counterfactual version of yourself.
When "you're" being blackmailed, Schwarz makes the extremely natural assumption that "you" are you. But there's no reason to suppose this is the case. The scenario never stipulates why you're being blackmailed, only that you're being blackmailed. So the person being blackmailed could be either the real you or a hypothetical. And the thing that determines whether it's the real you or a mere hypothetical is... ...your decision whether or not to pay up, of course."

# on 12 May 2020, 15:57

Thanks Abir. Here's a link to the comment. This dxu person seems to have misunderstood my post. All I claimed in the post is that (1) FDT is counter-intuitive, (2) I find the arguments in favour of FDT unpersuasive, and (3) the theory was formally underdeveloped and badly presented in the paper I read.

Yes, I do think that the circumstances that led to a situation are generally irrelevant to which choice is right. I understand that FDT disagrees, but as dxu admits the assumption is intuitively plausible, and I could not find any convincing argument against it either in the paper I read or in this comment.

dxu says: "the thing that determines whether it's the real you or a mere hypothetical is your decision whether or not to pay up, of course. If you cave into the blackmail and pay up, then you're almost certainly the real deal. On the other hand, if you refuse to give in, it's very likely that you're simply a counterfactual version of yourself living in an extremely low-probability (if not outright inconsistent) world."

I don't think this is a clear way of putting the point. After all, I am sure that I am the real version of myself, and I am sure that there is nothing I can do that would make me merely hypothetical. But I see what is probably meant. I just don't agree that it is relevant to rational choice. And I don't see any /argument/ in the comment why it should be.
# on 15 May 2020, 06:29

Regarding the transparent-box "Newcomb's paradox" and "blackmail", it may be useful to put a different context around Newcomb's paradox... Fred was asked by Alice to walk her dog for $20, walked her dog, and got paid $20. One year later, Alice tells Fred that she wants him to give her back $9. She can't take $9 from Fred, or do any other harm to Fred. Fred doesn't want to give $9 back. But Alice convinces Fred that the following happened:

Alice read the FDT paper; the fact that FDT one-boxes in the transparent Newcomb's Paradox made her happy and less irate, and she didn't stiff Fred out of half his payment right then (which is what she generally does).

Now what does Fred do if he runs FDT?

Note that the above is Newcomb's Paradox with $11 in the first box and $9 in the second box, but without the greed for $1,000,000 or moralization about being punished for excessive greed. Presumably the answer should stay the same, if there is indeed an answer at all, because $11 is still bigger than $9, and it isn't relevant that Fred walked the dog and has money in his wallet already. An iterated variation is possible, where after being paid $20 Fred decided that Alice was trustworthy and kept walking her dog for $20 a time during that year.

I think FDT is just too ill-defined and vague, because they never specify how the evaluation of "what if this function returns A" is supposed to be combined with facts about the real world. Should Fred hallucinate $11 in his pocket, instead of $20, when he's evaluating the possibility of not giving Alice $9?

# on 15 May 2020, 09:26

@D: I agree with these worries. But I don't quite follow the example. How is Fred's choice analogous to the choice in Newcomb's Problem (with transparent boxes)?

# on 15 May 2020, 11:54

Maybe I see the idea.

On Monday Alice gives Fred $20. On Friday she demands back $9. She convinces Fred of the following. When she gave him the $20 on Monday, she had just read the FDT paper, and considered what FDT would recommend Fred to do on Friday if she gave him $20 on Monday and then asked him to pay back $9 on Friday. She figured out that FDT says that Fred should return the $9. This amused her so much that she was happy to give him the $20; if she had figured out that FDT recommends not returning the $9, she would only have given Fred $10.

Now, on Friday, Fred wonders whether to return the $9. If FDT says yes, it is likely that he got $20 from Alice on Monday (in the strange sense of the conditional relevant to FDT). If FDT says no, it is likely that he got $10. So according to FDT, it's better to return the $9. (Which is what Alice anticipated, and what amused her so much.) Nice. Is that roughly what you had in mind?

# on 15 May 2020, 13:20

Yes, that works. Although note that for FDT I don't think it matters for what reason Alice's actions were influenced by the FDT paper. E.g. Alice need not have a pre-existing plan to take back the $9. I'm not assuming that Alice knows Fred follows FDT.
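If it helps, the bookkeeping in this reconstruction fits in a few lines of code. This is my own toy sketch of the payoffs as described above, not anything from the FDT paper; the function name and the way FDT "compares suppositions" are illustrative assumptions.

```python
def fred_final_holdings(fdt_says_return: bool) -> int:
    """Fred's money under FDT's subjunctive supposition about its own output.

    Assumption (from the scenario above): Alice pays $20 on Monday if FDT
    tells Fred to return $9 on Friday, and only $10 otherwise.
    """
    monday_payment = 20 if fdt_says_return else 10
    friday_refund = 9 if fdt_says_return else 0
    return monday_payment - friday_refund

# FDT compares the two suppositions and endorses the one with the larger total:
print(fred_final_holdings(True))   # 11 -- return the $9
print(fred_final_holdings(False))  # 10 -- keep everything
```

On these numbers FDT tells Fred to hand over the $9, which is the point of the example: the verdict turns on a Monday payment that has already happened.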

With Newcomb's paradox, I feel that the issue is that it is specifically worded to align all the irrelevancies to make one-boxing sound right. It is a large gift, and if you commit the sin of being too greedy, you don't get the large gift. You probably read numerous books in your childhood telling relevant stories.

It is a set up for a logical fallacy.

Hence I'm changing it to $11 and $9, and making the money already belong to Fred by a contract, and placing it in Fred's wallet. The dog-walking bit is completely irrelevant to the logical problem, but it is relevant to the fallacy.

I guess it could be argued that Alice isn't fair, but what she does isn't dependent on what decision theory Fred employs, so by any normal definition it is fair. She likes to talk people into giving back the money, and of course that is not good for the people who can be talked into giving it back.

# on 16 May 2020, 02:34

I also wonder if FDT can combine with inductive inference, given that they claim alternatives override actual physical sensory input in the transparent Newcomb's paradox.

The agent has been looking at a raw feed from a high-res camera for maybe an hour, until its input contains 1 gigabyte of fundamentally incompressible photon shot noise (overlaid on what it actually saw). Every hypothesis the agent has left had a prior less than 2^-8E9 (because every hypothesis which didn't predict exactly the right noise got discarded). Edit: to be clear, I'm assuming Solomonoff induction and variants, where all hypotheses remaining after receiving sensory input print a string beginning with that sensory input.

But suppose that there is a hypothesis which describes each bit of sensory input as the result of an FDT calculation, the way the transparent-box Newcomb's paradox describes what is seen in the boxes, or the contents of Fred's wallet.

How is that hypothesis to be penalized as new bits come in? Instead of having to actually "store" new bits (in an increasingly complex hypothesis), it just "decides" what those bits should be, and in the alternative where it decides something that doesn't match the sensory input, it acts as if that were the sensory input.

Edit: to be clear, the problem is that this hypothesis will eventually become dominant, no matter how low a prior it may have had.
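The worry can be illustrated with a toy version of the filtering step. This is my own sketch, assuming the simple "discard every hypothesis that mispredicts a bit" picture described above; the names and numbers are invented for illustration.

```python
import random

random.seed(0)
N_BITS = 30
observed = [random.getrandbits(1) for _ in range(N_BITS)]  # stand-in for incompressible noise

# "Honest" hypotheses commit to their predicted bit-string in advance.
honest = [[random.getrandbits(1) for _ in range(N_BITS)] for _ in range(1000)]

# Filtering: keep only the hypotheses whose predictions match every observed bit.
for t in range(N_BITS):
    honest = [h for h in honest if h[t] == observed[t]]

# Each honest hypothesis survives with probability 2^-30, so the pool dies out,
# while a hypothesis that "decides" each bit to be whatever was observed is
# never eliminated and ends up dominating the posterior by default.
print(len(honest))
```

With 1,000 hypotheses and 30 random bits, the surviving honest pool is almost certainly empty, which is the "eventually becomes dominant" point in miniature.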

# on 18 May 2020, 10:34

To be honest, I am still confused about the Transparent Newcomb's Paradox. If the predictor is not completely accurate, but just a 'good predictor', then how can we say whether CDT or FDT agents perform better? Can someone briefly explain why Dr. Schwarz says this?

Is one-boxing better when the predictor is merely good at predicting, because it gives you a good chance of getting a million? Or is two-boxing better, because it gets you either a million and a thousand or just a thousand?
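For the ordinary (opaque-box) version, at least, the comparison can be made explicit. A minimal sketch, assuming a predictor that is simply right with probability p and using the straightforward conditional-expected-value calculation (not FDT's own procedure, and not the transparent case, which needs more structure):

```python
def expected_payoffs(p: float) -> tuple[float, float]:
    """Expected payoffs with a predictor of accuracy p.

    The opaque box holds $1,000,000 iff one-boxing was predicted;
    the transparent box always holds $1,000.
    """
    ev_one_box = p * 1_000_000                # the million iff predicted correctly
    ev_two_box = (1 - p) * 1_000_000 + 1_000  # the million only on a misprediction
    return ev_one_box, ev_two_box

one, two = expected_payoffs(0.75)
print(one, two)  # 750000.0 251000.0 -- one-boxing wins for a merely 'good' predictor
```

On this calculation, one-boxing comes out ahead whenever p exceeds 0.5005; whether that calculation is the right one to run is, of course, exactly what the CDT/FDT dispute is about.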

# on 18 May 2020, 17:27

With "FDT" it depends on how it thinks the predictor is making the prediction. It is only supposed to diverge from CDT if it believes the predictor actually evaluates FDT.

So for FDT, you can't actually specify the predictor as a black box; you have to specify why and how it is not entirely accurate (while for CDT the black box works).

For what it's worth, I personally think that if you pop the hood off yourself, and pop the hood off another entity, and examine the internals one by one, and find them to coincide such that you can very confidently conclude that the difference between the two is very reliably 0, you should be able to use that fact in your decision-making.

Ditto for, e.g., popping the hood off a copying machine and determining that it has made an exact replica of you (which you end up playing the Prisoner's Dilemma with).

That all should be a part of the hypothesis space, though. After all, this chain of causal events in your future light cone is an empirical fact you learned. Even if we are born with it, it has evolved and was "learned" by evolution. None of that belongs in a decision theory, even if it's tempting to put it there so that the decision theory can better handle some under-specified hypothetical.