I wanted to call attention to a relevant and underappreciated paper by John Leslie: "Ensuring Two Bird Deaths with One Throw" (Mind, 1991) <jstor.org/stable/2254984>. If you have a perfect clone... then by killing a bird with a stone, you ensure that your clone does likewise.

Leslie calls this phenomenon "quasi-causation" and applies it to Newcomb's Problem, among other issues.

I have Garson's book (at home), but didn't remember that it mentions this issue.

I'll look up the Greco vs Carter debate. The same point arguably arises for Lewis's account of knowledge in "Elusive Knowledge". It is sometimes claimed (e.g. by Williamson) that the logic of Elusive Knowledge is S5. That would be (almost) correct on an absolutist reading of Lewis's rules, on which the line between ignored and non-ignored worlds depends only on the context of utterance. ("Almost", because the logic would actually be KD45.) But some of Lewis's rules are relativist. For example, the rule of actuality says that the /subject/'s world is never properly ignored. This ensures that the accessibility relation is reflexive, and it breaks symmetry, among other things. For example, if subject 1 is looking at a zebra and subject 2 has the same experiences but is looking at a disguised mule, then we can properly ignore subject 2 when we talk about subject 1, but not when we talk about subject 2. By contrast, when we talk about subject 2, we can never properly ignore subject 1. The subject's beliefs and stakes also matter in Lewis's account.


http://fitelson.org/piksi/deontic_logic_problems.pdf

There is a bit of discussion of the condition itself in Garson's "Modal Logic for Philosophers" (p.109). (If you are interested I can e-mail you a PDF of a screenshot).

I wish I had a discussion like yours to reference!

In one sense, yes, in another, no. His doughnut eating violates the
laws *of w*, but not the laws of *our world*. Conversely,
his kitten torturing violates a moral code accepted at our world, but
not a code accepted at w.

In general, when we ask whether people at other worlds do what they
ought to do, we can evaluate their actions relative to *their*
norms, or we can evaluate them relative to *our* norms. Both
perspectives make sense. But they lead to different deontic logics.

Let's assume a Kripke semantics for deontic logic. (The issues carry
over to neighbourhood semantics.) Here, 'it ought to be the case that
p' is regarded as true at a world w iff p is true at all *deontic
alternatives* to w. A world v is a deontic alternative to w iff
everything that ought to be the case at w is the case at v.
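The truth condition just stated can be sketched in a few lines of Python. This is only an illustration: the three worlds, the relation `alternatives` and the valuation `facts` are made up for the example, and `permitted` is the usual dual of `ought`, which the text has not introduced at this point.

```python
# Hypothetical three-world model; names, relation and valuation are illustrative.
# alternatives[w] is the set of deontic alternatives to w.
alternatives = {
    "w": {"v1", "v2"},
    "v1": {"v1"},
    "v2": {"v2"},
}
# facts[w] is the set of atomic sentences true at w.
facts = {
    "w": set(),
    "v1": {"p", "q"},
    "v2": {"p"},
}

def ought(prop, world):
    """O(prop) is true at `world` iff prop holds at every deontic alternative."""
    return all(prop(v) for v in alternatives[world])

def permitted(prop, world):
    """P(prop) is true at `world` iff prop holds at some deontic alternative."""
    return any(prop(v) for v in alternatives[world])

def p(world):
    return "p" in facts[world]

def q(world):
    return "q" in facts[world]

print(ought(p, "w"))      # p holds at both alternatives to w
print(ought(q, "w"))      # q fails at v2
print(permitted(q, "w"))  # q holds at v1
```

Nothing in the code says what makes a world a deontic alternative; that is exactly the question the formal definition leaves open.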

This formal definition leaves open how we should understand 'ought to be the case at w'. Is this a matter of the norms accepted at w, or is it a matter of the actual norms that we hold fixed when we consider w?

On the first approach, a world v is a deontic alternative to w iff v
is permissible by the standards that are accepted or endorsed at w.
Let's call this the *relativist* approach.

On the second, *absolutist* approach, we don't care what norms
are accepted at the relevant evaluation point w (unless our own norms
say that this makes a difference). Here v is a deontic alternative to
w iff v is permissible by our norms.

(You might suggest that there's yet another approach, on which we
evaluate the actions at w relative to the *true* norms, which
don't depend on what anyone at any world endorses or accepts. If the
true norms are constant between worlds, this approach yields the same
logic as the absolutist approach. If the true norms vary from world to
world, we can again distinguish a relativist and an absolutist
version, whose logic will look like the logic of the two approaches I
just introduced.)

What difference does this make to deontic logic?

Consider the "Utopia" principle, that it ought to be the case that
whatever ought to be the case is the case: O(Op → p). In Kripke
semantics, this corresponds to the hypothesis that any deontic
alternative to any world is a deontic alternative to itself. On the
relativist approach, this is highly implausible. The fact that a world
w conforms to *our* norms surely doesn't entail that it conforms
to the *norms at w*. On the absolutist approach, however, the
principle looks highly plausible. If w conforms to our norms, and we
use those same norms to evaluate what ought to be the case at w, then
plausibly everything that ought to be the case at w is the case at w.
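The correspondence between the Utopia principle and the frame condition can be checked by brute force on small frames. The sketch below is only illustrative: the two frames are hypothetical, and a frame is represented as a dict from each world to its set of deontic alternatives.

```python
from itertools import product

def box(R, truth, w):
    """O(phi) at w, where `truth` maps each world to the truth value of phi."""
    return all(truth[v] for v in R[w])

def utopia_valid(R):
    """Check O(Op -> p) at every world under every valuation of p."""
    worlds = sorted(R)
    for values in product([False, True], repeat=len(worlds)):
        p = dict(zip(worlds, values))
        Op = {w: box(R, p, w) for w in worlds}
        Op_implies_p = {w: (not Op[w]) or p[w] for w in worlds}
        if not all(box(R, Op_implies_p, w) for w in worlds):
            return False
    return True

def alternatives_see_themselves(R):
    """Every deontic alternative to any world is an alternative to itself."""
    return all(v in R[v] for w in R for v in R[w])

# Absolutist-style frame: the same world counts as ideal from everywhere.
absolutist = {"w": {"v"}, "v": {"v"}}
# Relativist-style frame: v is ideal by w's norms, but v's own norms point to u.
relativist = {"w": {"v"}, "v": {"u"}, "u": {"u"}}

for name, frame in [("absolutist", absolutist), ("relativist", relativist)]:
    print(name, alternatives_see_themselves(frame), utopia_valid(frame))
```

On the second frame the principle fails at w under a valuation where p is true at u and false at v: Op then holds at v although p does not.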

One might argue that on the absolutist approach, the very same worlds are deontic alternatives to any world. The logic of obligation and permission would then be KD45.
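To see why KD45 is the natural candidate here, one can check the frame conditions that correspond to the axioms D, 4 and 5 (seriality, transitivity and Euclideanness) on a frame where every world has the same non-empty set of alternatives. The worlds below are hypothetical, and the sketch assumes that at least one world is permissible by our norms.

```python
def serial(R):
    """D: every world has at least one alternative."""
    return all(R[w] for w in R)

def transitive(R):
    """4: alternatives of alternatives are alternatives."""
    return all(u in R[w] for w in R for v in R[w] for u in R[v])

def euclidean(R):
    """5: any two alternatives to a world are alternatives to each other."""
    return all(u in R[v] for w in R for v in R[w] for u in R[w])

def reflexive(R):
    """T: every world is an alternative to itself."""
    return all(w in R[w] for w in R)

ideal = {"i1", "i2"}                 # hypothetical worlds permissible by our norms
worlds = {"actual", "i1", "i2"}
R = {w: set(ideal) for w in worlds}  # the same alternatives from every world

print(serial(R), transitive(R), euclidean(R), reflexive(R))
```

Reflexivity fails as soon as some world, here `actual`, falls short of the norms, which is why the logic is KD45 rather than S5.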

On the relativist approach, even the minimal logic K is arguably too strong. For example, consider a world at which there is nothing but empty spacetime, and so no norms are accepted or enforced at all. At such a world, arguably nothing is obligatory and nothing is permitted, on the relativist interpretation. But O(T) is a theorem of K. We even have to give up the duality of obligation and permission, since at the empty world we have neither O(p) nor P(~p).

Since the question of absolutism vs relativism makes such a big difference to deontic logic, you'd expect it to be discussed upfront in introductory texts on deontic logic. I have looked at a good handful of such texts, but I've never seen the question even mentioned. Instead, people seem to take one or the other approach for granted, and you have to read closely between the lines to figure out what approach that is.

For example, Brian Chellas's *Modal Logic* textbook, which
extensively discusses deontic logic, clearly adopts the relativist
interpretation. (Which partly explains why Chellas is so critical of
Standard Deontic Logic.) The Anderson/Kanger reduction, discussed for
example in the SEP entry,
clearly assumes absolutism. Lewis also seems to assume absolutism in
his writings on deontic logic. But as I said, there is little explicit
discussion, and people often seem to talk past each other.

I see that on a population-level statistical average, purely selfish FDT agents often do better than purely selfish CDT agents. I said as much in the post, so I don't think we disagree here. Except that I don't think average population-level success among selfish agents is an adequate test for the right decision theory. A somewhat more adequate test, I think, is to look at which theory gives better results across a wide range of decision problems, no matter how these problems came about. On that measure, selfish CDT agents generally do better than selfish FDT agents. But of course I can't prove to you that my test is more adequate.


Or maybe we’re using the versions of the problems where the blackmailer is not entirely predictable and might still blackmail the functional decision theorist (but be more likely to blackmail the causal decision theorist), or where the Newcomb predictor is not a perfect predictor but only very likely to predict correctly, or where the other prisoner twin might be hit by a cosmic ray with low probability and not make the same decision as you. If so, situations where CDT does better than FDT are less likely than situations where FDT does better, so FDT still comes out ahead.

Let’s assume that we’re using the deterministic version of each of these problems, rather than the probabilistic version: the blackmailer is guaranteed to know what decision theory you use and to act accordingly, the Newcomb predictor is guaranteed to predict correctly, your twin is guaranteed to make the same decision as you, your father is guaranteed to procreate if and only if you do.

Now let’s consider the blackmail problem. The post says, “If you face the choice between submitting to blackmail and refusing to submit (in the kind of case we’ve discussed), you fare dramatically better if you follow CDT than if you follow FDT.” This is true. The problem is that, if you are being blackmailed, this means that you are not going to follow FDT. If you were going to follow FDT, the blackmailer would not have blackmailed you. The fact that you have been blackmailed means you can be 100% certain that you will not follow FDT. In itself, being 100% certain that you will not follow FDT does not prevent you from following FDT. But it does make the situation where you follow FDT and come worse off impossible, which is relevant to our determination of which decision theory is better.

Let’s consider the Newcomb problem. If the Newcomb predictor is guaranteed to predict your choice correctly, it is impossible for an agent using CDT to see a million in the right-hand box.

It never does any good to dismiss a logical inconsistency and to consider what happens anyway.

What happens if we ignore this and suppose that the CDT agent does see a thousand in the left-hand box and a million in the right-hand box? Then using this supposition we can prove that they will get both amounts if they two-box. But since they are a CDT agent, we know that they will two-box, therefore there is nothing in the right-hand box, so we can prove that they will only get a thousand if they two-box. But suppose that they one-box instead. Since they are a CDT agent, we know that they will two-box, so we know that there is nothing in the right-hand box, so we can prove that if they one-box they will get nothing. However, we know that they see a million in the right-hand box, so we can prove that if they one-box, they will get a million. So we can prove that they should one-box, and we can prove that they should two-box. At this point we can conclude that a million and nothing are the same thing, and that a thousand is equal to a million plus a thousand. Avec des si, on mettrait Paris en bouteille.

The procreation example is harder to prove inconsistent because it relies on infinite regress.

Here’s a first way to resolve it. Should I procreate? If I do, my life will be miserable. But my father followed the same decision theory I do, so if I choose not to procreate, that means my father will have chosen not to procreate. So I will not exist. So I can prove that, if I end up choosing not to procreate, that means I do not exist. However, I do exist. That’s a contradiction. I guess that means I will not choose not to procreate. Knowing that I will not make that choice does not in itself prevent me from making the choice though. Should I choose not to procreate anyway? Well, I can prove that if I do not procreate, then I will not exist, and that if I do, then my life will be miserable. A miserable life is better than not existing, so I should procreate. However, I know that I exist, and that is the consequent of the implication “if I do not procreate, then I [will] exist”, so the implication is true, whereas if I choose to procreate I still exist but my life is miserable. A miserable life is worse than a non-miserable life, so I should not procreate. Oops, I can prove that I should procreate and that I should not procreate? That’s a contradiction, and this one doesn’t rely on the supposition that I made any particular choice. The world I am living in must be inconsistent.

We can also solve it by directly addressing the infinite regress.

Should I procreate? If I do, my life will be miserable. But my father followed the same thought process I did, would have made the same decisions, so if I choose not to procreate, that means my father will have chosen not to procreate. Then I would not exist, and a miserable life is better than not existing, so I should procreate.

Why did my father procreate, though, if that made his life miserable?

Oh, right. My grandfather followed the same thought process that my father did, so if he chose not to procreate, that means his father would have chosen not to procreate, and so he would not exist either. Since he too considered a miserable life better than not existing, he chose to procreate.

Why did my grandfather procreate, though, if that made his life miserable? What about my great-grandfather? What about—

The recursive buck stops *here*.

My {The Recursive Buck Stops Here}-great-. . .-great-grandfather did not choose to procreate because that would have made his life miserable. Therefore I do not exist. That’s a contradiction. The assumption that each generation of ancestry uses FDT and only exists if the previous chose to procreate is inconsistent with the assumption that any of them exist. No FDT agent can ever face this problem, and no designer can ever have to pick a decision theory for an agent that could have to face this problem. And if we only assume that it is unlikely that the father made a different decision from you, and not that it is certain that he did not, then FDT makes it less likely that you will not exist, and so it again comes out ahead of CDT.

There is one category of situations (the one exception I mentioned) where FDT can leave you worse off than CDT, and that is what happens when “someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options”. FDT can change your decisions to make them optimal, but it can’t change the initial decision theory you used to make the decisions. It can only pick decisions identical to those of another decision theory. That doesn’t prevent an environment from knowing what your initial decision theory was and punishing you on that basis. This is unsolvable by any decision theory. Therefore it can hardly be taken as a point against FDT.

I said that it never does any good to dismiss a logical inconsistency. I want to clarify that this is not the same as saying that we should dismiss thought experiments because their premises are unlikely. “Extremism In Thought Experiment Is No Vice”. Appealing to our intuitions about extreme cases is informative. But logical impossibility is informative too, and is what we care about when comparing decision theories. Nate Soares has claimed “that *all* decision-making power comes from the ability to induce contradictions: the whole reason to write an algorithm that loops over actions, constructs models of outcomes that would follow from those actions, and outputs the action corresponding to the highest-ranked outcome is so that it is contradictory for the algorithm to output a suboptimal action.”

I actually don't know Nate Soares, but Eliezer Yudkowsky is a
celebrity in the "rationalist" community. Many of his posts on the Less Wrong blog are
gems. I also enjoyed his latest book, *Inadequate Equilibria*.
Yudkowsky seems to be interested in almost everything, but he regards
decision theory as his main area of research. I also work in decision
theory, but I've always struggled with Yudkowsky's writings on this
topic.

Before I explain what I found wrong with the paper, let me review the main idea and motivation behind the theory it defends.

Standard lore in decision theory is that there are situations in which it would be better to be irrational. Three examples.

Blackmail. Donald has committed an indiscretion. Stormy has found out and considers blackmailing Donald. If Donald refuses and blows Stormy's gaff, she is revealed as a blackmailer and his indiscretion becomes public; both suffer. It is better for Donald to pay hush money to Stormy. Knowing this, it is in Stormy's interest to blackmail Donald. If Donald were irrational, he would blow Stormy's gaff even though that would hurt him more than paying the hush money; knowing this, Stormy would not blackmail Donald. So Donald would be better off if he were (known to be) irrational.

Prisoner's Dilemma with a Twin. Twinky and her clone have been arrested. If they both confess, each gets a 5-year prison sentence. If both remain silent, they can't be convicted and only get a 1-year sentence for obstructing justice. If one confesses and the other remains silent, the one who confesses is set free and the other gets a 10-year sentence. Neither cares about what happens to the other. Here, confessing is the dominant act and the unique Nash equilibrium. So if Twinky and her clone are rational, they'll each spend 5 years in prison. If they were irrational and remained silent, they would get away with 1 year.

Newcomb's Problem with Transparent Boxes. A demon invites people to an experiment. Participants are placed in front of two transparent boxes. The box on the left contains a thousand dollars. The box on the right contains either a million or nothing. The participants can choose between taking both boxes (two-boxing) and taking just the box on the right (one-boxing). If the demon has predicted that a participant one-boxes, she put a million dollars into the box on the right. If she has predicted that a participant two-boxes, she put nothing into the box. The demon is very good at predicting, and the participants know this. Each participant is only interested in getting as much money as possible. Here, the rational choice is to take both boxes, because you are then guaranteed to get $1000 more than if you one-box. But almost all of those who irrationally take just one box end up with a million dollars, while most of those who rationally take both boxes leave with $1000.
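The dominance and equilibrium claims in *Prisoner's Dilemma with a Twin* can be verified mechanically. The sketch below encodes the sentences from the example, treating "set free" as 0 years; the function names are made up for the illustration, and utility is simply minus one's own years in prison.

```python
# Years in prison for (Twinky's act, clone's act): (Twinky's years, clone's years).
YEARS = {
    ("confess", "confess"): (5, 5),
    ("silent", "silent"): (1, 1),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
}
ACTS = ("confess", "silent")

def utility(me, other):
    """Selfish utility: minus my own years in prison."""
    return -YEARS[(me, other)][0]

def strictly_dominant(act):
    """True if `act` beats every other act whatever the other player does."""
    return all(utility(act, o) > utility(a, o)
               for o in ACTS for a in ACTS if a != act)

def nash_equilibria():
    """Act pairs where neither player gains by deviating unilaterally."""
    found = []
    for a in ACTS:
        for b in ACTS:
            a_ok = all(utility(a, b) >= utility(x, b) for x in ACTS)
            b_ok = all(utility(b, a) >= utility(y, a) for y in ACTS)
            if a_ok and b_ok:
                found.append((a, b))
    return found

print(strictly_dominant("confess"))
print(nash_equilibria())
```

The game is symmetric, so the same `utility` function serves for both players with the arguments swapped.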

The driving intuition behind Yudkowsky and Soares's paper is that
decision theorists have been wrong about these (and other) cases: in
each case, the supposedly irrational choice is actually rational.
Whether a pattern of behaviour is rational, they argue, should be
measured by how good it is for the agent. In *Newcomb's Problem with
Transparent Boxes*, one-boxers fare better than two-boxers. So we
should regard one-boxing as rational. Similarly for the other
examples. Standard decision theories therefore get these cases wrong.
We need a new theory.

Functional Decision Theory (FDT) is meant to be that theory. FDT
recommends blowing the gaff in *Blackmail*, remaining silent in
*Prisoner's Dilemma with a Twin*, and one-boxing in *Newcomb's
Problem with Transparent Boxes*.

Here's how FDT works, and how it differs from the most popular form
of decision theory, Causal Decision Theory (CDT). Suppose an agent
faces a choice between two options A and B. According to CDT, the agent
should evaluate these options in terms of their possible consequences
(broadly understood). That is, the agent should consider what might
happen if she were to choose A or B, and weigh the possible outcomes
by their probability. In FDT, the agent should not consider what would
happen if she were to choose A or B. Instead, she ought to consider
what would happen if *the right choice according to FDT were A or
B*.

Take *Newcomb's Problem with Transparent Boxes*. Without loss
of generality, suppose you see $1000 in the left box and a million in
the right box. If you were to take both boxes, you would get a million
and a thousand. If you were to take just the right box, you would get
a million. So Causal Decision Theory says you should take both boxes.
But let's suppose you follow FDT, and you are certain that you do. You
should then consider what would be the case if FDT recommended
one-boxing or two-boxing. These hypotheses are not hypotheses just
about your present choice. If FDT recommended two-boxing, then any FDT
agent throughout history would two-box. And, crucially, the demon
would (probably) have foreseen that you would two-box, so she would
have put nothing into the box on the right. As a result, if FDT
recommended two-boxing, you would probably end up with $1000. To be
sure, you know that there's a million in the box on the right. You can
see it. But according to FDT, this is irrelevant. What matters is what
*would* be in the box relative to different assumptions about
what FDT recommends.
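The contrast can be made concrete with a toy calculation. This is a sketch under assumptions that are not in the paper: the predictor's accuracy is set to 0.99, and the FDT-style evaluation simply stipulates that the demon's prediction matches the supposed recommendation with that probability. How such probabilities should really be computed is the open question discussed below.

```python
def payoff(act, million_in_box):
    """Dollars received for an act, given what is in the right-hand box."""
    right = 1_000_000 if million_in_box else 0
    return right + (1_000 if act == "two-box" else 0)

def cdt_value(act, observed_million):
    """CDT-style: hold the observed box content fixed, vary only the act."""
    return payoff(act, observed_million)

def fdt_value(act, accuracy=0.99):
    """FDT-style: suppose FDT recommends `act`. The demon then probably
    predicted `act`, so the box content varies with the supposition.
    `accuracy` is an assumed figure, not taken from the text."""
    p_million = accuracy if act == "one-box" else 1 - accuracy
    return p_million * payoff(act, True) + (1 - p_million) * payoff(act, False)

ACTS = ("one-box", "two-box")
for observed in (True, False):
    # Whatever is seen in the right-hand box, two-boxing adds $1000.
    print(observed, max(ACTS, key=lambda a: cdt_value(a, observed)))
print(max(ACTS, key=fdt_value))
```

Note that `fdt_value` never looks at what the agent actually sees in the box, which is the feature of the theory that the paragraph above highlights.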

To spell out the details, one would now need to specify how to compute the probability of various outcomes under the subjunctive supposition that FDT recommended a certain action. Yudkowsky and Soares are explicit that the supposition is to be understood as counterpossible: we need to suppose that a certain mathematical function, which in fact outputs A for input X, instead were to output B. They do not explain how to compute the probability of outcomes under such a counterpossible supposition. So we don't get any details spelled out. This is flagged as the main open question for FDT.

It is not obvious to me why Yudkowsky and Soares choose to model
the relevant supposition as a mathematical falsehood. For example, why
not let the supposition be: *I am the kind of agent who chooses A in
the present decision problem*? That is an ordinary contingent
(centred) proposition, since there are possible agents who do choose
option A in the relevant problem. These agents may not
follow FDT, but I don't see why that would matter. For some reason, Yudkowsky and
Soares assume that an FDT agent is certain that she follows FDT, and
this knowledge is held fixed under all counterfactual suppositions. I
guess there is a reason for this assumption, but they don't tell
us.

Anyway. That's the theory. What's not to like about it?

For a start, I'd say the theory gives insane recommendations in
cases like *Blackmail*, *Prisoner's Dilemma with a Twin*,
and *Newcomb's Problem with Transparent Boxes*. Take
*Blackmail*. Suppose you have committed an indiscretion that
would ruin you if it should become public. You can escape the ruin by
paying $1 once to a blackmailer. Of course you should pay! FDT says
you should not pay because, if you were the kind of person who doesn't
pay, you likely wouldn't have been blackmailed. How is that even
relevant? You *are* being blackmailed. Not being blackmailed
isn't on the table. It's not something you can choose.

Admittedly, that's not much of an objection. I say you'd be insane not to pay the $1, Yudkowsky and Soares say you'd be irrational to pay. Neither of us can prove that their judgement is right from neutral premises.

What about the fact that FDT agents do better than (say) CDT agents? I admit that if this were a fact, it would be somewhat interesting. But it's not clear if it is true.

First, it depends on how success is measured. If you face the
choice between submitting to blackmail and refusing to submit (in the
kind of case we've discussed), you fare dramatically better if you
follow CDT than if you follow FDT. If you are in *Newcomb's Problem
with Transparent Boxes* and see a million in the right-hand box,
you again fare better if you follow CDT. Likewise if you see nothing
in the right-hand box.

So there's an obvious sense in which CDT agents fare better than FDT agents in the cases we've considered. But there's also a sense in which FDT agents fare better. Here we don't just compare the utilities scored in particular decision problems, but also the fact that FDT agents might face other kinds of decision problems than CDT agents. For example, FDT agents who are known as FDT agents have a lower chance of getting blackmailed and thus of facing a choice between submitting and not submitting. I agree that it makes sense to take these effects into account, at least as long as they are consequences of the agent's own decision-making dispositions. In effect, we would then ask what decision rule should be chosen by an engineer who wants to build an agent scoring the most utility across its lifetime. Even then, however, there is no guarantee that FDT would come out better. What if someone is set to punish agents who use FDT, giving them choices between bad and worse options, while CDTers are given great options? In such an environment, the engineer would be wise not to build an FDT agent.

Moreover, FDT does not in fact consider only consequences of the
agent's own dispositions. The supposition that is used to evaluate
acts is that FDT *in general* recommends that act, not just that
the agent herself is disposed to choose the act. This leads to even
stranger results.

Procreation. I wonder whether to procreate. I know for sure that doing so would make my life miserable. But I also have reason to believe that my father faced the exact same choice, and that he followed FDT. If FDT were to recommend not procreating, there's a significant probability that I wouldn't exist. I highly value existing (even miserably existing). So it would be better if FDT were to recommend procreating. So FDT says I should procreate. (Note that this (incrementally) confirms the hypothesis that my father used FDT in the same choice situation, for I know that he reached the decision to procreate.)

In *Procreation*, FDT agents have a much worse life than CDT
agents.

All that said, I agree that there's an apparent advantage of the
"irrational" choice in cases like *Blackmail* or *Prisoner's
Dilemma with a Twin*, and that this raises an important issue. The
examples are artificial, but structurally similar cases arguably come
up a lot, and they have come up a lot in our evolutionary history.
Shouldn't evolution have favoured the "irrational" choices?

Not necessarily. There is another way to design agents who refuse
to submit to blackmail and who cooperate in Prisoner's Dilemmas. The trick is
to tweak the agents' utility function. If Twinky cares about her
clone's prison sentence as much as about her own, remaining silent
becomes the dominant option in *Prisoner's Dilemma with a Twin*.
If Donald develops a strong sense of pride and would rather take
Stormy down with him than submit to her blackmail, refusing to pay
becomes the rational choice in *Blackmail*.
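Here is a rough sketch of the utility tweak for Twinky, using the sentences from the example. The `care` parameter is a hypothetical weight on the clone's years in prison: 0 is purely selfish, 1 means she cares about the clone's sentence as much as her own.

```python
# Years in prison for (Twinky's act, clone's act): (Twinky's years, clone's years).
YEARS = {
    ("confess", "confess"): (5, 5),
    ("silent", "silent"): (1, 1),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
}
ACTS = ("confess", "silent")

def utility(me, other, care):
    """Minus my years, minus `care` times the other's years."""
    mine, theirs = YEARS[(me, other)]
    return -(mine + care * theirs)

def weakly_dominant(act, care):
    """`act` is never worse than any alternative, whatever the other does."""
    return all(utility(act, o, care) >= utility(a, o, care)
               for o in ACTS for a in ACTS)

def strictly_dominant(act, care):
    """`act` is strictly better than every alternative, whatever the other does."""
    return all(utility(act, o, care) > utility(a, o, care)
               for o in ACTS for a in ACTS if a != act)

print(weakly_dominant("silent", care=1), strictly_dominant("silent", care=1))
print(strictly_dominant("silent", care=2))
```

With these particular numbers and equal weight, silence comes out weakly dominant: if the clone confesses, both of Twinky's options add up to ten years in total. A weight above 1 makes silence strictly dominant.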

FDT agents rarely find themselves in *Blackmail* scenarios.
Neither do CDT agents with a vengeful streak. If I wanted to design a
successful agent for a world like ours, I would build a CDT agent who
cares what happens to others. My CDT agent would still two-box in
*Newcomb's Problem with Transparent Boxes* (or in the original
Newcomb Problem). But this kind of situation practically never arises
in worlds like ours.

The story I'm hinting at has been well told by others. I'd
especially recommend Brian Skyrms's *Evolution of the Social
Contract* and chapter 6 of Simon Blackburn's *Ruling
Passions*.

So here's the upshot. Whether FDT agents fare better than CDT agents depends on the environment, on how "faring better" is measured, and on what the agents care about. Across their lifetime, purely selfish agents might do better, in a world like ours, if they followed FDT. But that doesn't persuade me that the insane recommendations of FDT are correct.

So far, I have explained why I'm not convinced by the case for FDT. I haven't explained why I didn't recommend the paper for publication. That I'm not convinced is not a reason. I'm rarely convinced by arguments I read in published papers.

The standards for deserving publication in academic philosophy are relatively simple and self-explanatory. A paper should make a significant point, it should be clearly written, it should correctly position itself in the existing literature, and it should support its main claims by coherent arguments. The paper I read sadly fell short on all these points, except the first. (It does make a significant point.)

Here, then, are some of the complaints from my referee report, lightly edited for ease of exposition. I've omitted several other complaints concerning more specific passages or notation from the paper.

A popular formulation of CDT assumes that to evaluate an option A we should consider the probability of various outcomes *on the subjunctive supposition* that A were chosen. That is, we should ask how probable such-and-such an outcome would be if option A were chosen. The expected utility of the option is then defined as the probability-weighted average of the utility of these outcomes. In much of their paper, Yudkowsky and Soares appear to suggest that this is exactly how expected utility is defined in FDT. The disagreement between CDT and FDT would then boil down to a disagreement about what is likely to be the case under the subjunctive supposition that an option is chosen.

For example, consider *Newcomb's Problem with Transparent Boxes*. Suppose (without loss of generality) that the right-hand box is empty. CDT says you should take both boxes because *if you were to take only the right-hand box you would get nothing* whereas *if you were to take both boxes, you would get $1000*. According to FDT (as I presented it above, and as it is presented in *parts* of the paper), we should ask a different question. We should ask what would be the case *if FDT recommended one-boxing*, and what would be the case *if FDT recommended two-boxing*. For much of the paper, however, Yudkowsky and Soares seem to assume that these questions coincide. That is, they suggest that you should one-box because *if you were to one-box, you would get a million*. The claim that you would get nothing if you were to one-box is said to be a reflection of CDT.

If that's really what Yudkowsky and Soares want to say, they should, first, clarify that FDT is a *special case* of CDT as conceived for example by Stalnaker, Gibbard & Harper, Sobel, and Joyce, rather than an alternative. All these parties would agree that the expected utility of an act is a matter of what would be the case if the act were chosen. (Yudkowsky and Soares might then also point out that "Causal Decision Theory" is not a good label, given that they don't think the relevant conditionals track causal dependence. John Collins has made essentially the same point.)

Second, and more importantly, I would like to see some arguments for the crucial claim about subjunctive conditionals. Return once more to the Newcomb case. Here's the right-hand box. It's empty. It's a normal box. Nothing you can do has any effect on what's in the box. The demon has tried to predict what you will do, but she could be wrong. (She has been wrong before.) Now, what would happen if you were to take that box, without taking the other one? The natural answer, by the normal rules of English, is: *you would get an empty box*. Yudkowsky and Soares instead maintain that the correct answer is: *you would find a million in the box*. Note that this is a claim about the truth-conditions of a certain sentence in English, so facts about the long-run performance of agents in decision problems don't seem relevant. (If the predictor is highly reliable, I think a "backtracking" reading can become available on which it's true that you would get a million, as Terry Horgan has pointed out. But there's still the other reading, and it's much more salient if the predictor is less reliable.)

Third, later in the paper it transpires that FDT can't possibly be understood as a special case of CDT along the lines just suggested because in some cases FDT requires assessing the expected utility of an act by looking exclusively at scenarios in which that act is not performed. For example, in *Blackmail*, not succumbing is supposed to be better because it decreases the chance of being blackmailed. But any conditional of the form *if the agent were to do A, then the agent would do A* is trivially true in English.

Fourth, in other parts of the paper it is made clear that FDT does not instruct agents to suppose that a certain act were performed, but rather to suppose that FDT always were to give a certain output for a certain input.

I would recommend dropping all claims about subjunctive conditionals involving the relevant acts. The proposal should be that the expected utility of act A in decision problem P is to be evaluated by subjunctively supposing not A, but the proposition that FDT outputs A in problem P. (That's how I presented the theory above.) The proposal then wouldn't rely on implausible and unsubstantiated claims about English conditionals.

[I then listed several passages that would need to be changed if the suggestion is adopted.]

I'm worried that so little is said about how subjunctive probabilities are supposed to be revised when supposing that FDT gives a certain output for a certain decision problem. Yudkowsky and Soares insist that this is a matter of subjunctively supposing a proposition that's mathematically impossible. But as far as I know, we have no good models for supposing impossible propositions.

Here are three more specific worries.

First, mathematicians are familiar with reductio arguments, which appear to involve impossible suppositions. "Suppose there were a largest prime. Then there would be a product x of all these primes. And then x+1 would be prime. And so there would be a prime greater than all primes." What's noteworthy about these arguments is that whenever B is mathematically derivable from A, then mathematicians are prepared to accept 'if A were the case then B would be the case', even if B is an explicit contradiction. (In fact, that's where the proof usually ends: "If A were the case then a contradiction would be the case; so A is not the case.")

If that is how subjunctive supposition works, FDT is doomed. For if A is a mathematically false proposition, then anything whatsoever mathematically follows from A. (I'm ignoring the subtle difference between mathematical truth and provability, which won't help.) So then anything whatsoever would be the case on a counterpossible supposition that FDT produces a certain output for a certain decision problem. We would get:

*If FDT recommended two-boxing in Newcomb's Problem, then the second box would be empty*, but also *If FDT recommended two-boxing in Newcomb's Problem, then the second box would contain a million*, and *If FDT recommended two-boxing in Newcomb's Problem, the second box would contain a round square*.

A second worry. Is a probability function revised by a counterpossible supposition, as employed by FDT, still a probability function? Arguably not. For presumably the revised function is still certain of elementary mathematical facts such as the Peano axioms. (If, when evaluating a relevant scenario, the agent is no longer sure whether 0=1, all bets are off.) But some such elementary facts will logically entail the negation of the supposed hypothesis. So in the revised probability function, probability 1 is not preserved under logical entailment; and then the revised function is no longer a classical probability function. (This matters, for example, because Yudkowsky and Soares claim that the representation theorem from Joyce's *Foundations of Causal Decision Theory* can be adapted to FDT, but Joyce's theorem assumes that the supposition preserves probabilistic coherence.)

Another worry. Subjunctive supposition is relatively well-understood for propositions about specific events at specific times. But the hypothesis that FDT yields a certain output for a certain input is explicitly not spatially and temporally limited in this way. We have no good models for how supposing such general propositions works, even for possible propositions.

The details matter. For example, assume FDT actually outputs B for problem P, and B' for a different problem P'. Under the counterpossible supposition that FDT outputs A for P, can we hold fixed that it outputs B' for P'? If not, FDT will sometimes recommend choosing a particular act because of the advantages of choosing a *different* act in a *different* kind of decision problem.

Standard decision theories are not just based on brute intuitions about particular cases, as Yudkowsky and Soares would have us believe, but also on general arguments. The most famous of these are so-called representation theorems which show that the norm of maximising expected utility can be derived from more basic constraints on rational preference (possibly together with basic constraints on rational belief). It would be nice to see which of the preference norms of CDT Yudkowsky and Soares reject. It would also be nice if they could offer a representation theorem for FDT.

All that is optional and wouldn't matter too much, in my view, except that Yudkowsky and Soares claim (as I mentioned above) that the representation theorem in Joyce's *Foundations of Causal Decision Theory* can be adapted straightforwardly to FDT. But I doubt that it can. The claim seems to rest on the idea that FDT can be formalised just like CDT, assuming that subjunctively supposing A is equivalent to supposing that FDT recommends A. But as I've argued above, the latter supposition arguably makes an agent's subjective probability function incoherent. More obviously, in cases like *Blackmail*, A is plausibly false on the supposition that FDT recommends A. These two aspects already contradict the very first two points in the statement of Joyce's representation theorem, on p.229 of *The Foundations of Causal Decision Theory*, under 7.1.a.

Yudkowsky and Soares constantly talk about how FDT "outperforms" CDT, how FDT agents "achieve more utility", how they "win", etc. As we saw above, it is not at all obvious that this is true. It depends, in part, on how performance is measured. At one place, Yudkowsky and Soares are more specific. Here they say that "in all dilemmas where the agent's beliefs are accurate [??] and the outcome depends only on the agent's actual and counterfactual behavior in the dilemma at hand -- reasonable constraints on what we should consider "fair" dilemmas -- FDT performs at least as well as CDT and EDT (and often better)". OK. But how should we understand "depends on ... the dilemma at hand"? First, are we talking about subjunctive or evidential dependence? If we're talking about evidential dependence, EDT will often outperform FDT. And EDTers will say that's the right standard. CDTers will agree with FDTers that subjunctive dependence is relevant, but they'll insist that the standard Newcomb Problem isn't "fair" because here the outcome (of both one-boxing and two-boxing) depends not only on the agent's behavior in the present dilemma, but also on what's in the opaque box, which is entirely outside her control. Similarly for all the other cases where FDT supposedly outperforms CDT.
Now, I can vaguely see a reading of "depends on ... the dilemma at hand" on which FDT agents really do achieve higher long-run utility than CDT/EDT agents in many "fair" problems (although not in all). But this is a very special and peculiar reading, tailored to FDT. We don't have any independent, non-question-begging criterion by which FDT always "outperforms" EDT and CDT across "fair" decision problems.

FDT closely resembles Justin Fisher's "Disposition-Based Decision Theory" and the proposal in David Gauthier's *Morals by Agreement*, both of which are motivated by cases like *Blackmail* and *Prisoner's Dilemma with a Twin*. Neither is mentioned. It would be good to explain how FDT relates to these earlier proposals.

The paper goes to great lengths criticising the rivals CDT and EDT. The apparent aim is to establish that both CDT and EDT sometimes make recommendations that are clearly wrong. Unfortunately, these criticisms are largely unoriginal, superficial, or mistaken.

For example, Yudkowsky and Soares fault EDT for giving the wrong verdicts in simple medical Newcomb problems. But defenders of EDT such as Arif Ahmed and Huw Price have convincingly argued that the relevant decision problems would have to be highly unusual. Similarly, Yudkowsky and Soares cite a number of well-known cases in which CDT supposedly gives the wrong verdict, such as Arif's "Dicing with Death". But again, most CDTers would not agree that CDT gets these cases wrong. (See this blog post for my response to *Dicing with Death*.) In general, I am not aware of any case in which I'd agree that CDT -- properly spelled out -- gives a problematic verdict. Likewise, I suspect Arif does not think there are any cases in which EDT goes wrong. It just isn't true that both CDT and EDT are commonly agreed to be faulty. If Yudkowsky and Soares want to argue that they are, they need to do more than revisit well-known scenarios and make bold assertions about what CDT and EDT say about them.

The criticism of CDT and EDT also contains several mistakes. For example, Yudkowsky and Soares repeatedly claim that if an EDT agent is certain that she will perform an act A, then EDT says she must perform A. I don't understand why. I guess the idea is that (1) if P(B)=0, then the evidential expected utility of B is undefined, and (2) any number is greater than undefined. But lots of people, from Kolmogoroff to Hajek, have argued against (1), and I don't know why anyone would find (2) plausible.

For another example, Yudkowsky and Soares claim that CDT (like FDT) involves evaluating logically impossible scenarios. For example, "[CDTers] are asking us to imagine the agent's physical action changing while holding fixed the behavior of the agent's decision function". Who says that? I would have thought that when we consider what would happen if you took one box in Newcomb's Problem, the scenario we're considering is one in which your decision function outputs one-boxing. We're not considering an impossible scenario in which your decision function outputs two-boxing, you have complete control over your behaviour, and yet you choose to one-box. There are many detailed formulations of CDT. Yudkowsky and Soares ignore almost all of them and only mention the comparatively sketchy theory of Pearl. But even Pearl's theory plausibly doesn't appeal to impossible propositions to evaluate ordinary options. Lewis's or Joyce's or Skyrms's certainly doesn't.

I still think the paper could probably have been published after a few rounds of major revisions. But I also understand that the editors decided to reject it. Highly ranked philosophy journals have acceptance rates of under 5%. So almost everything gets rejected. This one got rejected not because Yudkowsky and Soares are outsiders or because the paper fails to conform to obscure standards of academic philosophy, but mainly because the presentation is not nearly as clear and accurate as it could be.

]]>What do we use if we want to say that something is compatible with someone's beliefs? Suppose at some worlds compatible with Betty's belief state, it is currently snowing. We could express this by "Betty does not believe that it is not snowing". But (for some reason) that's really hard to parse.

Arguably, the most natural choice is: "Betty believes that it might
be snowing". Here, the possibility modal 'might' is embedded under the
necessity modal 'believes'. Clearly the embedded 'might' is relative
to Betty's belief state: "Betty believes that it might be snowing"
does not state that Betty believes that for all *we* know, it
might be snowing. So 'might', in effect, serves as the dual of
'believes', but it has to be embedded under 'believes' because we need
a transitive verb to indicate the person whose beliefs are compatible
with the relevant proposition.

But why does "believes that might" express the dual of belief, rather than a higher-order belief about belief? Because the logic of belief is arguably KD45, and in KD45, □◇p is equivalent to ◇p.
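For readers who want the equivalence spelled out, here is one way it can be derived. The derivation assumes the usual axiomatisation of KD45, with the schemas D (□A → ◇A), 4 (□A → □□A) and 5 (◇A → □◇A); the labels are standard, but the layout is only a sketch.

```latex
\begin{align*}
\text{Right to left:}\quad & \Diamond p \to \Box\Diamond p
  && \text{instance of (5)} \\[4pt]
\text{Left to right:}\quad & \Box\Diamond p \to \Diamond\Diamond p
  && \text{instance of (D), with } A = \Diamond p \\
& \Box\neg p \to \Box\Box\neg p
  && \text{instance of (4), with } A = \neg p \\
& \Diamond\Diamond p \to \Diamond p
  && \text{contraposition of the previous line, using } \Diamond = \neg\Box\neg \\
& \Box\Diamond p \to \Diamond p
  && \text{chaining the first and third lines}
\end{align*}
```

Note that the left-to-right direction uses D and 4, while the right-to-left direction uses only 5.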

In fact, this is a nice argument in favour of assuming that the logic of belief is (at least) KD45: the assumption explains why "believes that might" is commonly used to express the dual of belief, and why there's no need to introduce a separate verb for the dual.

What about knowledge? There is also no dual for 'knows' in English. But here the situation is different.

First, unlike "Betty does not believe that it is not snowing", "Betty does not know that it is not snowing" is not too hard to understand.

Second, the logic of knowledge is plausibly weaker than KD45, so "knows that might" is plausibly not equivalent to "not knows not". Indeed, "Betty knows that it might be snowing" does suggest that Betty has higher-order knowledge concerning the possibility of snow, rather than simply a first-order knowledge state that is compatible with snow.

So why don't we have a dual for 'knows'? The reason, I suspect, is that absence of knowledge is less unified than absence of belief. There are different reasons why someone might fail to know not-p, and it's useful to have different expressions for the different cases.

One reason why Betty might fail to know that it is not currently snowing is that it is in fact snowing. If it is snowing, then Betty can't know that it is not snowing, because knowledge entails truth. But in such a case, the norms of pragmatics imply that instead of '~K~p' we should simply say 'p': it is shorter and more informative.

Another reason why Betty might fail to know that it is not currently snowing is that she fails to believe that it isn't snowing. If knowledge entails belief, then lack of belief entails lack of knowledge. So it might be more informative to use the dual of belief ('believes that might') rather than the dual of knowledge, especially if we also don't know whether it is snowing.

Third, if we don't know whether it is snowing, and we know that
Betty doesn't know either, then it is usually better to say that Betty
doesn't know *whether* it is snowing, rather than that she
doesn't know that it is not snowing. Again, it's more informative, and
not more complicated.

These don't cover all possibilities. Sometimes we may know that it is not snowing, and we want to communicate that Betty is not aware of this fact. In that case, we seem to fall back on 'not knows not': "Betty doesn't know that it is not snowing".

In sum, here's my conjecture:

1. We don't have a designated dual of 'believes' because we already have 'believes that might', which serves the same purpose.

2. We don't have a designated dual of 'knows' because there are usually more informative things to say, and we have the means to say these more informative things.

]]>Sly Pete and Mr. Stone are playing poker on a Mississippi riverboat. It is now up to Pete to call or fold. My henchman Zack sees Stone's hand, which is quite good, and signals its content to Pete. My henchman Jack sees both hands, and sees that Pete's hand is rather low, so that Stone's is the winning hand. At this point, the room is cleared. A few minutes later, Zack slips me a note which says "If Pete called, he won," and Jack slips me a note which says "If Pete called, he lost." I know that these notes both come from my trusted henchmen, but do not know which of them sent which note. I conclude that Pete folded.

One puzzle raised by this scenario is that it seems perfectly appropriate for Zack and Jack to assert the relevant conditionals, and neither Zack nor Jack has any false information. So it seems that the conditionals should both be true. But then we'd have to deny that 'if p then q' and 'if p then not-q' are contrary.

Frank Jackson (in conversation) pointed out that Gibbard's passage raises another puzzle that is commonly overlooked. That puzzle is about confirmation.

Let C→W be the conditional 'if Pete called, he won'.

Let E1 be Zack's information -- more specifically, the information that Pete knows Mr. Stone's hand.

Let E2 be Jack's information -- specifically, that Mr. Stone has the better hand.

Intuitively,

(1) E1 strongly supports C→W.

(2) E2 strongly supports ~(C→W).

(3) E1 doesn't strongly support ~E2.

(4) E2 doesn't strongly support ~E1.

But if we read "strongly support" as "making highly probable" then these four assumptions are probabilistically inconsistent. (The proof is left as an exercise.)

You might question (3) or (4). Here's a simpler example where (3) and (4) are not in doubt.

We toss two independent, fair coins. There are four possible outcomes: { H1,T1 } x { H2,T2 }.

Let Same be the proposition (H1 & H2) v (T1 & T2).

Let E1 be Same.

Let E2 be T2.

Let H1→Same be the conditional 'if H1 then Same'.

Intuitively,

(1) E1 strongly supports H1→Same: P(H1→Same/E1) > 0.8 (say).

(2) E2 strongly supports ~(H1→Same): P(~(H1→Same)/E2) > 0.8.

But the following is easily provable:

(3) E1 doesn't strongly support ~E2: P(E2/E1) = 1/2.

(4) E2 doesn't strongly support ~E1: P(E1/E2) = 1/2.

(1)-(4) are probabilistically inconsistent. So (1) and (2) can't be true: either E1 doesn't make H1→Same highly probable or E2 doesn't make ~(H1→Same) highly probable (or both).
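Here is one way the inconsistency can be derived. It is a sketch under an assumption not stated above: that the conditional, abbreviated X, is assigned probabilities that obey the usual additivity laws. E_1 is Same, E_2 is T2, and P(E_1) = P(E_2) = 1/2.

```latex
\begin{align*}
P(X \mid E_1) > 0.8 \;&\Rightarrow\; P(\neg X \wedge E_1) < 0.2\,P(E_1) \\
P(\neg X \mid E_2) > 0.8 \;&\Rightarrow\; P(X \wedge E_2) < 0.2\,P(E_2) \\[4pt]
P(E_1 \wedge E_2) &= P(X \wedge E_1 \wedge E_2) + P(\neg X \wedge E_1 \wedge E_2) \\
&\le P(X \wedge E_2) + P(\neg X \wedge E_1) \\
&< 0.2\,\bigl(P(E_1) + P(E_2)\bigr) = 0.2 \\[4pt]
\text{but by (4):}\quad P(E_1 \wedge E_2) &= P(E_1 \mid E_2)\,P(E_2) = \tfrac{1}{2}\cdot\tfrac{1}{2} = 0.25
\end{align*}
```

So the two "strong support" claims would force the overlap of E_1 and E_2 below 0.2, whereas it is in fact 0.25.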

The lesson is that our intuitions about whether some piece of evidence supports a given conditional cannot be trusted.

The usual contextualist responses to Gibbard's puzzle seem to be of no help here. The only way to block the lesson would be to give up probabilistic measures of evidential support. But even then we retain the lesson that we can't trust intuitions about whether some evidence renders some conditional probable.

The lesson generalizes. If we can't trust these intuitions, then we
also can't trust intuitions about the probability of a conditional in
a given hypothetical scenario -- for that just *is* an intuition
about the extent to which the assumptions of the scenario make the
conditional probable. And then we plausibly also can't trust outright
intuitions about the probability of a conditional, since that's the
probability of the conditional given our total evidence.

The lesson is more or less the same as the lesson taught by Lewisian triviality results. But the Gibbard-Jackson route is different from Lewis's route. In particular, we have never assumed that the intuitive probability of a conditional is the corresponding conditional probability.

That said, there is also a way of turning the Gibbard-Jackson argument into an argument against "Stalnaker's Thesis", that for any rational credence function P, P(A→B) = P(B/A). Here is how.

Return to the coin toss scenario. It is easy to see that

(5) P(Same/H1) = 1/2,

(6) P(Same/H1 & Same) = 1, and

(7) P(Same/H1 & T2) = 0.

By Stalnaker's Thesis, it follows that

(8) P(H1→Same / Same) = 1 and

(9) P(H1→Same / T2) = 0,

since P(*/Same) and P(*/T2) are rational credence functions.

(8) and (9) are stronger versions of (1) and (2), and we know that these can't be true. So Stalnaker's Thesis is also false.
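The probabilities in the coin example are easy to check by brute force. The following is a minimal sketch, not part of the original argument: the helper `p` and the variable names are hypothetical, and the final search assumes, purely for illustration, that the conditional expresses some proposition over the four coin-toss outcomes.

```python
from fractions import Fraction
from itertools import combinations, product

# Four equiprobable worlds for two independent fair coins.
worlds = list(product(["H1", "T1"], ["H2", "T2"]))
prob = {w: Fraction(1, 4) for w in worlds}

def p(event, given=lambda w: True):
    """Conditional probability P(event / given), computed by summing over worlds."""
    denom = sum(prob[w] for w in worlds if given(w))
    numer = sum(prob[w] for w in worlds if given(w) and event(w))
    return numer / denom

h1 = lambda w: w[0] == "H1"
t2 = lambda w: w[1] == "T2"
same = lambda w: (w[0] == "H1") == (w[1] == "H2")  # (H1 & H2) v (T1 & T2)

p3 = p(t2, same)                             # P(E2/E1)
p4 = p(same, t2)                             # P(E1/E2)
p5 = p(same, h1)                             # P(Same/H1)
p6 = p(same, lambda w: h1(w) and same(w))    # P(Same/H1 & Same)
p7 = p(same, lambda w: h1(w) and t2(w))      # P(Same/H1 & T2)

# Illustrative assumption: the conditional is one of the 16 sets of worlds.
# Search for any set X with P(X/Same) > 0.8 and P(~X/T2) > 0.8.
candidates = [frozenset(c) for r in range(len(worlds) + 1)
              for c in combinations(worlds, r)]
threshold = Fraction(4, 5)
witnesses = [x for x in candidates
             if p(lambda w: w in x, same) > threshold
             and p(lambda w: w not in x, t2) > threshold]

print(p3, p4, p5, p6, p7, len(witnesses))
```

The search comes up empty because any such X would have to contain the world (T1, T2) to be probable given Same, and exclude it to be improbable given T2.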

]]>So the 'ought' of objective consequentialism evaluates acts "causally", rather than "evidentially". This provides some (intuitive) motivation for using a causal evaluation for the decision-theoretic 'ought' as well. Can we strengthen this observation? How bad would it be to combine objective consequentialism with evidential decision theory?

Here's one attempt to bring out a tension. Imagine an agent whose personal utility function orders possible states of the world in just the way some form of objective consequentialism does, giving highest utility to the "best" states and lowest to the "worst" ones. Suppose also the agent has perfect information about which state would result from each of the options presently available to her. Intuitively, what this agent ought to do in light of her beliefs and desires is precisely what she ought to do according to objective consequentialism. That is, the subjective 'ought' of decision theory and the objective 'ought' of objective consequentialism should here coincide.

In fact, however, the two oughts plausibly do coincide even in evidential decision theory. That's because, as Lewis pointed out in "Causal Decision Theory", conditional on any particular dependency hypothesis (about what the available options would bring about), evidential expected utility and causal expected utility are plausibly equivalent.

So we need a different case to bring out the tension. Here's such a case, inspired by Jack Spencer and Ian Wells.

Consider a Newcomb Problem in which the outcomes are measured not in dollars but in consequentialist utilities. As before, assume the agent facing the problem has subjective utilities that match the consequentialist utilities.

It is clear what the agent ought to do, from the perspective of objective consequentialism: she ought to take both boxes. (Recall that the 'ought' of objective consequentialism evaluates acts causally, by looking at the outcomes the acts would bring about, given all relevant facts about the world -- known and unknown. One relevant fact is the content of the opaque box. If the opaque box is in fact empty, then one-boxing would lead to zero consequentialist utilities and two-boxing to a thousand; if the opaque box is non-empty, then one-boxing would lead to 1 million utilities and two-boxing to 1 million and 1 thousand. Either way, two-boxing would lead to the better state.)

Now here we have an agent with perfectly consequentialist values
who *knows* that she ought to two-box, in the objective
sense. Yet evidential decision theory says it would be irrational for
her to two-box! That's not a logical contradiction. But it surely
sounds unappealing. It would be better to have a decision theory on
which it can't happen that a morally perfect agent is irrational for
choosing an act of which she knows that she morally ought to choose
it.
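The clash can be made concrete with a small calculation. This is only a sketch: the payoffs are the ones given above, but the predictor's accuracy (`ACCURACY = 0.99`) and all function names are hypothetical choices made for illustration.

```python
# Payoffs in consequentialist utilities, as described in the text.
PAYOFF = {
    ("empty", "one-box"): 0,
    ("empty", "two-box"): 1_000,
    ("full", "one-box"): 1_000_000,
    ("full", "two-box"): 1_001_000,
}
STATES = ["empty", "full"]
ACTS = ["one-box", "two-box"]

ACCURACY = 0.99  # hypothetical predictor reliability; not specified in the text

def objectively_best(state):
    """The act that would bring about the better outcome, given the actual state."""
    return max(ACTS, key=lambda act: PAYOFF[(state, act)])

def evidential_eu(act, accuracy=ACCURACY):
    """Evidential expected utility: the act is treated as evidence about the box."""
    p_full = accuracy if act == "one-box" else 1 - accuracy
    return p_full * PAYOFF[("full", act)] + (1 - p_full) * PAYOFF[("empty", act)]

# What the agent objectively ought to do, for each possible content of the box.
objective_ought = {state: objectively_best(state) for state in STATES}

# What evidential decision theory recommends.
edt_choice = max(ACTS, key=evidential_eu)

print(objective_ought)
print(edt_choice)
```

Whatever the box contains, the objectively best act is the same, so the agent can know what she objectively ought to do without knowing the state; the evidential calculation nevertheless favours the other act for any reasonably accurate predictor.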

The argument generalizes. For one thing, it generalizes beyond evidential decision theory to other decision theories that recommend one-boxing, such as "timeless decision theory", "disposition-based decision theory", Spohn's recent spin on causal decision theory, and whatever decision theory Teddy Seidenfeld thinks is right.

The argument also generalizes beyond objective consequentialism, given that almost every (sensible) moral theory can be consequentialised. In general, if you think the notion of an objective moral ought is coherent, you probably shouldn't say that one-boxing is the rational choice in Newcomb's Problem.

]]>