Suppose the laws of nature are deterministic. What would have happened if you had chosen some act that you didn't actually choose? The two apparently incompatible intuitions are:

(A1) Had you chosen differently, no law of nature would have been violated.

(A2) Had you chosen differently, the initial conditions of the universe would not have been changed.

Rejecting one of these intuitions is widely thought to spell trouble for Causal Decision Theory. Gallow argues that they can both be respected. I'll explain how. Then I'll explain why I'm not convinced.

We start with some familiar ideas from causal modelling. To begin, we assume that the world contains relations of causal influence between variables. Here, a *variable* isn't a syntactic object, but a "contrastive generalisation of an event". For our topic, we can focus on binary variables that take only two values, 0 and 1. For example, we'll talk about a variable B that represents whether or not you take a certain bet. If you do, we say that B has value 1; if you don't, it has value 0.

Relations of causal influence can be represented by *structural equations* that specify how the value of one variable is determined by the value of other variables. For example, if B is a bet that a certain coin will land heads, H is a variable for the outcome of the coin flip (1 = Heads, 0 = Tails), and W is a variable for getting the relevant amount of money ("winning"), then the equation

(1) W := B * H

represents that whether you get the money is determined by whether you bet and by how the coin lands, in multiplicative fashion (so that W = 1 iff B and H are both 1).

What does it take for an equation like this to be a *correct* representation of the causal structure in the world? Gallow lists three conditions.

First, all the variables must be mereologically distinct, so that all combinations of value assignments are possible.

Second, for any possible values of the variables on the right-hand side of the equation (B and H in my example), the equality (W = B * H) holds at the closest worlds at which the variables on the right-hand side have these values. Closeness is measured by intuitive overall similarity at the time of B and H.

To see what this means, let's continue the example. Suppose you haven't actually bet (B=0) and the coin has landed heads (H=1). We then need to check, for example, whether you win (W=1) in the closest worlds at which you bet (B=1) and the coin lands heads (H=1). The answer is yes. In general, no matter how we set B and H, the equation W = B * H holds (in the closest worlds where B and H have these values).

A third condition pertains to entire systems of structural equations. For systems that contain only a single equation, it says that there is no non-trivial dependence (of the sort that would meet the second condition) between the variables on the right-hand side.

In the coin example, there is no non-trivial dependence between B and H. So (1) is a correct representation of the coin flip scenario.

We need one more piece of machinery to connect all this to conditionals.

Let a *causal model* be a system of structural equations together with an assignment of values to those ("exogenous") variables that only occur on the right-hand side of the equations. A model is *correct* if the system of equations is correct (as per the three criteria above) and the values assigned to the exogenous variables are their true values. Finally, if M is a model and V=v is an assignment of value v to some variable V, then M *revised by V=v* is a model much like M except that V has value v and all equations in which V occurs on the left-hand side are removed.

Now, according to Gallow, a counterfactual conditional A=a > C=c is true (on its "causal" reading) whenever there is a correct causal model that determines C=c when revised by A=a.

In the coin example, the equation W := B * H together with the assignment { B=0, H=1 } is a correct causal model. To check whether B=1 > W=1 ("If you had bet on heads, you would have won"), we consider the revised model that has the same equation but assignment { B=1, H=1 }. Applying the equation, this determines W=1. So the conditional is true.
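To make this machinery concrete, here is a minimal toy implementation (my own sketch, not anything from Gallow's paper) of causal models, revision, and counterfactual evaluation, applied to the coin example:

```python
# A causal model, as described above: structural equations plus values
# for the exogenous variables. Revising by V=v fixes V at v and drops
# any equation with V on the left-hand side.

def solve(equations, exogenous):
    """Compute all variable values determined by the model."""
    values = dict(exogenous)
    while True:
        changed = False
        for var, f in equations.items():
            try:
                v = f(values)
            except KeyError:
                continue  # some input variable not yet determined
            if values.get(var) != v:
                values[var] = v
                changed = True
        if not changed:
            return values

def revise(equations, exogenous, var, val):
    """The model revised by var=val."""
    eqs = {v: f for v, f in equations.items() if v != var}
    exo = {**exogenous, var: val}
    return eqs, exo

# The coin model: W := B * H, with actual values B=0, H=1.
equations = {"W": lambda v: v["B"] * v["H"]}
exogenous = {"B": 0, "H": 1}

# Evaluate 'B=1 > W=1': revise by B=1 and read off W.
print(solve(*revise(equations, exogenous, "B", 1))["W"])  # 1
```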

All this is nice and interesting and useful, and spelled out with more care than at many other places where the same ideas have been discussed. But let's see how it helps with (A1) and (A2).

Imagine a situation in which you have the option of betting that the actual laws of nature are violated. You're not going to choose this option. We wonder what would happen if you did. Would you win the bet? Intuitively, the answer is no. Your choosing otherwise would not have brought about a violation of the laws, as per (A1).

Gallow models the relevant causal relations as follows. We have three variables. B for whether you bet, W for whether you get the payoff, and M ("miracle") for whether the actual laws are violated. These variables are related by the equation

(2) W := B * M.

We can check that this is a correct system of equations by Gallow's three conditions. First, the three variables are distinct, at least insofar as any combination of values is possible. Second, the closest worlds at which B=1 or M=1 (or both) are still worlds at which you get the money iff B=1 and M=1. Finally, there is no dependence between B and M.

Wait. Why isn't there a dependence between B and M? In the actual world, we have B=0 and M=0. According to Lewis (1979), the closest B=1 worlds are M=1 worlds. (The closest worlds where you bet have a "miracle".) This suggests that B and M are related by equation (3), which would make (2) an incorrect system.

(3) M := B.

We could, of course, deny that the closest B=1 worlds are M=1 worlds, but then we'd run into the analogous problem with the past and (A2). So let's assume that the closest B=1 worlds are M=1 worlds, if only for the sake of the argument.

Gallow argues that (3) is incorrect even if the closest B=1 worlds are M=1 worlds, because the closest B=0 worlds aren't all M=0 worlds: they include worlds at which the (actual) laws of nature are violated at around the time of the bet.

This makes no sense if closeness is a matter of intuitive similarity at the relevant time (as we are told on p.14). The actual world is surely more similar to itself than any world in which the actual laws are violated at the time of the bet.

So closeness doesn't measure intuitive similarity. It's a technical concept that measures something else. I'm not sure what it measures. Gallow doesn't really explain. He describes a different example involving Jesus that he thinks motivates the idea that the actual world is not the most similar world to itself. I don't really follow his intuitions here. The idea seems to be that if J is a variable for whether Jesus was born then an equation like D := J means that D=1 is guaranteed *no matter how* J=1 is realised. Even if J actually has value 1, we therefore need to check that D=1 is the case at nearby worlds at which J=1 is realised in some other way. OK.

Perhaps *B=0* can also be realised in different ways. On Gallow's reading, equation (3) then requires that at nearby worlds at which B=0 is realised in some other way, the actual laws obtain without violation. On a Lewisian conception of closeness, these worlds all involve violations of the laws. So (3) is false. And so (2) is correct. Here it is again.

(2) W := B * M.

Now it's easy to see why 'B=1 > W=0' ("if you were to bet that the laws are violated you wouldn't win") is true. The actual values of B and M are B=0 and M=0. To evaluate the conditional, we set B=1, leaving M at 0, and plug these values into equation (2). We get W=0.

The same reasoning applies for the past. If C describes the initial conditions of the universe and B is a bet that the initial conditions are not C, then (4) is a correct system of equations:

(4) W := B * (1-C).

We get the desired result that if you had bet that the initial conditions are different then you would have lost, because the initial conditions would not have been changed, as per (A2).

If we're only interested in (A1) and (A2), we can actually use simpler models.

Let M1 be a model with variables B and M, an *empty* system of equations, and the assignment { B=0, M=0 }. This model is correct by Gallow's criteria. Revising M1 by B=1 ("you choose otherwise") yields a model with M=0. So we have 'B=1 > M=0' ("the laws would not be violated").

Similarly, let M2 be a model with B and C, an empty system of equations, and assignment { B=0, C=1 }. Revising this correct model by B=1 ("you choose otherwise") determines C=1 ("the initial conditions would be the same").

But how could the deterministic laws and the initial conditions both be the same, if you had chosen a different act?! Would an impossible situation have obtained?

According to Gallow, we're not licensed to conclude that *the laws and the initial conditions would both have been the same*. The "agglomeration" rule for counterfactuals fails.

What if we try to build a model to check whether 'B=1 > (M=0 ∧ C=1)' is true? We can't. The three variables B, M, and C are not independent. M=0 and C=1 entail B=0. Any system with these three variables violates the first condition for correct systems: that the variables must be "distinct". Gallow's semantics therefore doesn't tell us whether 'B=1 > (M=0 ∧ C=1)' is true.

The distinctness condition has an independent motivation. We don't want to say that your playing poker causally determines that you play cards. But if we could use one variable P for poker and another C for cards, then the equation C := P would be a correct system. We need the distinctness condition to rule it out.

So much for my summary. I have eleven worries/objections.

*One*. As Gallow points out, the coin scenario looks like the famous Morgenbesser case. It is widely thought that to vindicate the intuition that B=1 > W=1 ("if you had bet on heads you would have won"), one must put explicitly causal notions into the analysis of the conditional. Gallow gets the intuition right without invoking any causal notions. This looks like an advantage. But I think it's a problem. The Morgenbesser intuition depends on the extent to which the betting is isolated from the coin flip. Take an extreme case of non-isolation: if you bet on heads, the coin is flipped gently with the left hand; if you don't, it is flipped vigorously with the right hand. As before, you don't bet, the coin is flipped (vigorously with the right hand) and lands heads. If you had bet on heads, would you have won? This is far from clear. I'd say no. There would have been an entirely different coin flip. Who knows how it would have landed? But the system composed only of equation (1) is still correct by Gallow's standards. And the exogenous variables still have values { B=0, H=1 }. Revising this model by B=1 still yields W=1. So 'B=1 > W=1' comes out true, in my version of the story with extreme non-isolation. That's the wrong result.

*Two*. It is crucial to Gallow's account that there is more than one closest B=0 world, even if B=0 is actually true. I already mentioned that I don't really see an independent motivation for this assumption. If the idea is that B=0 can be realised in different ways, and that the equations should be robust across these ways, then OK, but that's not enough. What if *B=0* is a very precise description of a bet that can't be realised in relevantly different ways? We don't want to say that if you had taken the bet then there would have been no law violations but if you had taken the bet *in such-and-such specific way* then there would have been law violations.

*Three*. Interventionist accounts tend to neglect the need for a "ramp". An example from Bennett (2003): At time t, a dam bursts. Several cars on the valley road are swept away. Some drivers and passengers die. What would have happened if there had been no cars on the road at t? Nobody would have died. (Let's say.) Would there have been an inquiry into the mystery of the disappearing cars? Intuitively, no. We can construct a model with variables for E = whether cars enter the road at t-1, R = whether there are cars on the road at t, I = whether there is an inquiry into the mystery of disappearing cars. The equation I := E * (1-R) is plausibly correct. (This is so even if our closeness standards require a ramp, so that the closest R=0 worlds are E=0 worlds.) Intervening with R=0 therefore yields I=1. But 'R=0 > I=1' is intuitively false.

*Four*. Continuing the valley road example, if our closeness standards require a ramp, so that cars don't magically appear or disappear at nearby worlds, then the equation E := R comes out correct. But it has the wrong direction. (Lewis (1981a) tried to get around this by hoping that ramp "events" like E would be too disjunctive to count as genuine events. But (a) this depends on the details of the case, and (b) Gallow doesn't have a rule that events/variables can't be disjunctive.) If our closeness standards don't have a ramp, even the equation I := 1-R is correct. Intuitively, neither E := R nor I := 1-R correctly represents the causal structure of the world.

*Five*. "If you had played poker you would have played cards" is true. It may not be a strictly "causal" counterfactual, but it's the kind of counterfactual that could well be relevant to CDT. (Imagine you assign basic value to playing cards and wonder whether to play poker or go for a walk.) We need a theory of counterfactuals that doesn't just cover conditionals between distinct variables.

*Six*. Is M actually distinct from B, as required for the correctness of equation (2)? Gallow doesn't formally define distinctness. He suggests that we can model variables as classes of Lewisian events, each of which is a class of possible spacetime regions (see Lewis (1986)). On this understanding, M=0 comprises all of spacetime in all worlds where the actual laws aren't violated, and M=1 comprises all other regions in all worlds. Wherever B=0 and B=1 happen, they're part of the M=0 region or the M=1 region. Without further constraints on events, B=0 and M=0 definitely aren't distinct by the somewhat complicated criteria from Lewis (1986, 258–60).

*Seven*. The distinctness constraint isn't enough. Let 'Xanthippe's widowing' be the class of possible spacetime regions that are fully occupied by Xanthippe and simultaneous to Socrates's death. This "event" is distinct from Socrates's death. And it "occurs" whenever Socrates dies. If we allow a variable X for Xanthippe's widowing and a variable S for Socrates dying, the equation S := X is correct. (As is X := S.) But there's no causal relationship between S and X. (This problem was raised by Kim (1973) against Lewis (1973).) I worry that whatever constraint rules out equations with S and X (Lewis (1981b) suggested that X is too gerrymandered to count as an event) will also rule out equations with B and M.

*Eight*. This is really just a feeling of uneasiness. M doesn't look like the kind of variable that should figure in a causal model. Causal models are useful to represent relations of causal influence between concrete and local types of events: whether a brake was released, how fast a ball was thrown, etc. I wouldn't expect that one could have, say, a variable for *whether the equation W := B * H is correct* in a causal model. But M is just like that.

*Nine*. Suppose the deterministic laws of nature have a certain parametric form L(x). There are only two possible parameter values, 0 and 1. The true value is 1. Let L state that there are unviolated laws of the form L(x). B is a bet that a certain event in the past (say, the holocaust) never happened. C are the initial conditions. You are sure that the event happened, so you don't choose B. This means that L(1) and C together entail ¬B. L(0) and C together entail that the history of the universe takes a different path. Let's say it leads to a history in which the event didn't happen and you now choose B. We can model the structure of this scenario with variables for L, B, C, and W (for winning the bet). All combinations of values are possible, because L doesn't settle whether the laws are L(1) or L(0). The equation W := C * L * B is correct. Since we have C=1 and L=1, setting B=1 yields W=1. On Gallow's account, we have to conclude that if you had bet that the holocaust didn't happen, then the holocaust would not have happened. (In general, both the laws and the past seem to depend on our present actions if we look at models with suitably weakened versions of M and C.)

*Ten*. I would have thought that if there are two correct but partial models of the world's causal structure, then they can be combined into a larger correct model. According to Gallow, M1 and M2 can't be combined in this way. Is the total causal structure of the world unrepresentable?

*Eleven*. Suppose I'm confident that there are no violations of the actual laws (M=0) and that the initial conditions are C (C=1). Someone offers me a deal that pays $1 if M=0 and $-1 if M=1. Gallow says that I should accept. Then someone offers me a deal that pays $1 if C=1 and $-1 if C=0. Again Gallow says that I should accept. But if someone offers me a deal that pays $2 if M=0 and C=1, $0 if M=1 and C=0, and $1 otherwise, then Gallow says that it's unclear what I should do because the situation can't be modelled. But isn't the last bet equivalent to the first two? How could it be rationally required to accept the first two and not the last?

Bennett, Jonathan. 2003. *A Philosophical Guide to Conditionals*. New York: Oxford University Press.

Gallow, J. Dmitri. 2023. “Causal Counterfactuals Without Miracles or Backtracking.” *Philosophy and Phenomenological Research*. doi.org/10.1111/phpr.12925.

Kim, Jaegwon. 1973. “Causes and Counterfactuals.” *Journal of Philosophy* 70: 570–72.

Lewis, David. 1973. “Causation.” *Journal of Philosophy* 70: 556–67.

Lewis, David. 1979. “Counterfactual Dependence and Time’s Arrow.” *Noûs* 13: 455–76.

Lewis, David. 1981a. “Are We Free to Break the Laws?” *Theoria* 47: 113–21.

Lewis, David. 1981b. “Nachwort (1978) zu ‘Kausalität’.”

Lewis, David. 1986. “Events.” In *Philosophical Papers, Vol 2*, 241–69. New York: Oxford University Press.

The paper investigates Alan Hájek's argument (e.g. in Hájek (2021)) that "chance undermines would". It begins with a neat observation.

One way of putting Hajek's argument goes like this. Imagine a chancy coin. We all agree that (1) is true.

(1) If the coin were flipped, there would be a chance of tails.

From this, one might be tempted to infer (2):

(2) If the coin were flipped, it could land tails.

And arguably, (2) is incompatible with (3):

(3) If the coin were flipped, it would land heads.

Kocurek notes that this line of reasoning leads to trouble if we consider a stronger antecedent:

(1') If the coin were flipped and then landed heads, there would be a chance of tails.

This seems true. By the same reasoning that leads from (1) to (2) – from 'there would be a chance of tails' to 'the coin could land tails' –, we could infer (2'):

(2') If the coin were flipped and then landed heads, it could land tails.

If 'could tails' is incompatible with 'would heads', we could infer that (3') is false.

(3') If the coin were flipped and then landed heads, it would land heads.

But (3') is obviously true!

To me, this settles that 'there would be a chance of A' is compatible with 'would not A'. If that's what the "chance undermines would" thesis denies, then the thesis has been refuted.

But I still feel the pull of "chance undermines would". If there's a chance of tails, it's not true that the coin would land heads!

Hajek should say that (1) isn't the right premise about chance. What undermines 'would' is not that there *would be a chance*. What undermines 'would' is the *actual* chanciness of the coin. There *is* a chance of tails, conditional on flip. There is no chance of tails conditional on flip ∧ heads.

The right premise (1) in Hajek's argument should be something like (1H):

(1H) There is a chance that the coin lands tails if flipped.

Intuitively, this entails (2). (At least if we add the premise that the coin isn't flipped.) And (2) seems incompatible with (3). If we replace 'flipped' with 'flipped and heads' then the first premise becomes obviously false.

So that's a good lesson about how to spell out the "chance undermines would" argument.

Kocurek doesn't consider revising premise (1). Instead, he immediately descends into the rabbit hole of formal semantics.

He also conjectures that the intuition behind "chance undermines would" is a more general intuition that 'there is a chance of A' conflicts with 'not A', as witnessed by the oddness of (9).

(9) ??The coin will land heads and there's a chance it won't.

Kocurek spells out a semantics that explains this oddness and makes 'there would be a chance of A' incompatible with 'would not A', despite the apparent counterexample from (1')-(3').

The semantics is modeled on information-sensitive accounts of epistemic modals. Truth is defined relative to a world and a chance function. Two notions of logical consequence are distinguished. I won't go through the details, mainly because I don't see a good motivation for the exercise.

In light of the counterexample from (1')-(3'), any semantics that makes 'there would be a chance of A' incompatible with 'would not A' is going to have dubious consequences. Even Hajek should reject the incompatibility. When it comes to (9), we don't need a special semantics to explain the oddness. If you know that there's a chance that the coin will land tails then you arguably can't know that the coin will land heads (unless you have very unusual information). So (9) is unknowable and hence unassertable.

I also don't agree that the "chance undermines would" intuition is based on a general intuition about the incompatibility between chanciness and truth. For example, there's nothing odd at all about a past version of (9):

(9') The coin landed heads and there was a chance it wouldn't.

Kocurek ends up rejecting the original semantics he develops in favour of an alternative, "indeterminist semantics" that doesn't make 'there would be a chance of A' incompatible with 'would not A'. The indeterminist semantics still defines truth relative to a world and a chance function, and it still has two notions of entailment. I'm not sure exactly how it works because *Mind* has messed up a lot of the symbols.

An important element of the new semantics is the assumption of "counterfacts": for every world w and (non-empty) proposition A, there is a unique world w' that would be the case if A were the case. The chance parameter in the definition of truth is only really used to interpret statements like (1): 'if A then the chance of B would have been x' is true relative to a world w and a chance function f iff f(B//A) = x, where f(*//A) results from f by moving the probability of every non-A world w to the unique world w' that would be the case at w if A were the case.
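For concreteness, here is a toy sketch of this imaging operation. The worlds, the chance function, and the counterfact map are all invented for illustration; only the shape of the operation f(*//A) comes from the paper:

```python
# Imaging under the "counterfacts" assumption: the chance of every
# world outside A is moved to the unique world that would obtain,
# at that world, if A were the case.

def image(f, A, cf):
    """Shift the chance of each non-A world w to cf(w)."""
    g = {w: 0.0 for w in f}
    for w, p in f.items():
        g[w if w in A else cf(w)] += p
    return g

# Worlds are pairs (flipped?, result). A fair coin that in fact
# has chance 0.5 of not being flipped at all.
f = {("no-flip", None): 0.5, ("flip", "H"): 0.25, ("flip", "T"): 0.25}
A = {("flip", "H"), ("flip", "T")}   # 'the coin is flipped'
cf = lambda w: ("flip", "H")         # stipulated counterfact map

g = image(f, A, cf)
print(g[("flip", "H")])  # 0.75: the no-flip chance moved to its A-world
```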

Little explanation is given for all these choices. What is the chance parameter supposed to represent? Do we have to believe in counterfacts? Why the imaging operation on the chance function? Do we have to believe that the physical chances are defined under these operations?

Besides, the semantics seems to render 'if the chance of heads were 0.9 then the chance of heads would be 0.9' contingent.

I think of proposals like Kocurek's as "proof-theoretic semantics".

The aim of proof-theoretic semantics is to predict certain inferential patterns. To this end, the semantics introduces models and semantic values relative to points of evaluation, but all this machinery is assessed merely by whether it determines a notion of entailment (or several such notions) that match the relevant inferential patterns.

*Real* semantics, by contrast, would try to spell out what has to be the case for a sentence to be true. Perhaps Kocurek's semantics can be understood as real semantics. But I don't know how this would work. The paper doesn't explain.

Hájek, Alan. 2021. “Counterfactual Scepticism and Antecedent-Contextualism.” *Synthese* 199 (1): 637–59. doi.org/10.1007/s11229-020-02686-0.

Kocurek, Alexander W. 2022. “Does Chance Undermine Would?” *Mind* 131 (523): 747–85. doi.org/10.1093/mind/fzab055.

This (long and rich) paper presents a formal model of reasons and their weight, with the aim of clarifying how different reasons for or against an act combine.

Sher's guiding idea is to measure the weight by which a reason supports an act in terms of the effect that coming to know the reason would have on the act's desirability.

To spell this out, Sher assumes the popular framework of Savage (1954) in which the 'acts' are assumed to be independent of the 'states' (of the world), and probabilities are only assigned to states. Any combination of a state with an act determines an 'outcome' with a particular value. The *expected value* EV(A) of an act A is then the weighted average of the value of the different outcomes an act might bring about, weighted by the probability of the relevant state.

As a first pass, we might now implement Sher's guiding idea by defining the weight by which a reason R supports an act A as the extent to which conditioning on R increases the expected value of A:

(1) w(R,A) = EV(A/R) - EV(A).

Here, EV(A/R) is the weighted average of the value of the possible A-outcomes, weighted by the probability of the relevant state conditional on R.

But (1) won't do. Sher gives the following counterexample. Let R be the proposition that I will win the lottery. Let A be some ordinary act, like getting a coffee. The "outcome" of performing A in a state where R is true is that I get a coffee and have won the lottery. The outcome of performing A in a not-R state is that I get a coffee and haven't won the lottery. On balance, the first kind of outcome is better than the second. So EV(A/R) is greater than EV(A). But the hypothesis (or fact, if it is a fact) that I will win the lottery is not a reason for me to get a coffee.
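With some made-up numbers, the problem is easy to see:

```python
# Invented numbers illustrating the lottery problem for definition (1).
# Two states: I win the lottery (R) or not. Getting a coffee has the
# same small value either way, but the R-outcomes are much better.
Pr = {"R": 0.125, "not-R": 0.875}
V = {("coffee", "R"): 81, ("coffee", "not-R"): 1}

def EV(act):
    return sum(Pr[s] * V[(act, s)] for s in Pr)

ev = EV("coffee")                 # unconditional: 11.0
ev_given_R = V[("coffee", "R")]   # conditional on R: 81
print(ev_given_R - ev)            # 70.0 -- a huge "weight" for R
```

By definition (1), the hypothesis that I will win the lottery would be a weighty reason to get a coffee, which is absurd.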

Sher gets around this "lottery problem" by assuming a contrastive conception of reasons: a reason for A is always a reason for A *as opposed to* some alternative B.

We can measure the *comparative expected value of A as opposed to B* by the expected difference between the value of A and the value of B:

EV(A vs B) = ∑_{s} Pr(s) [V(A in s) - V(B in s)].

Sher now defines the weight by which a reason R supports an act A as opposed to an alternative B as the extent to which conditioning on R increases the comparative expected value of A vs B:

(2) w(R, A vs B) = EV(A vs B/R) - EV(A vs B).

In the lottery example, let A be getting a coffee and B staying in my office. The expected difference between the value of A and B is plausibly independent of whether I will win the lottery. So the lottery problem is avoided.

Having defined the weight of reason by (2), Sher investigates how different reasons combine. Let's say that two reasons R_{1} and R_{2} are *independent* with respect to A vs B iff w(R_{1}, A vs B) is unaffected by conditioning on R_{2}, and vice versa. Sher proves that if two reasons are in this sense independent, then the weight of their conjunction is the sum of their individual weights. If reasons are not independent, they combine in a way that resembles the chain rule of probability.
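Here is a toy check of the additivity claim, with a construction of my own (not Sher's) in which the two reasons are independent by design: the value difference between A and B depends linearly on two probabilistically independent binary facts.

```python
from itertools import product

# States are pairs (r1, r2) of independent binary facts. The value
# difference V(A in s) - V(B in s) is linear in r1 and r2, so each
# reason's weight is unaffected by conditioning on the other.
states = list(product([0, 1], repeat=2))
Pr = {s: 0.25 for s in states}
D = lambda s: 1 + 3 * s[0] + 5 * s[1]   # V(A in s) - V(B in s)

def ev_diff(cond):
    """EV(A vs B) conditional on the set of states 'cond'."""
    z = sum(Pr[s] for s in cond)
    return sum(Pr[s] * D(s) for s in cond) / z

all_s = set(states)
R1 = {s for s in states if s[0] == 1}   # states where reason 1 holds
R2 = {s for s in states if s[1] == 1}   # states where reason 2 holds

w = lambda R: ev_diff(R) - ev_diff(all_s)   # definition (2)
print(w(R1), w(R2), w(R1 & R2))             # 1.5 2.5 4.0
```

As the claim predicts, the weight of the conjunction (4.0) is the sum of the individual weights (1.5 + 2.5).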

All this works out relatively nicely. But I haven't yet explained how the probability measure Pr implicit in (2) is supposed to be interpreted.

One might assume that it is the credence function of the relevant subject. But this would have the unfortunate consequence that propositions of which the subject is already confident can't be strong reasons for or against any act. I mentioned this as a problem for the account of Nair (2021) in a recent post. There I suggested that it would be better to use an evidential probability measure.

Sher doesn't consider this option. Instead, he suggests that Pr is the subject's "deliberative probability". He seems to assume that during deliberation, we temporarily suspend all our information about the world. This seems bizarre.

It gets worse. Sher intuits that the weight of a reason R in favour of an act A (vs some alternative) doesn't actually depend on how likely it is that R is true, in any conventional sense of 'likely', even though his model implies that it depends on Pr(R).

To illustrate this point, let R be the proposition that murder is wrong, and let A be an act of murder. Sher intuits that R is a very strong reason against A: w(R, A vs B) is a large negative number (assuming that B is a harmless alternative). But the magnitude of w(R, A vs B) can only be large if Pr(R) is comparatively low. (Recall that highly probable propositions can't be strong reasons.) Sher concludes that the deliberative probability of the hypothesis that murder is wrong must be "very close to zero" (p.123).

At this point, I'm lost. Even if we assume, bizarrely, that during deliberation we suspend all our information about the world, why should this put us in a state in which we're almost certain that murder is OK? I would have thought that an agent's deliberative probabilities are independent of their values, or of the values we have in mind when we talk about reasons. According to Sher, however, the "deliberative probabilities" must be retrofitted so as to yield the desired judgements about reasons. If we judge that R is a strong reason for or against some act, then R's deliberative probability is low. If we judge that it is at most a weak reason, then its deliberative probability may be higher.

So we don't really get an informative analysis of reasons and their weight in terms of independently understood value and probability functions. The probability function Pr that defines the expectations in (2) is really a primitive element of the model. This makes the model hard to use and assess. For example, it is unclear how we are meant to judge if two reasons are independent. Without any prior grip on Pr, how can we know whether the quantity in (2) changes if Pr is conditioned on R_{2}?

Sher does mention, in section 7, that one might want to take *weight* as primitive. He lists some axioms on the (primitive) weight function w that allow deriving an implicit value function V and a probability measure Pr so that equation (2) is satisfied. This is technically interesting, although much of the work seems to be done by the implausible first axiom, according to which distinct reasons can never have the same weight. I would also have thought that taking weight as fundamental wouldn't really mean taking the numerical weight function as primitive. Surely the precise numbers that figure as weights are to a large extent conventional. It would be better to start with a comparative weight relation.

Again, however, this picture sheds little light on how different reasons combine, as it offers (for example) no direct way of checking whether two reasons are independent.

Would it help to read Pr as evidential probability?

This would make (2) and its applications more useful and informative. But I agree that it would have counterintuitive consequences, as it implies that the evidential probability of a proposition constrains how strongly it may qualify as a reason for or against any act: only improbable propositions can be very strong reasons.

If this seems wrong, we should probably conclude that Sher's guiding idea is mistaken: we can't understand the weight by which a reason supports an act in terms of the effect that *coming to know* the reason would have on the act's desirability. Talk about 'coming to know' is only appropriate if the Pr measure has some doxastic or epistemic interpretation.

The counterintuitive consequences also arise for conditional weights, and this casts further doubts on Sher's model. Consider a case where R_{2} is strong evidence for R_{1}. Let's say that R_{1} is 'murder is wrong' and R_{2} is 'either murder is wrong or the same number will be drawn in next 10 lottery draws'. Suppose R_{2} is true. In light of this, would you think R_{1} is a strong reason against committing murder? I'd say it is. Assuming R_{2} doesn't really change the fact that R_{1} is a strong reason against committing murder. But surely R_{2} increases the probability of R_{1}, no matter how we interpret the measure Pr. (Well, you could come up with a measure relative to which R_{2} and R_{1} are independent, but I bet that we could find other examples with the same structure.) It follows that R_{1} and R_{2} are *not* independent, according to Sher's analysis, even though he assumes that some reasons are independent "precisely when the weight of a reason is unaffected by other reasons" (p.128)!

Something has gone wrong. Conditionalising on one reason can affect the weight of another reason, as defined by (2), simply by making the other reason more or less probable. But if we take seriously Sher's intuitions about the murder example then the real weight of a reason need not be sensitive to this change in probability.

Overall, Sher's model doesn't look all that promising to me.

Let's back up to the beginning of this discussion. I mentioned at the outset that Sher adopts a Savage-style conception of acts on which every act is compatible with every possible state of the world. An act, on this conception, carries little information about the world. Going to the pub, for example, doesn't qualify as an act if there are possible states of the world in which you don't go to the pub.

Sher's analysis also implies that any proposition that is entailed by an act A is not a reason for or against A. If A is *murdering Jones*, for example, then the fact that A involves committing a murder can be no reason against A.

All this makes it a little mysterious what Sher's "acts" might be.

Suppose we swap Savage's model of acts for Jeffrey's (in Jeffrey (1965)). Jeffrey assumes that acts are simply propositions, on a par with the "states". This model isn't popular among economists, but it is conceptually clearer. The acts can be as descriptively rich as we want. Besides, something like Jeffrey's model is required if we want to study the dynamics of deliberation (see e.g. Skyrms (1990)) or reason carefully about Newcomb problems (see Joyce (1999, 177ff.)).

In Jeffrey's framework, we also don't need separate "outcomes" as bearers of values. We can simply assume that we have an "intrinsic value" measure that assigns a numerical value to very specific propositions ("worlds"), and a probability measure Pr over all propositions. From this, the value V of any proposition with positive probability can be computed as the probability-weighted average of the value of the worlds in which the proposition is true, conditional on that proposition. Let's also adopt the common convention that the tautology has value 0.
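To make this concrete, here is a minimal sketch of Jeffrey-style desirability in Python. The worlds, probabilities and intrinsic values are invented for illustration; the code just implements the probability-weighted average and the convention that the tautology has value 0.

```python
# A minimal sketch of Jeffrey-style desirability, with invented worlds,
# probabilities and intrinsic values. Propositions are sets of world names.

WORLDS = {
    "w1": {"pr": 0.2, "value": 10.0},
    "w2": {"pr": 0.3, "value": -5.0},
    "w3": {"pr": 0.5, "value": 2.0},
}

def pr(prop):
    """Probability of a proposition."""
    return sum(WORLDS[w]["pr"] for w in prop)

def v(prop):
    """Desirability: the Pr-weighted average intrinsic value of the
    prop-worlds, conditional on prop. Undefined if Pr(prop) = 0."""
    return sum(WORLDS[w]["pr"] * WORLDS[w]["value"] for w in prop) / pr(prop)

# Adopt the convention that the tautology has value 0, by shifting all
# intrinsic values accordingly.
offset = v(set(WORLDS))
for w in WORLDS.values():
    w["value"] -= offset

assert abs(v(set(WORLDS))) < 1e-9   # V(tautology) = 0
print(v({"w1", "w2"}))              # desirability of an arbitrary proposition
```
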

Now return to the lottery problem. The problem, if you remember, was that the proposition that I will win the lottery strongly increases the expected value of getting a coffee, although it is no reason in favour of this act. Sher's response was to adopt a contrastive account of reasons. One might deny that the lottery case is a problem in the first place, on the grounds that reasons really are contrastive: that I'm hungry is a reason to eat bread vs eating nothing, but it's not a reason to eat bread vs eating pasta. But I don't think it is generally accepted that reasons are contrastive. We also seem to have a non-contrastive concept of reasons on which being hungry simply is a reason to eat bread, full stop.

Jeffrey's framework allows for a different response to the lottery problem.

Let's assume that we still want to analyse the weight of a reason R in terms of something like the effect that coming to know R would have on the desirability of the relevant act A. The desirability of A is V(A). The desirability of A conditional on R turns out to be V(A ∧ R). The lottery problem shows that we can't measure the weight of R in favour of A by the difference between V(A ∧ R) and V(A): in the lottery example, V(A ∧ R) is higher than V(A) even though R is no reason to perform A. We want to somehow subtract the independent value of R.

Well, why don't we do just that? Let's set

(3) w(R,A) = V(A ∧ R) - V(A) - V(R).

This looks a little strange at first. Let's rewrite it as follows:

(3') w(R,A) = [V(A ∧ R) - V(R)] - V(A).

Informally speaking, the first term, V(A ∧ R) - V(R), measures the degree to which performing A is expected to *improve* things, given that R is true. The second term, V(A), measures the degree to which performing A is expected to improve things, without any further assumptions.

Consider R = 'I am hungry' and A = 'I eat some bread'. Relative to an evidential probability measure, V(A) is not particularly high. If you had no idea about my state of hunger etc., you probably wouldn't recommend that I eat some bread. If hunger is bad, then V(R) is negative. The R worlds divide into worlds where I am hungry and eat nothing, worlds where I am hungry and eat bread, and worlds where I am hungry and eat something else. All three kinds of worlds have positive evidential probability. Let's assume that the first is worse than the second, which is on a par with the third. Then V(A ∧ R) - V(R) is positive. And so R is a reason for A.

I'm not sure if this is ultimately a good model of reasons. But it might be worth exploring. (Perhaps it would be even better to define w(R,A) as [V(A ∧ R) - V(R)] - [V(A ∧ ¬R) - V(¬R)], to get around the problem that improbable propositions tend to be stronger reasons than probable propositions.)
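Here is a toy numerical check of definition (3) and the contrastive variant just mentioned, applied to the hunger example. All probabilities and intrinsic values below are invented; they merely encode the assumptions of the example (hungry-and-eating-nothing is worst, bread and pasta are on a par):

```python
# A toy numerical check of definition (3) and its contrastive variant, applied
# to the hunger example. All probabilities and intrinsic values are invented.

# worlds: (probability, raw intrinsic value, hungry?, eats bread?)
WORLDS = [
    (0.1, -6.0, True,  False),   # hungry, eats nothing (worst)
    (0.1,  2.0, True,  True),    # hungry, eats bread
    (0.1,  2.0, True,  False),   # hungry, eats pasta (on a par with bread)
    (0.2,  0.0, False, True),    # not hungry, eats bread
    (0.5,  1.0, False, False),   # not hungry, eats nothing
]

# Shift values so that V(tautology) = 0, per the convention in the text.
OFFSET = sum(p * val for p, val, _, _ in WORLDS)

def V(member):
    """Desirability of the proposition {w : member(hungry, bread)}."""
    sel = [(p, val - OFFSET) for p, val, hungry, bread in WORLDS
           if member(hungry, bread)]
    return sum(p * val for p, val in sel) / sum(p for p, _ in sel)

A = lambda hungry, bread: bread                  # 'I eat some bread'
R = lambda hungry, bread: hungry                 # 'I am hungry'
A_and_R = lambda hungry, bread: hungry and bread

w = V(A_and_R) - V(A) - V(R)                     # definition (3)
w_contrastive = ((V(A_and_R) - V(R))
                 - (V(lambda h, b: b and not h) - V(lambda h, b: not h)))

print(w > 0, w_contrastive > 0)   # R comes out as a reason for A either way
```
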

One problem with (3) is that it is purely "evidential". Let A be smoking cigarettes, and R the information that (i) there is a gene that causes a desire to smoke cigarettes and also causes cancer, (ii) smoking has no causal influence on cancer, and (iii) smoking is pleasant. Surely R is no reason against smoking. Even friends of EDT should agree. According to (3), it is. The problem could be fixed by replacing the "indicative" value measure V with a "subjunctive" measure, so that V(A) is the average of the value of the A-worlds, weighted by the probability of the worlds *imaged on A* (rather than conditionalized on A).
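To illustrate the difference, here is a sketch of imaging for the smoking-gene case. The numbers, and the simplistic closest-world rule (hold the causally prior gene fact fixed, flip the act), are my own inventions for illustration:

```python
# Imaging vs conditionalising in the smoking-gene case, with invented numbers.
# Worlds: (has gene?, smokes?). The gene is evidentially correlated with
# smoking and causes cancer (-10); smoking only adds a little pleasure (+1).

PR = {(True, True): 0.4, (True, False): 0.1,
      (False, True): 0.1, (False, False): 0.4}
VAL = {(True, True): -9.0, (True, False): -10.0,
       (False, True): 1.0, (False, False): 0.0}

def v_conditional(smokes):
    """'Indicative' value: average value of the act-worlds, weighted by
    probability conditionalised on the act."""
    ws = [w for w in PR if w[1] == smokes]
    return sum(PR[w] * VAL[w] for w in ws) / sum(PR[w] for w in ws)

def v_imaged(smokes):
    """'Subjunctive' value: move each world's probability to its closest
    act-world, holding the causally prior gene fact fixed."""
    imaged = {}
    for (gene, _), p in PR.items():
        imaged[(gene, smokes)] = imaged.get((gene, smokes), 0.0) + p
    return sum(p * VAL[w] for w, p in imaged.items())

# Conditionalising makes smoking look bad: it's evidence for the gene.
print(v_conditional(True) < v_conditional(False))   # True
# Imaging reveals that smoking is causally better, by its pleasure.
print(v_imaged(True) > v_imaged(False))             # True
```
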

The evidentialist problem might also arise in Sher's model, depending on the precise way in which the states are assumed to be independent of the acts. Here, the problem could be fixed by requiring some kind of causal independence.

Jeffrey, Richard. 1965. *The Logic of Decision*. New York: McGraw-Hill.

Joyce, James. 1999. *The Foundations of Causal Decision Theory*. Cambridge: Cambridge University Press.

Nair, Shyam. 2021. “‘Adding Up’ Reasons: Lessons for Reductive and Nonreductive Approaches.” *Ethics* 132 (1): 38–88. doi.org/10.1086/715288.

Savage, Leonard. 1954. *The Foundations of Statistics*. New York: Wiley.

Sher, Itai. 2019. “Comparative Value and the Weight of Reasons.” *Economics & Philosophy* 35 (1): 103–58. doi.org/10.1017/S0266267118000160.

Skyrms, Brian. 1990. *The Dynamics of Rational Deliberation*. Cambridge (Mass.): Harvard University Press.

In Schwarz (2018), I put forward a tentative explanation of these facts. I argued that it would be useful for an agent in a world like ours to have a credence function defined over a space that includes special "imaginary" propositions that are causally tied to stimulations of their sense organs in such a way that any given stimulation makes the agent certain of a corresponding imaginary proposition. What we conceptualise as propositions about phenomenal properties (of our experience), I argued, might be such imaginary propositions.

Kammerer (2021) argues that my model is inadequate because it predicts a kind of certainty that we don't actually have. He offers three kinds of counterexamples.

First, people are sometimes mistaken about what they feel: a sudden cold sensation, for example, can be mistaken for pain.

Second, people mistakenly believe that their visual field is more detailed than it is.

Third, some people hold that there are no phenomenal experiences at all.

Let's start with the third. Presumably the idea is that if we are certain that we have a particular phenomenal experience X then we couldn't coherently believe that we have no phenomenal experiences at all. But it's not part of my model that the imaginary propositions of which the agent becomes certain are conceptualised as "phenomenal experiences". If you ask an eliminativist or illusionist what they mean by 'phenomenal experience', they will give you an answer. They deny that we have experiences with such-and-such definitional properties. This theoretical claim is orthogonal to any imaginary proposition. I don't see a problem here.

How are the first two cases meant to go? The idea is that if we know that we can be mistaken about the phenomenal properties of our current experiences then we can't be certain about these properties.

Can we be mistaken about the phenomenal properties of our current experiences? Arguably, yes. But we should distinguish different kinds of mistake.

One kind of mistake concerns the classification or description of the relevant state. Suppose I mistake a cold sensation for a pain sensation. We might also consider a case in which I'm unsure (perhaps for a moment) whether I have a cold sensation or a pain sensation. What is going on here? Perhaps the case is analogous to a case in which I mistakenly classify gnocchi as a pasta dish, or in which I'm unsure about whether gnocchi are a kind of pasta, even though I know full well what gnocchi are made of, how they are eaten, etc. I'm simply wrong or unsure about how the category 'pasta' applies. This kind of case would be harmless for my proposal.

A similar kind of mistake or uncertainty can arise from the difficulty of extracting relevant information from a perceptual experience. When I'm looking at the arrangement of 14 pens on my desk, I can't immediately tell how many pens there are. I might mistakenly think there are 15, and I might be unsure about the number. I would then be similarly mistaken or unsure about my phenomenal experience, for my visual experience plausibly settles the number of pens: if the number of pens were different, I would have a different experience. Here I'm not wrong or unsure about how the category '14 pens' applies. But arguably my mistake or uncertainty still concerns the classification of the experience, rather than the experience itself.

What would uncertainty about the experience itself look like? It would mean that I can't rule out scenarios in which I have a different experience. Here is how I illustrated the relevant kind of certainty in my paper:

> Our perceptual experiences do appear to convey a special kind of information that is more certain than our ordinary beliefs about the world. To illustrate, consider your present perceptual experience. Are there any possibilities you can conclusively rule out in virtue of having this experience? Don't think of this as an attitude towards a sentence. Rather, imagine different ways things could be and ask yourself whether any of them can be ruled out given your experience. For example, consider a scenario in which you are skiing – a normal skiing scenario, without systematic hallucinations, rewired brains, evil demons or the like. It could be a real situation from the past, if you ever went skiing. Your experiences in that situation are completely unlike your actual present experiences. (I trust you are not reading this paper while skiing.) In the skiing scenario, you see the snow-covered slopes ahead of you, feel the icy wind in your face, the ground passing under your skis, and so on. What is your credence that this situation is actual right now? Arguably zero. In general, when we have a given experience, it seems that we can rule out any situations in which we have a sufficiently different experience. That is why skeptical scenarios almost always hold fixed our experiences and only vary the rest of the world.

I don't know if Kammerer disagrees with any of this. It's this kind of certainty that I hope my model might explain.

I said 'sufficiently different'. The certainty intuition becomes weaker if we consider scenarios where our experiences are only slightly different. Can we conclusively rule out scenarios in which we feel, say, ever-so-slightly warmer than we actually feel? This isn't obvious. I don't think it's obviously false. But it's also not obviously true.

In conversation about these issues, an undergraduate student here at Edinburgh raised another interesting worry. Split-brain patients can have experiences in one side of their brain of which the other side is oblivious. Such a patient might reasonably say, with the left side of their brain, that they don't know if they are currently having a normal red experience – with the other half of their brain. Wouldn't they be right? If they are ignorant of their predicament, they might also mistakenly think that they are not having a normal red experience.

So perhaps there can be genuine uncertainty about current phenomenology: uncertainty about the details, or uncertainty about the parts, in a fragmented mind. The model I described in Schwarz (2018) can't explain this kind of uncertainty. But it is clear anyway that we don't conform to this model, if only because the model is computationally intractable. My real proposal is that we work *somewhat like* the ideal agents I have described, and that this might explain some puzzling facts about the special access we appear to have to our phenomenal states. It might explain, for example, why I think I can conclusively rule out scenarios in which I have a skiing-type experience in every fragment of my mind.

Kammerer has another objection. He argues that the "certainty model" can't explain our sense of *acquaintance* with the phenomenal character of experience. That I'm absolutely sure of some proposition doesn't entail that the proposition is presented to me in a direct and immediate manner.

I agree. I never said otherwise. I would hope that the model I described in Schwarz (2018) can also shed light on our sense of acquaintance. If we conformed to my model, then stimulations of our sense organs would directly cause certainty in a corresponding imaginary proposition. We would come to be certain of something of which we were previously uncertain. And we would do so without any kind of reasoning or inference. It would seem to us as if there are special kinds of facts that are directly *presented* or *revealed* to us when we have a perceptual experience.

Kammerer, François. 2021. “Certainty and Our Sense of Acquaintance with Experiences.” *Erkenntnis*. doi.org/10.1007/s10670-021-00488-5.

Schwarz, Wolfgang. 2018. “Imaginary Foundations.” *Ergo* 29: 764–89.