Sher on the weight of reasons
A few thoughts on Sher (2019), which I found advertised in Nair (2021).
This (long and rich) paper presents a formal model of reasons and their weight, with the aim of clarifying how different reasons for or against an act combine.
Sher's guiding idea is to measure the weight by which a reason supports an act in terms of the effect that coming to know the reason would have on the act's desirability.
To spell this out, Sher assumes the popular framework of Savage (1954) in which the 'acts' are assumed to be independent of the 'states' (of the world), and probabilities are only assigned to states. Any combination of a state with an act determines an 'outcome' with a particular value. The expected value EV(A) of an act A is then the weighted average of the value of the different outcomes an act might bring about, weighted by the probability of the relevant state.
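To fix ideas, here is a minimal Python sketch of Savage-style expected value. The states, probabilities, acts and outcome values are all invented for illustration:

```python
# A minimal sketch of Savage-style expected value. All states,
# probabilities and outcome values here are invented for illustration.

def expected_value(act, states, prob, value):
    """EV(A): average of V(A in s) over states s, weighted by Pr(s)."""
    return sum(prob[s] * value[(act, s)] for s in states)

states = ["rain", "sun"]
prob = {"rain": 0.3, "sun": 0.7}
value = {("picnic", "rain"): -5, ("picnic", "sun"): 10,
         ("cinema", "rain"): 4, ("cinema", "sun"): 4}

ev_picnic = expected_value("picnic", states, prob, value)  # 0.3·(-5) + 0.7·10 = 5.5
```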
As a first pass, we might now implement Sher's guiding idea by defining the weight by which a reason R supports an act A as the extent to which conditioning on R increases the expected value of A:
(1) w(R,A) = EV(A/R) - EV(A).
Here, EV(A/R) is the weighted average of the value of the possible A-outcomes, weighted by the probability of the relevant state conditional on R.
But (1) won't do. Sher gives the following counterexample. Let R be the proposition that I will win the lottery. Let A be some ordinary act, like getting a coffee. The "outcome" of performing A in a state where R is true is that I get a coffee and have won the lottery. The outcome of performing A in a not-R state is that I get a coffee and haven't won the lottery. On balance, the first kind of outcome is better than the second. So EV(A/R) is greater than EV(A). But the hypothesis (or fact, if it is a fact) that I will win the lottery is not a reason for me to get a coffee.
Sher gets around this "lottery problem" by assuming a contrastive conception of reasons: a reason for A is always a reason for A as opposed to some alternative B.
We can measure the comparative expected value of A as opposed to B by the expected difference between the value of A and the value of B:
EV(A vs B) = ∑_s Pr(s) [V(A in s) - V(B in s)].
Sher now defines the weight by which a reason R supports an act A as opposed to an alternative B as the extent to which conditioning on R increases the comparative expected value of A vs B:
(2) w(R, A vs B) = EV(A vs B/R) - EV(A vs B).
In the lottery example, let A be getting a coffee and B staying in my office. The expected difference between the value of A and B is plausibly independent of whether I will win the lottery. So the lottery problem is avoided.
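With invented numbers, we can check that the contrastive definition (2) behaves as advertised in the lottery case while the naive definition (1) does not. Winning the lottery adds a fixed bonus to either outcome, and the coffee itself is worth a little on its own:

```python
# Toy numbers for the lottery case (all values and probabilities invented).
# Winning adds a fixed bonus to either outcome.

states = ["win", "lose"]
pr = {"win": 1e-6, "lose": 1 - 1e-6}
V = {("coffee", "win"): 1_000_001, ("coffee", "lose"): 1,
     ("office", "win"): 1_000_000, ("office", "lose"): 0}

def ev(act, p):
    return sum(p[s] * V[(act, s)] for s in states)

def cond(p, event):  # conditionalize Pr on a set of states
    z = sum(p[s] for s in event)
    return {s: (p[s] / z if s in event else 0.0) for s in states}

pr_R = cond(pr, {"win"})  # conditioned on R = 'I will win the lottery'

w1 = ev("coffee", pr_R) - ev("coffee", pr)  # definition (1): ~999,999, absurd
w2 = (ev("coffee", pr_R) - ev("office", pr_R)) \
     - (ev("coffee", pr) - ev("office", pr))  # definition (2): 0
```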
Having defined the weight of reason by (2), Sher investigates how different reasons combine. Let's say that two reasons R1 and R2 are independent with respect to A vs B iff w(R1, A vs B) is unaffected by conditioning on R2, and vice versa. Sher proves that if two reasons are in this sense independent, then the weight of their conjunction is the sum of their individual weights. If reasons are not independent, they combine in a way that resembles the chain rule of probability.
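Sher's additivity result can be illustrated in a toy model (all numbers invented): here the two reasons are probabilistically independent and affect the value difference additively, which makes them independent in Sher's sense, and the weight of their conjunction comes out as the sum of their individual weights:

```python
# A toy model (all numbers invented) of Sher's additivity result.
from itertools import product

p1, p2 = 0.2, 0.5  # Pr(R1), Pr(R2)
states = list(product([True, False], repeat=2))  # (R1 true?, R2 true?)
pr = {(r1, r2): (p1 if r1 else 1 - p1) * (p2 if r2 else 1 - p2)
      for r1, r2 in states}
D = {(r1, r2): 3 * r1 + 5 * r2 for r1, r2 in states}  # V(A in s) - V(B in s)

def ev_diff(p):
    return sum(p[s] * D[s] for s in states)

def cond(p, test):
    z = sum(p[s] for s in states if test(s))
    return {s: (p[s] / z if test(s) else 0.0) for s in states}

def weight(test, p=pr):
    return ev_diff(cond(p, test)) - ev_diff(p)

R1 = lambda s: s[0]
R2 = lambda s: s[1]
w1 = weight(R1)                           # 2.4
w2 = weight(R2)                           # 2.5
w12 = weight(lambda s: s[0] and s[1])     # 4.9 = w1 + w2
w1_given_R2 = weight(R1, cond(pr, R2))    # 2.4 = w1: independent in Sher's sense
```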
All this works out relatively nicely. But I haven't yet explained how the probability measure Pr implicit in (2) is supposed to be interpreted.
One might assume that it is the credence function of the relevant subject. But this would have the unfortunate consequence that propositions of which the subject is already confident can't be strong reasons for or against any act. I mentioned this as a problem for the account of Nair (2021) in a recent post. There I suggested that it would be better to use an evidential probability measure.
Sher doesn't consider this option. Instead, he suggests that Pr is the subject's "deliberative probability". He seems to assume that during deliberation, we temporarily suspend all our information about the world. This seems bizarre.
It gets worse. Sher intuits that the weight of a reason R in favour of an act A (vs some alternative) doesn't actually depend on how likely it is that R is true, in any conventional sense of 'likely', even though his model implies that it depends on Pr(R).
To illustrate this point, let R be the proposition that murder is wrong, and let A be an act of murder. Sher intuits that R is a very strong reason against A: w(R, A vs B) is a large negative number (assuming that B is a harmless alternative). But the magnitude of w(R, A vs B) can only be large if Pr(R) is comparatively low. (Recall that highly probable propositions can't be strong reasons.) Sher concludes that the deliberative probability of the hypothesis that murder is wrong must be "very close to zero" (p.123).
At this point, I'm lost. Even if we assume, bizarrely, that during deliberation we suspend all our information about the world, why should this put us in a state in which we're almost certain that murder is OK? I would have thought that an agent's deliberative probabilities are independent of their values, or of the values we have in mind when we talk about reasons. According to Sher, however, the "deliberative probabilities" must be retrofitted so as to yield the desired judgements about reasons. If we judge that R is a strong reason for or against some act, then R's deliberative probability is low. If we judge that it is at most a weak reason, then its deliberative probability may be higher.
So we don't really get an informative analysis of reasons and their weight in terms of independently understood value and probability functions. The probability function Pr that defines the expectations in (2) is really a primitive element of the model. This makes the model hard to use and assess. For example, it is unclear how we are meant to judge if two reasons are independent. Without any prior grip on Pr, how can we know whether the quantity in (2) changes if Pr is conditioned on R2?
Sher does mention, in section 7, that one might want to take weight as primitive. He lists some axioms on the (primitive) weight function w that allow deriving an implicit value function V and a probability measure Pr so that equation (2) is satisfied. This is technically interesting, although much of the work seems to be done by the implausible first axiom, according to which distinct reasons can never have the same weight. I would also have thought that taking weight as fundamental wouldn't really mean taking the numerical weight function as primitive. Surely the precise numbers that figure as weights are to a large extent conventional. It would be better to start with a comparative weight relation.
Again, however, this picture sheds little light on how different reasons combine, as it offers (for example) no direct way of checking whether two reasons are independent.
Would it help to read Pr as evidential probability?
This would make (2) and its applications more useful and informative. But I grant that it would have counterintuitive consequences, as it implies that the evidential probability of a proposition constrains how strongly it may qualify as a reason for or against any act: only improbable propositions can be very strong reasons.
If this seems wrong, we should probably conclude that Sher's guiding idea is mistaken: we can't understand the weight by which a reason supports an act in terms of the effect that coming to know the reason would have on the act's desirability. Talk about 'coming to know' is only appropriate if the Pr measure has some doxastic or epistemic interpretation.
The counterintuitive consequences also arise for conditional weights, and this casts further doubt on Sher's model. Consider a case where R2 is strong evidence for R1. Let's say that R1 is 'murder is wrong' and R2 is 'either murder is wrong or the same number will be drawn in the next 10 lottery draws'. Suppose R2 is true. In light of this, would you think R1 is a strong reason against committing murder? I'd say it is. Assuming R2 doesn't really change the fact that R1 is a strong reason against committing murder. But surely R2 increases the probability of R1, no matter how we interpret the measure Pr. (Well, you could come up with a measure relative to which R2 and R1 are independent, but I bet that we could find other examples with the same structure.) It follows that R1 and R2 are not independent, according to Sher's analysis, even though he assumes that some reasons are independent "precisely when the weight of a reason is unaffected by other reasons" (p.128)!
Something has gone wrong. Conditionalising on one reason can affect the weight of another reason, as defined by (2), simply by making the other reason more or less probable. But if we take seriously Sher's intuitions about the murder example then the real weight of a reason need not be sensitive to this change in probability.
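The point can be checked numerically. In the following toy model (numbers invented), R2 is 'R1 or C', where C is an extremely improbable coincidence that is irrelevant to the value difference. Conditioning on R2 raises the probability of R1 and thereby almost entirely wipes out R1's weight as defined by (2):

```python
# Toy numbers (invented) for the murder/lottery case: R2 = 'R1 or C'.
from itertools import product

states = list(product([True, False], repeat=2))  # (R1 true?, C true?)
pr = {(r1, c): (0.1 if r1 else 0.9) * (1e-12 if c else 1 - 1e-12)
      for r1, c in states}
D = {(r1, c): -100 if r1 else 0 for r1, c in states}  # A = murder, B = harmless

def ev(p):
    return sum(p[s] * D[s] for s in states)

def cond(p, test):
    z = sum(p[s] for s in states if test(s))
    return {s: (p[s] / z if test(s) else 0.0) for s in states}

R1 = lambda s: s[0]
R2 = lambda s: s[0] or s[1]
w_R1 = ev(cond(pr, R1)) - ev(pr)              # -90: a strong reason against A
pr2 = cond(pr, R2)
w_R1_given_R2 = ev(cond(pr2, R1)) - ev(pr2)   # ≈ 0: the weight has vanished
```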
Overall, Sher's model doesn't look all that promising to me.
Let's back up to the beginning of this discussion. I mentioned at the outset that Sher adopts a Savage-style conception of acts on which every act is compatible with every possible state of the world. An act, on this conception, carries little information about the world. Going to the pub, for example, doesn't qualify as an act if there are possible states of the world in which you don't go to the pub.
Sher's analysis also implies that any proposition that is entailed by an act A is not a reason for or against A. If A is murdering Jones, for example, then the fact that A involves committing a murder can be no reason against A.
All this makes it a little mysterious what Sher's "acts" might be.
Suppose we swap Savage's model of acts for Jeffrey's (in Jeffrey (1965)). Jeffrey assumes that acts are simply propositions, on a par with the "states". This model isn't popular among economists, but it is conceptually clearer. The acts can be as descriptively rich as we want. Besides, something like Jeffrey's model is required if we want to study the dynamics of deliberation (see e.g. Skyrms (1990)) or reason carefully about Newcomb problems (see Joyce (1999, 177ff.)).
In Jeffrey's framework, we also don't need separate "outcomes" as bearers of values. We can simply assume that we have an "intrinsic value" measure that assigns a numerical value to very specific propositions ("worlds"), and a probability measure Pr over all propositions. From this, the value V of any proposition with positive probability can be computed as the probability-weighted average of the value of the worlds in which the proposition is true, conditional on that proposition. Let's also adopt the common convention that the tautology has value 0.
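Here is a minimal sketch of this Jeffrey-style setup, with three invented worlds whose intrinsic values are chosen so that the tautology gets value 0:

```python
# A toy Jeffrey-style model: propositions are sets of worlds, and V(X) is
# the Pr-weighted average value of the X-worlds, conditional on X.
# Worlds, probabilities and values are invented.

worlds = ["w1", "w2", "w3"]
pr = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
val = {"w1": 4, "w2": -2, "w3": -7}  # 0.5·4 + 0.3·(-2) + 0.2·(-7) = 0

def V(prop):
    """Desirability of a proposition (a set of worlds with positive Pr)."""
    z = sum(pr[w] for w in prop)
    return sum(pr[w] * val[w] for w in prop) / z

v_taut = V(set(worlds))   # tautology: 0.0
v_12 = V({"w1", "w2"})    # (0.5·4 + 0.3·(-2)) / 0.8 = 1.75
```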
Now return to the lottery problem. The problem, if you remember, was that the proposition that I will win the lottery strongly increases the expected value of getting a coffee, although it is no reason in favour of this act. Sher's response was to adopt a contrastive account of reasons. One might think this is no real cost. One might argue that reasons really are contrastive: that I'm hungry is a reason to eat bread vs eating nothing, but it's not a reason to eat bread vs pasta. But I don't think it is generally accepted that reasons are contrastive. We seem to also have a non-contrastive concept of reasons on which being hungry simply is a reason to eat bread, full stop.
Jeffrey's framework allows for a different response to the lottery problem.
Let's assume that we still want to analyse the weight of a reason R in terms of something like the effect that coming to know R would have on the desirability of the relevant act A. The desirability of A is V(A). The desirability of A conditional on R turns out to be V(A ∧ R). The lottery problem shows that we can't measure the weight of R in favour of A by the difference between V(A ∧ R) and V(A): in the lottery example, V(A ∧ R) is higher than V(A) even though R is no reason to perform A. We want to somehow subtract the independent value of R.
Well, why don't we do just that? Let's set
(3) w(R,A) = V(A ∧ R) - V(A) - V(R).
This looks a little strange at first. Let's rewrite it as follows:
(3) w(R,A) = [V(A ∧ R) - V(R)] - V(A).
Informally speaking, the first term, V(A ∧ R) - V(R), measures the degree to which performing A is expected to improve things, given that R is true. The second term, V(A), measures the degree to which performing A is expected to improve things, without any further assumptions.
Consider R = 'I am hungry' and A = 'I eat some bread'. Relative to an evidential probability measure, V(A) is not particularly high. If you had no idea about my state of hunger etc., you probably wouldn't recommend that I eat some bread. If hunger is bad, then V(R) is negative. The R worlds divide into worlds where I am hungry and eat nothing, worlds where I am hungry and eat bread, and worlds where I am hungry and eat something else. All three kinds of worlds have positive evidential probability. Let's assume that the first is worse than the second, which is on a par with the third. Then V(A ∧ R) - V(R) is positive. And so R is a reason for A.
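The arithmetic can be put in a small model. All numbers below are invented, but they follow the assumptions just made: hunger is bad, going hungry without eating is worst, eating bread is on a par with eating something else, and the values are rescaled so that the tautology has value 0. Definition (3) then makes being hungry a reason to eat bread:

```python
# A toy model of definition (3) for the hunger example. Worlds are
# (hungry?, what I eat) pairs; probabilities and intrinsic values are
# invented, rescaled so that V(tautology) = 0.

foods = ["bread", "pasta", "nothing"]
worlds = [(h, f) for h in (True, False) for f in foods]
pr = {(h, f): (0.4 if h else 0.6) / 3 for h, f in worlds}  # food: uniform

raw = {(True, "bread"): 2, (True, "pasta"): 2, (True, "nothing"): -10,
       (False, "bread"): 0, (False, "pasta"): 0, (False, "nothing"): 2}
mean = sum(pr[w] * raw[w] for w in worlds)
val = {w: raw[w] - mean for w in worlds}  # now V(tautology) = 0

def V(test):
    z = sum(pr[w] for w in worlds if test(w))
    return sum(pr[w] * val[w] for w in worlds if test(w)) / z

R = lambda w: w[0]              # 'I am hungry'
A = lambda w: w[1] == "bread"   # 'I eat some bread'
w3 = V(lambda w: R(w) and A(w)) - V(A) - V(R)  # definition (3): positive
```

In this model V(R) comes out negative (hunger is bad) and w3 positive, as the text predicts.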
I'm not sure if this is ultimately a good model of reasons. But it might be worth exploring. (Perhaps it would be even better to define w(R,A) as [V(A ∧ R) - V(R)] - [V(A ∧ ¬R) - V(¬R)], to get around the problem that improbable propositions tend to be stronger reasons than probable propositions.)
One problem with (3) is that it is purely "evidential". Let A be smoking cigarettes, and R the information that (i) there is a gene that causes a desire to smoke cigarettes and also causes cancer, (ii) smoking has no causal influence on cancer, and (iii) smoking is pleasant. Surely R is no reason against smoking. Even friends of EDT should agree. According to (3), however, it is. The problem could be fixed by replacing the "indicative" value measure V with a "subjunctive" measure, so that V(A) is the average of the value of the A-worlds, weighted by the probability of the worlds imaged on A (rather than conditionalized on A).
The evidentialist problem might also arise in Sher's model, depending on the precise way in which the states are assumed to be independent of the acts. Here, the problem could be fixed by requiring some kind of causal independence.