From a strictly utilitarian perspective, there is nothing to complain about here. But strict utilitarianism is a highly counterintuitive position. In fact, MacAskill himself rejects it when he says that it would not be OK to consume meat from factory farms and "offset" by donating to animal welfare organisations, even if the net result would be less animal suffering. I agree. Whether a course of action is right or wrong is not just a matter of the net difference it makes to the amount of suffering in the world. But then we also have to reconsider MacAskill's conclusions about carbon offsetting, fairtrade, and sweatshops.

Why is it wrong to buy meat from factory farms even if one donates to animal welfare? Arguably, the reason has something to do with the fact that by giving money to factory farms, one is actively engaged in the large-scale torturing of animals. And being actively engaged in something that causes great harm is worse than failing to interfere with harmful activities of other people. That's why it is worse to pay somebody to torture animals than to fail to pay somebody to stop torturing animals.

The same kind of reasoning can be applied to the case of carbon emissions. By causing excessive carbon emissions I am actively engaged in something that causes great (expected) harm. That's wrong, and the wrong can't be offset by (say) donating to the Against Malaria Foundation, thereby causing a comparable reduction of harm. The harm I'm causing arguably also can't be offset by donating to projects that cause a comparable reduction specifically of carbon emissions, although this case is harder. For if my net carbon impact is zero, then the sum of my activities causes no harm at all through climate change. But if my activities involve short-distance flights and heating my house at night, I'm still actively engaged in something that causes great harm. I'm inclined to say that's still wrong.

The case of sweatshops is tricky for similar reasons. Here it is often argued that buying from sweatshops is actually good because those who work in sweatshops are generally better off than they would have been otherwise. Nevertheless, sweatshop conditions are terrible, and it is plausibly wrong to employ people under such conditions. The fact that those people would otherwise have been even worse off doesn't make it right. If I rescue an animal from a factory farm and torture it in my back yard, a little less than it was tortured before, I shouldn't expect moral praise. I'm still doing something wrong. So it is wrong to employ people under sweatshop conditions. And so it is also wrong to pay people to do that. On the other hand, raising the living standards of the poorest is good, even if the new standards are still deplorable. So by purchasing products from sweatshops, I am simultaneously doing something good and something bad, and I'm doing it to the very same people. (That's the contrast with carbon offsetting: I'm not causing harm to some people while doing good to others.) It is not at all obvious to me that the good outweighs the bad.

It is also not obvious to me that it doesn't. I'm not saying we should buy locally produced shirts and shoes rather than ones produced in sweatshops in India. (Clearly, not buying and donating would be the best option, but let's say we need these things.) But I think the question is hard.

What would be really nice is if we could buy shirts and shoes from Indian factories where workers are treated humanely. That would do considerable good and no harm, so the choice would be easy.

MacAskill briefly discusses this option, but focusses only on one example, which then turns out not to be an example at all: buying coffee with the Fairtrade certificate. MacAskill argues that this is really a waste of money, in part because virtually none of the extra cost reaches the poorest workers. If he is right, then obviously there is little reason to buy Fairtrade. But that only shows that the Fairtrade standards and their enforcement don't work: they don't ensure that the production of Fairtrade goods causes significantly less harm than the production of non-Fairtrade goods.

A similar problem arises for "organic" standards. It would be great if there were a label for agricultural products whose production does not involve destroying primary forests, displacing people, depleting soil, poisoning ground water, reducing biodiversity, exposing workers to toxic fumes, causing gratuitous suffering to animals, and so on. I'd happily pay extra for such products. Sadly, the major certificates for "organic" food, while advertised to have precisely this function, fail to enforce many of these criteria.

The popularity of organic and Fairtrade products suggests that a lot of people have value judgements similar to mine: they care about not being complicit in activities that cause great harm. Instead of alienating people with such values, the Effective Altruism movement could take them seriously and push for changes to the standards for Fairtrade and organic products, perhaps by supporting the establishment of new labels or by lobbying for changes to government-set standards, etc.

In fact, that would be worthwhile even on strictly utilitarian grounds: since many people care about consuming ethically (even at higher costs), giving them truly ethical options could have a significant net effect. And since the current standards are so obviously broken, improving them should not be intractable.

How observations about frequencies confirm or disconfirm probabilistic models is well understood in Bayesian epistemology. The central assumption that does most of the work is the Principal Principle, which states that if a model assigns (objective) probability x to some outcomes, then conditional on the model, the outcomes have (subjective) probability x. It follows that models that assign higher probability to the observed outcomes receive a greater boost of subjective probability than models that assign lower probability to the outcomes.
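To make this concrete, here is a minimal Bayesian sketch of frequency-based confirmation. The two models, their chances, and the prior are all made up for illustration:

```python
from math import prod

# Two hypothetical models of a coin, each assigning an objective
# probability to heads. By the Principal Principle, the subjective
# probability of an outcome sequence, conditional on a model, equals
# the probability the model assigns to that sequence.
models = {"fair": 0.5, "biased": 0.7}
prior = {"fair": 0.5, "biased": 0.5}

def posterior(outcomes):
    """Update the prior over models on an observed H/T sequence."""
    likelihood = {m: prod(p if o == "H" else 1 - p for o in outcomes)
                  for m, p in models.items()}
    total = sum(prior[m] * likelihood[m] for m in models)
    return {m: prior[m] * likelihood[m] / total for m in models}

# A sequence with 7 heads in 10 flips gives the biased model, which
# assigns the higher probability to that sequence, the greater boost.
print(posterior("HHHHHHHTTT"))
```

The model that made the observed frequency more probable ends up with the larger share of subjective probability, exactly as the Principal Principle predicts.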

But evidence for probabilistic models does not only come from observed frequencies. In the sciences, arguably the most important kind of evidence for probabilistic models is facts about the mechanisms that generate the relevant outcomes. And here it is much less clear how the confirmation works. (There must be literature on this. Any pointers would be welcome.)

Suppose I explain to you that the outcomes you've observed are generated by flipping a coin. I show you the coin, explain to you how it is flipped, etc. This should strongly increase your credence in the assumption that heads and tails have probability 1/2. It should do so even if you hadn't observed any outcomes at all. Intuitively, that's because you may realize that (a) the outcome is very sensitive to the initial conditions of the flip, (b) the dynamics of the process does not favour one outcome over the other, and (c) the initial conditions are unlikely to favour a particular outcome.

But how do these facts support the hypothesis (call it 'H') that the coin lands heads with probability 1/2?

If a hypothesis is confirmed by evidence, then the hypothesis has to raise the probability of the evidence (perhaps together with background assumptions). In the easiest cases, the hypothesis simply entails the evidence. So our question becomes: how does H increase the probability of (a) and (b) and (c)?

Arguably not by the Principal Principle. The Principal Principle links the objective probability a model assigns to outcomes with the subjective probability of the outcomes given the model. But (a) and (b) and (c) are not propositions about outcomes. They are not the kinds of things to which H assigns a probability.

One might argue that there's an indirect route from (a) and (b) and (c) to H, via frequencies: (a) and (b) and (c) raise the subjective probability of getting a sequence of outcomes in which heads has a relative frequency of roughly 1/2, and that is something to which H assigns a probability.

I have two worries about this argument. The first is that it just doesn't seem to capture the way in which (a) and (b) and (c) support H. When I tell you how coin flips work, you wouldn't reason that the relative frequency of heads on many trials is likely to be around 1/2, and from that infer that the probability of heads is 1/2.

Moreover (second), what if I assure you that the coin is tossed only once? It would still be reasonable to believe that the probability of heads is 1/2, but this time it is certain that the relative frequency of heads won't be 1/2.

Another possible explanation of how (a) and (b) and (c) confirm H goes as follows. You may think that when the coin is flipped, the exact initial conditions -- in particular, the coin's vertical and rotational velocity -- are to some extent a matter of chance. That is, you may assume that there's an objective probability distribution over initial conditions. If you also assume that this distribution is roughly bell-shaped and not concentrated on a very narrow range of initial conditions (in accordance with (c)), then it follows from (a) and (b) that the probability of each outcome is about 1/2.
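This line of reasoning can be made vivid with a toy simulation. The dynamics below is a crude stand-in for real coin-flip physics, and all the numbers are invented; the point is only that a broad, bell-shaped distribution over initial conditions, fed through a symmetric but highly sensitive deterministic dynamics, yields heads about half the time:

```python
import math
import random

random.seed(0)

def flip(v, omega):
    # Toy deterministic dynamics: launched at speed v (m/s) with spin
    # omega (rad/s), the coin is airborne for t = 2v/g seconds and
    # completes omega*t/pi half-turns; it lands heads iff that number
    # is even (it started heads-up). The outcome is very sensitive to
    # (v, omega), as in (a), and the dynamics favours neither outcome,
    # as in (b).
    t = 2 * v / 9.81
    half_turns = int(omega * t / math.pi)
    return "H" if half_turns % 2 == 0 else "T"

def sample_flip():
    # Broad, bell-shaped distribution over initial conditions, as in (c).
    return flip(random.gauss(2.5, 0.5), random.gauss(40.0, 8.0))

n = 100_000
freq = sum(sample_flip() == "H" for _ in range(n)) / n
print(freq)  # close to 0.5
```

Because the spread of the initial-condition distribution covers many half-turns, the parity of the half-turn count, and hence the outcome, comes out almost exactly 50/50, no matter the precise shape or location of the distribution.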

On this account, the probability measure specified by H is effectively identified with the probability measure over initial conditions. Even if you know little about the latter, the dynamics of the flipping process guarantees that it must determine a probability of roughly 1/2 for heads. Thus, (a) and (b) and (c) confirm H because they entail H.

I'm not convinced by this explanation either. For one thing, it violates the autonomy of higher-level objective probabilities. It seems highly implausible to me that the objective probabilities in population models or genetics are identical to the objective probabilities of statistical mechanics, and it seems even more implausible that the statistical mechanics probabilities are the probabilities of quantum mechanics. In fact, the mechanistic evidence for statistical mechanics, as it is usually presented, assumes a deterministic microphysics. So I don't think the probabilities in models of coin tosses are identical to lower-level probabilities over exact initial conditions.

Moreover, it seems to me that (a) and (b) and (c) would support H even on a purely subjective reading of (c), on which it says that you give approximately equal credence to initial conditions that differ very slightly from one another. In that case, your knowledge of (a) and (b) entails that you should assign credence 1/2 to heads. By the Principal Principle, it then follows that you can't believe that the objective probability of heads is anything other than 1/2. But that can hardly be the full story of how (a) and (b) and (c) support H.

(Compare: if you knew the exact initial conditions, or if God informed you of the outcome, you could be certain how the coin will land on my next flip; but we can't infer from the Principal Principle that the objective probability of heads is not 1/2.)

Notice that if (c) is true on the subjective reading, and you have the mechanistic information (a) and (b), then not only will your credence in heads be 1/2; your credence will also be highly *resilient*, in the sense of Skyrms (1980). Resilience is invariance under conditionalisation. Given (a) and (b) and (c), your credence in heads will not be swayed by further information, say, about the weather, about the time at which the next flip occurs, or about whether microphysics is deterministic.

Skyrms suggests that a probabilistic model is well-confirmed for an agent to the extent that the agent's corresponding degrees of belief are resilient. It's certainly true that accepting a probabilistic model goes along not only with aligning one's credences with the model's probabilities, but also with making those credences highly resilient. Arguably that is why well-confirmed probabilistic models are almost never confirmed by frequency data alone. Mere frequency information would not make our credence resilient.
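The contrast can be illustrated numerically. Below, two agents both assign credence 1/2 to heads, but the "mechanistic" agent concentrates almost all credence on the chance hypothesis p = 0.5, while the "frequency" agent spreads credence over several chance hypotheses that merely average to 0.5 (the hypotheses and weights are invented for illustration). Only the second agent's credence is swayed much by a further run of flips:

```python
chances = [0.3, 0.5, 0.7]          # candidate objective chances of heads
mechanistic = [0.01, 0.98, 0.01]   # credence concentrated on p = 0.5
frequency = [0.3, 0.4, 0.3]        # credence merely averaging to 0.5

def cred_heads(weights):
    # Credence in heads = expected chance (via the Principal Principle).
    return sum(w * p for w, p in zip(weights, chances))

def update(weights, heads, tails):
    # Conditionalize the credence over chance hypotheses on a run of flips.
    post = [w * p**heads * (1 - p)**tails for w, p in zip(weights, chances)]
    total = sum(post)
    return [x / total for x in post]

print(cred_heads(mechanistic), cred_heads(frequency))  # both 0.5
# After observing 8 heads in 10 flips:
print(cred_heads(update(mechanistic, 8, 2)))  # stays near 0.5 (resilient)
print(cred_heads(update(frequency, 8, 2)))    # moves well above 0.5
```

The two agents start with the same credence in heads, but only the mechanistically grounded credence is resilient under further evidence.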

The problem is how to make sense of this effect in the framework of Bayesian confirmation theory. Skyrms rejects the whole idea of objective (physical) probability. On his view, it is wrong to speak of our credence in probabilistic models and about how that credence changes in response to evidence. For Skyrms, to say that H is well confirmed is really to say that the subjective probability of heads being 1/2 is resilient.

Skyrms's solution is too radical for my taste. I'd like to think that probabilistic hypotheses are genuine hypotheses that can be tested and believed. But this makes it mysterious why confirmation of such hypotheses would go along with resilience of the corresponding degrees of belief: why is your credence in H proportional to the resilience of your credence in heads?

Let's look at Newcomb's Problem. Here Savage and Lewis and Skyrms would distinguish two relevant causal hypotheses (two cells of the K partition): according to K1, the opaque box is empty, one-boxing yields $0, and two-boxing $1K; according to K2, the opaque box contains $1M, one-boxing yields $1M, and two-boxing $1M+$1K. We could shoehorn these hypotheses into hypotheses about objective probabilities in causal graphs. The two hypotheses would share the same causal structure, but K1 would give probability 1 to the opaque box being empty and K2 probability 1 to the opaque box containing $1M. But if these node values have probability 1, then they plausibly also have probability 1 conditional on different values of their ancestors. And that would make the graph violate not only the 'Faithfulness Condition' (that d-connected nodes must be correlated), but also the 'Minimality Condition', that no proper subgraph of a DAG satisfies the Causal Markov Condition. The Minimality Condition is widely taken as axiomatic for causal models.

To avoid the clash with Minimality, we'd have to say that in K1 the probability of the opaque box being empty non-trivially depends on its causal ancestor, the predictor's prediction, even though the probability of the opaque box being empty is 1. That's not entirely unreasonable. But now we arguably get a probabilistic dependence between the Newcomb agent's choice and the content of the box, which we don't want: one-boxing increases the probability of the predictor having predicted one-boxing, which increases the probability of the box containing $1M. To avoid this, we would have to say that the dependence is asymmetrical: conditional on the predictor having predicted one-boxing, both one-boxing and the box containing $1M are probable, but not the other way round: conditional on one-boxing, the probability of the predictor having predicted one-boxing is still low (in K1). Again, I don't think that's an entirely unreasonable thing to say, but it means we're now dealing with very unorthodox conditional probabilities that don't fit what's usually assumed in causal models. We're effectively building causal relations into the conditional probabilities. No surprise we then get a causal decision theory without also postulating interventions.

So if we want to use orthodox causal models as dependency hypotheses, we arguably have to model Newcomb's problem with a single dependency hypothesis. (At least if the predictor's success rate is known.)

But then it's hard to see how two-boxing could come out as rational on the interventionist account. The problem is that the Newcomb agent's decision can hardly be an error term in the causal graph, as the agent should be confident that whatever she actually decides to do has been predicted by the predictor.

So even Stern's 'Interventionalist Decision Theory' recommends acting on spurious correlations in Newcomb type problems. That makes me wonder how much IDT really gains over the Meek & Glymour theory which uses Jeffrey's definition of expected utility.

The upshot is that I'm even more reserved now about the prospects of employing causal models in decision theory. On any way of spelling out the resulting theory, it seems to recommend acting on merely evidential correlations under certain circumstances.

First, all good examples of a posteriori necessities follow a priori from non-modal truths. For example, as Kripke pointed out, that his table could not have been made of ice follows a priori from the contingent, non-modal truth that the table is made of wood. Simply taking metaphysical modality as a primitive kind of modality would make a mystery of this fact.

Second, it is a well-known linguistic fact that there are two ways of evaluating a sentence at a possible context or scenario. In one sense, 'it is warmer than it is here' is false at every context; in another, the sentence is true at a context c iff it is warmer at c than at the actual utterance context @.

Developing the second observation leads to two-dimensional semantics, and further to the two-dimensional account of metaphysical modality. On that account, a sentence is a priori just in case it is true at all worlds when evaluated in the first way ("as actual"), and necessary just in case it is true at all worlds when evaluated in the second way ("as counterfactual"). The space of worlds is the same in either case.

The two kinds of evaluation come apart if and only if the sentence in question contains actually-dependent expressions such as indexicals ('here'), demonstratives ('this table'), or proper names. Evaluated as counterfactual, such expressions denote whatever plays a certain role in the actual world (the utterance context). This is why the empirical information one needs to know a posteriori necessities is ordinary non-modal information about the world.

The two-dimensionalist account assumes that all a posteriori necessities can be explained away by the difference between evaluating as actual and as counterfactual. Let's call those a posteriori necessities *tame*, and any others *brute*. For example, if 'there is an omniscient being' is an a posteriori necessity, it would be brute, since there is plausibly no difference between evaluating that sentence at worlds considered as actual and at worlds considered as counterfactual.

There are many reasons to be sceptical about brute necessities, besides the fact that there are no clear examples. Among other things, they would complicate our metaphysics; they would raise epistemic worries; and (my favourite reason) they would make the domain of metaphysical modality philosophically uninteresting. For example, it would be mysterious why anyone should care whether the mental supervenes on the physical within some brute sphere of "metaphysical" possibility.

Brute necessities are *almost* the same thing as what Chalmers calls *strong* necessities. According to Chalmers, a strong necessity is an a posteriori necessity that is true at all metaphysically possible worlds when evaluated as actual. Every strong necessity is brute, but not every brute necessity is strong.

To illustrate, consider the hypothesis that (G) is a strong necessity:

(G) Some omniscient being likes H2O.

That is, suppose that for reasons we can't explain from the armchair and through explorations into ordinary non-modal truths, every metaphysically possible world contains an omniscient being who likes H2O, and let's suppose 'H2O' picks out the same substance in worlds considered as actual and as counterfactual.

Next, assume that 'water' rigidly picks out whatever plays the water role in the actual world, and that H2O plays this role, so that water is necessarily H2O. If (G) is necessary and 'water=H2O' is necessary, then plausibly so is (G').

(G') Some omniscient being likes water.

In contrast to (G), (G') is not true at all metaphysically possible worlds considered as actual. For assume XYZ plays the water role at w, and the only omniscient being at w likes H2O but not XYZ. Then (G') is false at w considered as actual. (G') is not a strong necessity. But it is brute, for its necessity cannot be explained away along two-dimensional lines.

Now, (G') entails (G), so if there are no strong necessities, then there are also no brute necessities like (G'). So maybe it's enough to focus on strong necessities?

Arguably not. In the example, we get the strong necessity (G) from (G') by replacing the actuality-dependent term 'water' with the non-actuality-dependent term 'H2O', which at any world w considered as actual picks out what 'water' denotes at w considered as counterfactual. But arguably we can't always find such a term.

Assume 'charge' rigidly denotes whatever microphysical quantity plays the physical role of charge, and that different quantities play that role at different worlds. (Not an uncontroversial assumption, but one that should be compatible with two-dimensionalism.) Then consider the hypothesis that (H') is a brute metaphysical necessity.

(H') Some omniscient being likes charge.

For the same reason for which (G') is not a strong necessity, (H') is not a strong necessity either. But this time, we have no "semantically neutral" way to pick out the quantity that plays the charge role, so we cannot convert (H') into a strong necessity (H).

(H) Some omniscient being likes xxx.

Moreover, it's not a coincidence that we don't have the required word. Try to introduce a word that would do the job! The word would somehow have to pick out the quantity that plays the charge role "by its inner nature" rather than by any role it plays. But we have no conception of what that inner nature is, and it's not even clear what it would mean to have such a conception.

Admittedly, the assumption that (H') is necessary still entails a strong necessity, namely that some omniscient being exists. But that hole is easily plugged:

(H'') Either there is no omniscient being or some omniscient being likes charge.

If the two-dimensional account of a posteriori necessity is correct, (H'') cannot be necessary. But (H'') is not a strong necessity, nor does the necessity of (H'') entail any strong necessity. So banning strong necessities is not enough.

Let's start with two extreme cases. First, suppose I know nothing at all about my own location and how it relates to you -- not even that I am not identical to you. The assumption that you believe that there are strawberries "here" (as you put it) then only tells me that there are strawberries somewhere in the universe. If you are moderately rational, then this is an uncentred belief you have as well. So in this case, deferring to you amounts to deferring to your uncentred beliefs; your centred (self-locating) beliefs can be ignored. That is, I treat you as an expert just in case, for any uncentred proposition A,

\[ Cr_I(A / Cr_U(A)=x) = x. \]If I have further uncentred evidence that you may lack, I need to conditionalize your credence on that evidence, as usual: \[ Cr_I(A / Cr_U(A / E)=x) = x. \]

Now for the other extreme: I am certain about my location relative to yours. This is information you may or may not have, and it may be relevant to other things you believe. For example, if I know that I am 1 km behind you but you are uncertain whether I am 1 km or 2 km behind, then conditional on your beliefs I should not be uncertain whether I am 1 km or 2 km behind.

So I should consider your beliefs conditional on the extra information I have. But the information is centred, so conditionalizing your credence on it yields the wrong results: I'm not interested in your credence conditional on the hypothesis that you are 1 km behind yourself. We need to adjust the content.

For any centred proposition A, let [+n]A be the proposition that A is true n km down the path. My evidence E tells me that E is true here, but to consider your beliefs in light of that evidence, I should consider your credence conditional not on E but on [-1]E: the hypothesis that E is true 1 km behind you.

We need the opposite adjustment for the content of your self-locating beliefs: knowing that you are 1 km ahead of me, and assuming you are certain that there are strawberries around you, I should become certain that there are strawberries 1 km ahead.

So we arrive at the following rule for cases where my evidence E entails that you are n km ahead: for any centred or uncentred proposition A,

\[ Cr_I([+n]A / Cr_U(A/[-n]E)=x) = x. \]Equivalently,

\[ Cr_I(A / Cr_U([-n]A/[-n]E)=x) = x. \]What about in-between cases, where my evidence is not completely silent on matters of self-location, but also doesn't fully settle our relative location? (Every real-life case falls into this category.)

Well, we can apply the previous rule to possible extensions of my evidence that would settle our relative location. To spell this out, let \(Cr_U=f\) be the proposition that f is your credence function, and let D=n be the proposition that you are n km ahead of me. By the law of total probability,

\[ Cr_I(A / Cr_U=f) = \sum_n Cr_I(A / Cr_U=f \land D=n) Cr(D=n). \]When computing \( Cr_I(A / Cr_U=f \land D=n)\) it's important that I conditionalize your credence function not only on my (shifted) evidence E but also on the (shifted) assumption that D=n. So, slightly generalizing the previous rule:

\[ Cr_I(A / Cr_U=f \land D=n \land E) = f([-n]A / [-n]D=n \land [-n]E). \]Plugging this into the law of total probability, we get the general rule we were looking for:

\[ Cr_I(A / Cr_U=f) = \sum_n f([-n]A / [-n]D=n \land [-n]E) Cr(D=n). \]This still isn't entirely general because it reduces the question of our relative location to the question of how many kilometers you are ahead of me on some path. The fully general rule requires generalizing the [-n] operator and the distance propositions D=n.
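To check that the rule behaves as intended, here is a toy implementation on a one-dimensional path. The worlds, propositions, and credence functions are all made up: a world fixes where the strawberries are and where you are, a centred proposition is a predicate on a world plus a centre position, and shifting implements the [+n] operator:

```python
# world = (straw_pos, you_pos); a centred proposition is a function
# A(w, x), where x is the centre's position on the path.
WORLDS = [(s, y) for s in (1, 3) for y in range(5)]

def shift(A, n):
    # [+n]A: A is true n km down the path from the centre
    return lambda w, x: A(w, x + n)

def straw_here(w, x):
    return w[0] == x                    # strawberries at the centre

def you_ahead(n):
    return lambda w, x: w[1] == x + n   # D=n: you are n km ahead

def credence(cr, A, given=lambda w, x: True):
    # conditional credence in A given the centred proposition `given`
    num = sum(p for (w, x), p in cr.items() if given(w, x) and A(w, x))
    den = sum(p for (w, x), p in cr.items() if given(w, x))
    return num / den if den else 0.0

# Your credence f: you are certain there are strawberries where you
# are, and otherwise uniform; your centre is your own position.
you_worlds = [(w, w[1]) for w in WORLDS if straw_here(w, w[1])]
f = {cw: 1 / len(you_worlds) for cw in you_worlds}

def defer(A, E, crD, f):
    # Cr_I(A / Cr_U=f) = sum_n f([-n]A / [-n]D=n & [-n]E) Cr(D=n)
    return sum(
        credence(f, shift(A, -n),
                 lambda w, x, n=n: shift(you_ahead(n), -n)(w, x)
                                   and shift(E, -n)(w, x)) * pn
        for n, pn in crD.items())

trivial = lambda w, x: True
# Uncertain whether you are 1 or 2 km ahead, deferring to you yields
# credence 1/2 that the strawberries are exactly 1 km ahead of me:
print(defer(shift(straw_here, 1), trivial, {1: 0.5, 2: 0.5}, f))
```

As one would hope, my credence that the strawberries are exactly 1 km ahead (or exactly 2 km ahead) comes out 1/2, and my credence that they are either 1 or 2 km ahead comes out 1.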
