Counterexamples to Stalnaker's Thesis

I like a broadly Kratzerian account of conditionals. On this account, the function of if-clauses is to restrict the space of possibilities on which the rest of the sentence is evaluated. For example, in a sentence of the form 'the probability that if A then B is x', the if-clause restricts the space of possibilities to those where A is true; the probability of B relative to this restricted space is x iff the unrestricted conditional probability of B given A is x. This account therefore validates something that sounds exactly like "Stalnaker's Thesis" for indicative conditionals:

Thesis: P(if A then C) = P(C/A).

On the account I like, if you say 'P(if A then C)' in English, you almost inevitably end up saying something that denotes the conditional probability P(C/A), rather than the unconditional probability of some proposition expressed by 'if A then C'.

So it's interesting that Vann McGee and Stefan Kaufmann have found intuitive counterexamples to Stalnaker's Thesis. One of Kaufmann's examples in "Conditioning against the grain" goes as follows. There are two bags. In bag X, most balls are red, and most of the red balls have black spots. In bag Y, few balls are red, and few of those balls have black spots. You are 75% confident that the bag in front of you is bag Y. Now consider the statement:

(1) If you pick a red ball, it will not have black spots.

Many people apparently intuit (1) to have fairly high probability. I take that to mean that they would assent to

(1') Probably, if you pick a red ball, it will not have black spots.

This contradicts the Thesis, because getting a red ball is evidence that the bag in front of you is bag X, in which case it is rather likely that the ball has black spots.

As Kaufmann observes, if these facts are made salient -- if one points out that picking a red ball is much more likely if it's bag X rather than Y, and that most red balls in bag X have spots -- then people's intuitions switch and they deem (1) to have low probability. So it looks like the Thesis is right about some contexts, but not about others.

Kaufmann's explanation is that there are two ways of evaluating conditional probabilities, one "local" and one "global". Globally, 'P(if A then B)' denotes P(B/A); locally, 'P(if A then B)' denotes the expectation of P(B/A) relative to a certain partition, here the partition of bags { X, Y }:

(L) P(if A then B) = P(B/AX)P(X) + P(B/AY)P(Y).

The idea, which sounds plausible, is that when we judge (1) to be probable, we hold fixed that P(Y)=0.75 and note that P(No Spots / Red & Y) is high, which by (L) means that the probability of (1) is high.
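For concreteness, here is a back-of-the-envelope version of the two evaluations. The bag contents below are hypothetical stand-ins, not Kaufmann's own numbers; they are chosen only so that the two evaluations visibly come apart:

```python
# Hypothetical bag contents (stand-ins, not Kaufmann's own numbers):
# bag X: 9/10 of the balls are red, 8/9 of the red ones spotted
# bag Y: 2/10 of the balls are red, none of the red ones spotted
pX, pY = 0.25, 0.75                       # credence about which bag is in front of you
p_red_X, p_nospots_red_X = 9/10, 1/9
p_red_Y, p_nospots_red_Y = 2/10, 1.0

# "Global" evaluation: the ordinary conditional probability P(NoSpots / Red)
p_red = pX * p_red_X + pY * p_red_Y                              # 0.375
p_nospots_red = (pX * p_red_X * p_nospots_red_X
                 + pY * p_red_Y * p_nospots_red_Y)               # 0.175
global_value = p_nospots_red / p_red                             # ~0.47

# Drawing a red ball is evidence for bag X: P(X / Red) = 0.6
posterior_X = pX * p_red_X / p_red

# "Local" evaluation (L), holding the prior bag probabilities fixed
local_value = pX * p_nospots_red_X + pY * p_nospots_red_Y        # ~0.78
```

The local value stays high because the prior weight 0.75 on bag Y is held fixed, while the global value registers that a red draw shifts credence towards bag X.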

But why would we use (L) to evaluate conditional probabilities? The "global" evaluation that conforms to Stalnaker's Thesis is predicted by the general Kratzer-style semantics of 'if' and 'probability'. Where does the "local" reading come from?

Kaufmann suggests that the two evaluations correspond to different ways of supposing A, and also that the local evaluation can be understood as giving the expected conditional chance of B given A, since chance is credence conditionalised on the true member of a relevant partition. Both of these remarks suggest that (L) could give the subjunctive conditional probability of B given A, P(B\A), rather than the indicative conditional probability P(B/A). Indeed, the kind of compartmentalised conditioning that figures in (L) is precisely what Lewis uses in "Causal Decision Theory" to define the imaging function for subjunctive conditional probabilities.

So maybe that's what's going on: when people judge (1) to be probable, they read the conditional as subjunctive. This isn't too implausible, I think, because in English the distinction between the subjunctive and indicative reading is usually only marked in the past tense. Read subjunctively, the intuitive judgement about (1) is correct, as can be seen if one enforces this reading by saying "if you were to pick a red ball, it would not have black spots".

The hypothesis that the subjunctive reading is in play might also be supported by the fact that the intuition about (1) becomes much weaker -- I think -- if the sentence is put into the past. Suppose you've drawn a ball but haven't looked at it yet. Consider:

(1'') If you picked a red ball, then it does not have black spots.

The hypothesis also fits the phenomenon that people's intuitions flip when it is pointed out that picking a red ball makes it more likely that it's bag X than bag Y: this context, where the topic is what is evidence for what, makes the indicative reading salient.

So far, so good. Unfortunately, the present story does not work for McGee's examples. Here is one Kaufmann discusses as well. Initially, you believe that Murdoch died in an accident. Then somebody who you think is probably Sherlock Holmes says that Murdoch was killed, that Brown is probably the murderer, and that in any case

(2) If Brown didn't kill Murdoch, then someone else did.

According to McGee, most people now regard (2) as highly probable. However, if it turns out that Brown didn't kill Murdoch, then you'd lose your confidence that the speaker is Holmes, and thus return to your judgment that Murdoch died in an accident. So the (indicative) conditional probability corresponding to (2) is low.

Kaufmann doesn't find this problematic, since it conforms to his local evaluation rule (L), this time using the partition { he's Holmes, he's not Holmes }. But this application of (L) cannot plausibly be taken to give the subjunctive conditional probability of someone else killing Murdoch given that Brown didn't kill him. The subjunctive probability is surely low. If you think that Brown probably killed Murdoch, you will not judge it very probable that if Brown hadn't killed him then someone else would have. Moreover, it is anyway implausible that people are reading (2) subjunctively, because it is in the past tense.

The reason why Kaufmann's rule (L) here doesn't yield subjunctive conditional probability is that it uses a bad partition { Holmes, not Holmes }. (This also makes it implausible to describe (L) as computing expected conditional chance.) Roughly speaking, the cells of a good partition would say enough about the world and its causal structure so that, combined with either the assumption that Brown did kill Murdoch or that he didn't, each cell would entail whether someone else killed Murdoch. Applying (L) to such a partition yields a low conditional probability.

(L) is partition-dependent: the "local" probability of a conditional depends on the chosen partition. By choosing a suitable partition, we can let the local probability have almost any value we like. Kaufmann stresses that not all partitions are acceptable for (L), and that the right partitions must somehow encode the "causal structure of the scenario" [p.598]. But it isn't clear why this makes { Holmes, not Holmes } acceptable.

Let's redescribe Kaufmann's first example with a different partition. Again, you get to draw a ball from either bag X or bag Y; X contains mostly red balls, most of which have black spots; Y contains few red balls, few of which have black spots; based on your evidence, you are 75% certain that the bag in front of you is bag Y. If the contents of the bags are precisely specified (as they are in Kaufmann's paper), it is possible to calculate your probability for the hypothesis that you draw a red ball from bag Y. Let this hypothesis be called RY. Given your evidence, the probability of RY is quite low, say 0.05. So you're very confident that not-RY is true. Moreover, if not-RY is indeed true and you draw a red ball, then the ball can only come from bag X, in which case it probably has black spots. Now consider

(1) If you pick a red ball, it will not have black spots.

I suspect many would judge (1) to have low probability in this context, lower than P(No Spots/Red) and much lower than the subjunctive P(No Spots\Red). But the scenario is exactly the same as Kaufmann's -- I've just made a different partition salient.
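The partition dependence can be checked with a quick sketch that applies (L) to one and the same toy distribution under the two partitions. The bag contents are again hypothetical stand-ins (Kaufmann's paper fixes its own numbers), chosen only to illustrate the effect:

```python
# Hypothetical joint distribution over worlds (bag, red?, spotted?);
# the numbers are stand-ins, not Kaufmann's own:
# bag X: 9/10 red, 8/9 of the red balls spotted
# bag Y: 2/10 red, none of the red balls spotted
worlds = {}
for bag, p_bag, p_red, p_spot in [("X", 0.25, 0.9, 8/9), ("Y", 0.75, 0.2, 0.0)]:
    worlds[(bag, True, True)] = p_bag * p_red * p_spot
    worlds[(bag, True, False)] = p_bag * p_red * (1 - p_spot)
    worlds[(bag, False, False)] = p_bag * (1 - p_red)   # non-red balls: unspotted

def P(prop):
    """Probability of a proposition, given as a predicate on worlds."""
    return sum(pr for w, pr in worlds.items() if prop(w))

def local(B, A, partition):
    """Kaufmann's rule (L): sum over cells of P(B / A & cell) * P(cell)."""
    value = 0.0
    for cell in partition:
        p_a_cell = P(lambda w: A(w) and cell(w))
        if p_a_cell > 0:
            value += P(lambda w: B(w) and A(w) and cell(w)) / p_a_cell * P(cell)
    return value

red = lambda w: w[1]
nospots = lambda w: not w[2]
ry = lambda w: w[0] == "Y" and w[1]           # "a red ball drawn from bag Y"

bags = [lambda w: w[0] == "X", lambda w: w[0] == "Y"]
ry_partition = [ry, lambda w: not ry(w)]

p_bags = local(nospots, red, bags)            # ~0.78: (1) sounds probable
p_ry = local(nospots, red, ry_partition)      # ~0.24: (1) sounds improbable
```

Same scenario, same rule (L), two partitions, two very different "local" probabilities.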

Here is one lesson we might draw. There aren't just two kinds of conditional probabilities, indicative and subjunctive, but infinitely many, one for each choice of a partition. Every partition induces an imaging function and thereby a type of subjunctive supposition. We could then also fold indicative conditional probability into the subjunctive kind, induced by the single-membered partition. Context usually determines which partition is salient for statements about conditional probability (i.e. for statements that look like statements about the probability of a conditional).

Maybe. But if that's true, I'd like it to follow from the general semantics of 'if' and 'probability'. Neither of these, by itself, seems to be sensitive to the contextually salient partition -- at least not to the extent required for the present proposal to work.

I prefer another, perhaps more obvious, explanation: people who intuit that (2) is probable and (1) very improbable (in the revised context) have made a mistake.

Where does the mistake come from? In part, it may come from the fact that the (standard, indicative) conditional probability is a bit hard to determine in these cases, because one has to keep track of two factors that pull in opposite directions. For example, in the case of (2), the hypothesis that Brown didn't kill Murdoch raises the probability that someone else did it within the "Holmes" cell of the partition, but it simultaneously lowers the probability of that cell and thereby the probability that Murdoch was killed at all.

More importantly, I think the mistake comes from the grammatical illusion that a question about the probability of a conditional is a question about the probability of a certain proposition. If A is a proposition and { X, Y } a partition, then of course

P(A) = P(A/X)P(X) + P(A/Y)P(Y).

So we can always evaluate the probability of a proposition by considering its probability under different hypotheses and then take the weighted average. The result never depends on the chosen partition. When asked about the probability that if A then B, we mistakenly apply the same recipe, not realising that 'the probability that if A then B on the assumption that X' denotes P(B/AX) rather than something of the form P(A->B/X).
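This partition-invariance for genuine propositions is trivial algebra, but it can be checked mechanically; the distribution below is made up purely for illustration:

```python
# A made-up distribution over four worlds; A is an arbitrary proposition
# (a set of worlds).
worlds = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4}
A = {"w1", "w3"}

def P(S):
    return sum(worlds[w] for w in S)

def averaged(A, partition):
    """The weighted-average recipe: sum of P(A / cell) * P(cell).
    Each term reduces to P(A & cell), so the sum is always just P(A)."""
    return sum(P(A & cell) / P(cell) * P(cell)
               for cell in partition if P(cell) > 0)

# Any partition of the worlds gives back P(A) = 0.4
coarse = [{"w1", "w2"}, {"w3", "w4"}]
fine = [{"w1"}, {"w2", "w3"}, {"w4"}]
```

By contrast, the local rule (L) replaces P(A & cell)/P(cell) with a conditional probability that does not reduce this way, which is exactly why its output varies with the partition.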

Consider another of McGee's examples. Quantum mechanics entails that

(3) If all atoms in this table decay within the next second, then Z amount of energy is released,

for some particular value Z. McGee suggests that if we trust quantum mechanics, then we will assign high probability to (3). However, P(Z released / table decays) is low, since seeing the table suddenly decay would dramatically lower our confidence in quantum mechanics.

If the probability of (3) is the probability of a certain proposition that's entailed by quantum mechanics, then it is clear why trusting quantum mechanics requires assigning high probability to (3). But on the Kratzerian account, there is no such proposition, at least not if the conditional is read indicatively. (It could also be read as a nomologically strict conditional, in which case the failure of Stalnaker's Thesis is unproblematic.) On the indicative reading, there is no proposition that is (i) entailed by quantum mechanics and (ii) whose probability is in question when we ask about the probability of (3). Perhaps it is the prima facie plausibility that there is a proposition satisfying (i) and (ii) that explains why we mistakenly think the probability of (3) must be high, even on the indicative reading.

Comments

# on 22 April 2012, 09:15

Neat post, Wo. One interesting thing about the X/Y bags example in Kaufmann's paper is that it does not specify the probability of the following conjunction:

(C) the ball is white and the ball has a black dot.

It can be shown that there will be a negative correlation (viz., a disconfirmatory relation) between "the ball is red" and "the ball has a black dot" iff Pr(C) > 2/5. Perhaps subjects are somehow assuming/presupposing that Pr(C) > 2/5? If they are, then a "disconfirmation effect" could help to explain the responses. Douven has a nice paper on (dis)confirmation and the probabilities of conditionals. See:

http://www.springerlink.com/content/n22u58h119317106/fulltext.pdf

BTW -- Igor also has a short paper criticizing Kaufmann's account (on independent grounds -- of probabilistic inconsistency), here:

http://www.springerlink.com/content/l87u553916j6741m/fulltext.pdf

# on 22 April 2012, 11:39

Hi Branden, thanks for the pointers!

I read the scenario as saying that none of the white balls have black dots, but you're right that this isn't explicitly stated. Even if subjects think Pr(C) > 2/5, however, it seems implausible to me that the fact about confirmation would explain the intuition about the conditional. Usually, the mere fact that A *incrementally* confirms B doesn't make it plausible that if A then B.

The problem that Igor Douven points out in the second paper is the partition dependence. It might also be worth noting that Kaufmann's account is pretty much refuted by the triviality results: he claims to be safe by only needing certain instances of the conditional form P(A->B/C) = P(B/AC), but any argument to the effect that the equality P(A->B) = P(B/A) implies triviality also holds for probability functions P conditionalised on C. For example, applying Lewis's original result in this way presumably shows that P(A->B/C) = P(B/AC) implies P(A->B/C) = P(B/C).

# on 23 April 2012, 06:39

I don't usually follow this blog, but my colleague Fabrizio Cariani mentioned this threat to me, and since it is about my paper, I feel entitled to chime in. I'll try to be brief.

Re Holmes. I take it you agree that there is a high "local" conditional probability (given Holmes) that someone else killed Murdoch if Brown didn't. (Correct me if that's what you disagree with.) You claim that the corresponding subjunctive conditional doesn't have that probability. But I don't think there is exactly one (reading of the) corresponding subjunctive. There is a connection, but it's not that simple. If you are thinking of something like the analog of Adams's Oswald/Kennedy example, I grant you that it's not so clear that you would endorse it based on what (the person you take to be) Holmes said. But there are also epistemic counterfactuals, along the lines of Hansson's hamburger example. I do think that under that interpretation the subjunctive gets a high probability given Holmes. Perhaps this reading is less salient - but then, why shouldn't a non-salient reading of the subjunctive be the one that's relevant for the indicative? And moreover, instead of "If Brown hadn't killed Murdoch, someone else would have," try "If it hadn't been Brown who killed Murdoch, it would have been someone else." I think the second one is not so unlikely given Holmes. (This works even for the Oswald/Kennedy example.) The upshot is: There are multiple ways to interpret the subjunctive, and perhaps multiple subjunctives to interpret. You haven't refuted my account by arguing that one reading does not correspond to the local interpretation of the indicative.

Re partition-dependence. Here's a general statement about my proposal: Partition-dependence is not a problem for it; partition-dependence is what it's all about! Perhaps I am partly to blame for not being more explicit in the paper about the origins of the partitions (I was aiming to be as semi-formal as the authors who had first proposed the examples). Here it is: The partitions are given by the values of the non-descendants in a causal Bayesian network. That's the intention, which I worked out (originally in my 2001 dissertation, but in more detail) in

Kaufmann 2005, "Conditional Predictions," Linguistics and Philosophy 28:181-231; and

Kaufmann 2009, "Conditionals right and left," Journal of Philosophical Logic 38:1-53.

Ok, now it should be obvious that relative to a fixed causal structure you get different partitions for different antecedents; therefore, even if we take the causal structure to be contextually given, it does not follow that any particular partition is. Furthermore, it also becomes clear that the "scenarios" discussed in this literature are often vague and underspecified enough to be consistent with more than one causal structure. If that happens, then for a given antecedent you may get (different causal structures, hence) different partitions. A case in point is Douven's proposal to consider the case that bag X is really "two bags glued together" - a slightly ludicrous idea, but suitable to show just what I had in mind. Different causal structures give rise to different local probabilities. That was the whole point. It's an empirical question whether people, when given the original description of the scenario, consider the possibility that bag X is really two bags glued together. I wouldn't be surprised if they don't. By the way, it did not escape me when I wrote the paper (as it apparently did Douven when he read it) that what I call the "global" conditional probability is a special case of the "local" one, namely relative to a singleton partition.

Re triviality. You assert that "Kaufmann's account is pretty much refuted by the triviality results." I beg to differ. The whole paper is about certain conditional probabilities (and weighted sums thereof), and there is one section in which I observe that the fact that P(A -> C | C) = P(C) is not a problem. None of this adds up to a triviality problem. For that, I would have had to state somewhere what the truth conditions of the conditional are. I didn't do that in this paper. I did in other papers, in a way that I believe is not subject to the triviality results. Take a look at Kaufmann (2009) and let me know what you think.

Re error. Your best guess for what people do when they get local probabilities is that they are simply wrong about the conditional probability. Similar interpretations are offered by Douven, more recently by Egre and Cozic, etc. I'm sympathetic to this idea - it could be seen as basically a competence/performance problem - but I think it's wrong because local probabilities are useful for an unrelated purpose: They help you get the right probabilities for compounds containing conditionals. Again, check the 2009 paper and let me know what you think.

All that said, thanks for your interest in my paper. I'll be happy to talk more about it.

# on 23 April 2012, 16:49

In my earlier post I meant to say P(A -> C|C) = 1, of course. Sorry, my bad.

# on 24 April 2012, 02:16

Hi Stefan, thanks for the comments! I haven't had a chance to read the other papers yet, so here are just a few preliminary remarks on the issues you raised.

- Epistemic counterfactuals: Good point -- I'm not sure what to think about epistemic counterfactuals. It's not enough, however, that there is a reading of "If Brown hadn't killed Murdoch, it would have been someone else" on which it is likely *given Holmes*. It would have to be likely simpliciter.

- Partition dependence: I agree that the partition dependence does not refute your proposal, and I hope I didn't create the impression that I think so. My worries were rather that (i) people's intuitions in the Holmes case and in my redescription of your X/Y scenario do not track local conditional probabilities relative to causal partitions, and that (ii) my favourite account of "if" and "probability" does not leave room for this extent of partition dependence.

- Triviality: I took it that you do assume that for some C, for all A and B, P(A->B/C) = P(B/AC). But this would make you vulnerable to various triviality results. Suppose for example that there are only three worlds in one of the relevant propositions C, with equal probability. One of the worlds is in A and ~B, the other in B and ~A, the third in A and B. Then P(B/AC) = 1/2, but there is no proposition whose probability conditional on C is 1/2. Hence P(A->B/C) cannot equal P(B/AC).

- Error. I wasn't clear enough on this point. I don't claim that people's "local probability" intuitions are simply wrong. In many cases I think they are right, because the conditional is read either subjunctively or as a strict conditional.
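For what it's worth, the three-world argument in the triviality point above can be checked by brute force over all eight propositions (world labels are mine):

```python
from itertools import combinations

# Three equiprobable worlds conditional on C:
# w1 makes A true and B false, w2 makes B true and A false, w3 makes both true.
worlds = ["w1", "w2", "w3"]
A, B = {"w1", "w3"}, {"w2", "w3"}

def P_given_C(S):
    """Probability conditional on C of a proposition S (a set of worlds)."""
    return len(set(S) & set(worlds)) / 3

# P(B / A & C) = |A & B| / |A| = 1/2
p_B_given_AC = len(A & B) / len(A)

# Every proposition over these worlds has probability 0, 1/3, 2/3 or 1
# conditional on C -- never 1/2.
props = [set(c) for r in range(4) for c in combinations(worlds, r)]
```

So no proposition's probability conditional on C can equal P(B/AC), which is what the counterexample requires.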

In the end, I don't think we disagree all that much -- especially if you agree that the local evaluation corresponds to a subjunctive reading of the conditionals. That's what I think as well, and although I'm a bit reluctant to allow non-causal, "epistemic" subjunctives, I don't have any principled objections to them. I don't think the triviality problem is serious, because we can state and motivate the local rule (L) without relying on the problematic equation P(A->B/C) = P(B/AC). Maybe our main disagreement is that I do think there is an important non-subjunctive, indicative reading of conditionals which only allows for the global evaluation. That is, I think the thesis that sounds like Stalnaker's Thesis is correct for all indicative conditionals A->B. This is why the alleged counterexamples generally look much weaker when put into the "indicative" past tense.

# on 24 April 2012, 05:10

First another correction: At the very beginning of my original post I referred to a 'threat' - I meant a thread, of course. Hey, come on, I wrote that late at night, having just gotten home from a performance of Bach's B-minor Mass, which you had made me sit through thinking about this stuff.

Holmes: I don't see why it's not sufficient for the conditional ('If Brown hadn't killed Murdoch, someone else would have') to be likely given Holmes. The scenario has it that you give this conditional a high probability just as long as you believe that the speaker is very likely Holmes. Once that belief is undermined, the probability of the conditional plummets.

Partition dependence: You state that your favorite account is "broadly Kratzerian," and in your last post you state that this view does not leave room for partition-dependence. Is that so? There are some recent authors (Egre and Cozic, Lassiter, Kratzer) who in some form or other play with the idea that a probabilified restriction account would automatically yield an interpretation of the conditional in terms of conditional probability. But they are assuming that the probabilistic analog of restriction is conditionalization. That's not required. Let it be "local conditionalization" within the cells of the causal partition (a bit like imaging). Then you get local probabilities.

Triviality: I see what you mean. You argue that there is no proposition whose probability is the requisite conditional probability. I agree. But I don't assume that conditionals denote propositions, at least not in the usual sense. The papers I pointed you to develop a many-valued assignment inspired by Jeffrey (1991) and Stalnaker and Jeffrey (1994).
