CCZ consider the following scenario. A light is connected to two switches, A and B, each of which is either up or down. The light is on iff both switches are in the same position (both up or both down). In fact, both switches are up and the light is on. CCZ say that (1a) is true in this scenario while (1b) is false:
(1a) If switch A or switch B was down, the light would be off.
(1b) If switch A and switch B were not both up, the light would be off.
They back up these judgements with a poll on Mechanical Turk. Among a few hundred respondents, 69% judged (1a) true, but only 22% judged (1b) true.
CCZ go on to argue that the difference in meaning between (1a) and (1b) can be explained by combining the inquisitive semantics of Ciardelli, Groenendijk, and Roelofsen (2018) with Alonso-Ovalle (2006)'s model of conditionals.
My hunch is that the difference has a more pragmatic origin.
Note that the data is messy. If (1a) is true, why do only 69% of polled subjects say that it's true? If (1b) is not true, why do 22% say that it's true?
What's more, CCZ report that 21% of their subjects judge that (1c) is true:
(1c) If switch A and switch B were not both up, the light would be on.
These 21% seem to read 'not both up' as 'both not up'. As CCZ mention (p.315), 'not both up' can arguably have this reading if focus is on 'up'. On this reading, it's obvious that (1b) is false, and the antecedents of (1a) and (1b) are not equivalent. If we want to set aside this reading, we have to ignore roughly 21% of subjects, all of whom judged (1b) false. Among the remaining subjects, roughly 28% judged (1b) true (22 of the remaining 79 percentage points). So the real spread is: 69% true for (1a) vs 28% for (1b).
What kind of pragmatic effect could explain this noisy difference?
Here's one idea. Perhaps the antecedent of (1a), but not that of (1b), tends to trigger an exclusivity implicature. By comparison, imagine I've checked on the state of the two switches many times over the past few weeks. Now I report either (2a) or (2b).
(2a) Every time I checked, switch A or switch B was down.
(2b) Every time I checked, switch A and switch B were not both up.
The first report indicates that I never found both switches down. The second does not.
If the antecedent of (1a) similarly picks out a set of worlds all of which are such that A or B is down, we might expect to find the same difference.
Here's another idea.
On familiar assumptions about conditionals, the status of (1a) and (1b) depends on whether worlds where both switches are down count as more "remote" than worlds where only one is down. (1a) is only true if they do. (Otherwise any world where one of A and B is down and the light is off is as close as a world where both are down and the light is on.)
Now the closeness standards are flexible. Context can suggest different resolutions. In particular, the wording of (1a) might suggest a resolution on which both-down worlds are more remote than one-down worlds, while the wording of (1b) might suggest the alternative resolution. That's because (1a) draws attention to the independence of the two switches: one can be down or the other can be down. (1b) instead draws attention to the joint position of the switches: whether they are both up. Varying the joint position gives us both-down just as easily as one-down.
That some such context-dependence is at work is supported by the fact that it's easy to hear (1a) as true and (1b) as false in isolation, but affirming (1a) and denying (1b) in one breath is weird.
(3) ? If switch A or switch B was down then the light would be off, but I don't think that if switch A and switch B were not both up then the light would be off.
(4) ?? The light would be off if switch A or switch B was down, but not if switch A and switch B were not both up.
Here's a familiar kind of scenario from the debate about higher-order evidence.
You have come to believe a complex logical truth P on the basis of some reasoning. Now you get evidence suggesting that the reasoning faculty you have employed is unreliable.
Christensen thinks that the evidence should reduce your confidence in P. I'm not sure about this, but I'm inclined to agree. Christensen also says something else that I don't think is true. He says that you should reduce your confidence in P even if you're ideally rational.
"I do not see how it could be ideally rational to be confident in P while thinking that one's reasoning to one's belief in P was likely unreliable."
This raises a (fairly obvious) puzzle. Whatever specific evidence you may have about your reasoning faculty, the evidence logically entails that your reasoning led to a correct conclusion when you derived P. That's because P is a logical truth and hence entailed by everything. But how could evidence which entails that your reasoning led to the correct conclusion call for a reduced credence in that very conclusion?
As an ideal agent, you could have evidence that your reasoning faculties are unreliable in general. But you could not have evidence that any particular instance of your reasoning led to a false conclusion, assuming that you still remember the conclusion. Otherwise it would follow that you have evidence for something whose negation is entailed by that evidence, and how could that be?
I'm not sure what Christensen thinks about this. I suspect he bites the bullet and accepts that your evidence can support a proposition even though it entails its negation. In Christensen (2007, 16ff.), he argues that evidence about the general unreliability of your reasoning faculty must cast doubt on any particular instance in which you use the faculty. If you could be rationally certain for each instance that it led to a correct conclusion, then you could become rationally confident that your reasoning faculty is reliable simply by using it over and over. He thinks that this sort of bootstrapping would be problematic.
All this makes some sense, I think, if you're a non-ideal agent. Suppose you have found a subtly faulty proof of ¬P, and you don't spot the mistake. Then your evidence entails P (and that your proof is mistaken), but you should be somewhat confident that P is false. The problem is that you don't realise what your evidence supports, because you can't see through all its consequences. Ideal agents don't have this problem.
I suspect that what's leading Christensen astray is the false assumption that an ideal agent would base their beliefs in logical truths on some kind of reasoning.
An ideally rational agent has probabilistic credences (I think). Probabilistic coherence implies that logical truths have credence 1. As a probabilistically coherent agent, you are certain of P not because you have gone through a proof, or because you have a "special way of seeing clearly and distinctly that occurs when [you] contemplate claims like [P]" (Christensen (2007, 19)). No, you are certain of P simply because you are coherent.
If your certainty in P is not the result of some cognitive process, then empirical evidence about the unreliability of your cognitive processes is obviously irrelevant to whether you should retain your certainty.
Suppose you nonetheless go through some a priori reasoning. You wouldn't do this in order to find out whether the conclusion is true, for you already know the answer to this question. But you might do it, for example, to find out whether your reasoning faculties are reliable – an empirical question whose answer you may not yet know. If, for example, you do a number of tableau proofs in your head, and you observe that they all lead to conclusions that you already knew to be true, independently of the proofs, then you can reasonably infer that you are reliable at this kind of proof. There's nothing puzzling or problematic about this. It's not a kind of bootstrapping.
An ideally rational agent has no use for reasoning as a means of extending her knowledge. If something follows from what she knows, she already knows it; otherwise she wouldn't be ideally rational. That's why familiar models of ideal rationality have nothing to say about reasoning. Reasoning is a sign of cognitive imperfection.
I assume you remember the Sleeping Beauty problem. (If not, look it up: it's fun.) Wilhelm makes the following assumptions about Beauty's beliefs on Monday morning.
First, Beauty can't be sure that it is Monday:
(1) Cr(Mon) < 1.
Second, Beauty's credence in Heads conditional on Monday should be 1/2:
(2) Cr(H / Mon) = 1/2.
Together, these two assumptions are incompatible (by the probability calculus) with the "halfer" hypothesis (3):
(3) Cr(H) = 1/2.
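Here's a sketch of the incompatibility. It relies on one further premise that the setup guarantees: Beauty is only awake on Tuesday if the coin landed tails, so Cr(H / ¬Mon) = 0. By total probability, and using (1) and (2):

\[ Cr(H) = Cr(H / Mon)\,Cr(Mon) + Cr(H / \neg Mon)\,Cr(\neg Mon) = \tfrac{1}{2}\,Cr(Mon) + 0 < \tfrac{1}{2}. \]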
Some halfers (Hawley (2013)) reject (1). Most reject (2). Wilhelm doesn't acknowledge that (1) and (2) are controversial. He says that violating (1) would be "obviously irrational", and that (2) "follows from some standard principles, endorsed throughout the literature" (both quotes on p.1901). This is clearly not correct because halfing is a consistent position, and most halfers would reject whatever "standard principles" Wilhelm has in mind.
But let's set this aside. Let's assume that (1) and (2) are correct. Wilhelm then sees a problem. Standard formulations of the Principal Principle, he says, entail (3). But (3) is false, since we're assuming (1) and (2)!
How does (3) follow from the Principal Principle? "The derivation of (3) from the Principal Principle is fully rigorous", Wilhelm says (on p.1900). Let's have a look. The derivation (in the appendix) rests on the stipulation that Beauty's Monday credence equals her initial credence conditionalized on the complete history of the world up to Monday morning. From this we get (3) by a formulation of the Principal Principle from Lewis (1980), according to which one's rational initial credence in a proposition A, conditional on information about the chance at some time t and the history of the world up to t, should equal that chance of A at t.
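Spelled out in symbols (my notation, not Wilhelm's): where \(H_t\) is the history of the world up to \(t\) and \(ch_t\) is the chance function at \(t\), Lewis's principle says that a rational initial credence function \(Cr_0\) satisfies

\[ Cr_0(A \,/\, H_t \land \langle ch_t(A) = x \rangle) = x. \]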
But why can we stipulate that Beauty's Monday credence equals her initial credence conditionalized on the complete history of the world up to Monday morning? Personally, I think this is definitely false, no matter how we construe the relevant history proposition. But I'm a halfer. Is the stipulation plausible from a thirder perspective?
Thirders tend to think that Beauty should proportion her credence to her evidence, so that Cr(H) equals Cr_{0}(H / E), where Cr_{0} is a rational prior and E is Beauty's total evidence. It probably won't affect the answer to the Sleeping Beauty problem if we assume that Beauty is omniscient about the past. So we can stipulate that E entails all truths about the history of the world up to Monday morning. Wilhelm needs the stronger assumption that Beauty's evidence E is equivalent to the full truth about the history up to Monday. But Beauty also has self-locating evidence – that she is awake, that she has no memories from later than Sunday, and so on. We can't adequately model these as uncentred propositions about history. Beauty's observation that she is awake is not equivalent to an observation that she is awake on Monday morning. If her self-locating evidence is entailed by her history evidence, we must construe the relevant history evidence as centred. E must say something like 'I am now awake, and I was asleep an hour ago, and Beauty is awake at 8am on Monday, and she is asleep at 7am on Monday, and so on', but it must not connect the centred and the uncentred information it contains: it must not say 'it is now 8am on Monday', or anything like that.
The question now is whether this "centred history proposition" falls in the scope of Lewis's Principal Principle. If I were a thirder, I would say no. Thirders usually say that Beauty's credence in H should be less than the known chance 1/2 because she has inadmissible information – namely, that she is awake. This inadmissible information is part of the centred history proposition. We certainly can't assume that the Principle from Lewis (1980) holds with centred history propositions.
So I'm not convinced that there's a puzzle here. There is no good reason to think that (3) follows from the Principal Principle.
But let's look at Wilhelm's response to his puzzle. If (3) is false, and follows from the Principal Principle, should we reject the Principle?
Not quite. The puzzle, Wilhelm suggests, arises because H is a centred proposition. Standard formulations of the Principal Principle presuppose that the objects of chance are uncentred.
Recall that H is "the proposition that the coin lands heads" (p.1900). According to Wilhelm, "H is, of course, a centred proposition". It must be distinguished from "the uncentred proposition – call it 'U' – that the coin lands heads" (p.1903). The chance of U is 1/2, but the chance of H is 1/3. The "rigorous derivation" of (3) falsely assumed that the chance of H is 1/2.
Wilhelm goes on to defend the idea that chances can pertain to centred propositions like H. But let's pause here.
Why is H ("of course") centred? Both H and U are defined as "the proposition that the coin lands heads". Why is one centred and the other uncentred? A centred proposition can change its truth-value within a single (uncentred) world. How can "the coin lands heads" be true at some points and false at others, assuming that "the coin" refers to the single coin in the Sleeping Beauty scenario? I don't understand.
But here's how we could make H centred, using an idea from Titelbaum (2012) (who, if I may say so, has it from Schwarz (2015)).
Let's add a second coin toss to the Sleeping Beauty scenario. The second toss takes place on Tuesday and has no relevant consequences. We only introduce it so that Beauty can be sure that some coin will be tossed today, even though she's not sure whether it is Monday or Tuesday. More concretely, we can now ask how confident she is that today's coin toss will land heads. This is our centred version of H.
It'll be useful to have different labels for the different 'Heads' propositions. Let H_{Mon} be the uncentred proposition that the Monday coin lands heads, H_{Tue} the uncentred proposition that the Tuesday coin lands heads, and H_{Tod} the centred proposition that today's coin lands heads.
It is easy to show that in light of assumption (1), Cr(H_{Mon}) cannot equal Cr(H_{Tod}). If we re-interpret Wilhelm's centred 'H' as 'H_{Tod}' and his uncentred 'U' as H_{Mon}, his suggestion would be that H_{Mon} has chance 1/2 and H_{Tod} has chance 1/3, and that Beauty should align her credence with these chances. This doesn't work, however. The probability calculus demands that Cr(H_{Mon}) ≤ Cr(H_{Tod}). (See Mike's or my paper.) In fact, standard thirding entails that Cr(H_{Mon}) = 1/3 and Cr(H_{Tod}) = 1/2. We would need the opposite proposal, that H_{Mon} has chance 1/3 and H_{Tod} has chance 1/2. It's highly implausible, though, that H_{Mon} has chance 1/3.
So there's no real room for an alternative response here. Thirders should stick to their standard move and say that (3) doesn't follow from the Principal Principle because Beauty has inadmissible evidence.
But let's think some more about H_{Tod}, the proposition that today's coin lands heads. Imagine Beauty is shown the coin on Monday afternoon. "This is the coin we're going to toss today", she is told. "It is a fair coin. What's your credence that it will land heads?". As a thirder, Beauty would say '1/2'. Most forms of halfing, however, require Cr(H_{Tod}) = 1/3. As a halfer, Beauty would say '1/3'. Titelbaum (2012) thinks this is an embarrassment for halfers, because Beauty should obviously say '1/2'.
Lando (2022) agrees. She argues that this raises a serious problem: standard accounts of self-locating credence can't explain why 1/2 is the right answer, because centred propositions like H_{Tod} don't have a chance.
As a halfer, I don't think Beauty should say '1/2' when asked about her credence in H_{Tod}. To get Lando's problem off the ground, we have to assume that I'm wrong.
I admit that naive intuition favours this assumption. "Here's a fair coin. What's your credence that it will land heads?" One would expect the right answer to be 1/2. Let's assume it is. Surely this has something to do with the chance of heads being 1/2. But we can't directly apply the Principal Principle to H_{Tod}, because H_{Tod} is centred and presumably only uncentred propositions have a chance. So how can we explain why Cr(H_{Tod}) = 1/2?
At this point, Wilhelm could jump in and offer his response: allow for centred chances and the problem goes away! Lando briefly considers this response, but rejects it as "not very plausible" (p.116). How, she asks, could there be a probability for things like 'it is Monday' that aren't generated by a chance process?
Wilhelm's answer is a modified Best-System Account. Even centred propositions can have relative frequencies. Why not say that the chances are the best summaries of these frequencies?
In fact, one might argue that only centred propositions have non-trivial relative frequencies. Non-trivial relative frequencies pertain to things that can be instantiated more than once within the same world, not to propositions that are true or false once and for all. I've explored this idea, and its consequences, in Schwarz (2014). My proposal is rather different from Wilhelm's, and it doesn't give us a chance for things like 'it is Monday'. Wilhelm suggests that the chance of 'it is Monday' for Sleeping Beauty is 2/3, but I don't understand why. I would have thought that the relative frequency of 'it is Monday' is 1/7, since a seventh of all days are Mondays.
Anyway, let's return to Lando's problem. Let's assume that we don't want to assign a chance to propositions like H_{Tod}. Does this mean that we can't explain why Beauty's credence in H_{Tod} should be 1/2, based on facts about chance?
Lando considers only one possible explanation. It starts with the assumption that the Principal Principle constrains Beauty's credence in H_{Mon} and H_{Tue}, so that \(Cr(H_{Mon}) = 1/2\) and \(Cr(H_{Tue})=1/2\). Since Beauty's credence is divided between Monday and Tuesday, the putative explanation infers that her credence in today's coin landing heads must also be 1/2.
Lando rightly points out that this explanation is problematic. Thirders would not accept that \(Cr(H_{Mon})=1/2\). It's also unclear how \(Cr(H_{Tod}) = 1/2\) is meant to follow from \(Cr(H_{Mon})=1/2\) and \(Cr(H_{Tue})=1/2\), together with the assumption that Beauty's credence is divided between Monday and Tuesday. In fact, we can say something stronger: the premises entail the negation of the conclusion! If \(Cr(H_{Mon})=1/2\) and \(Cr(Mon) < 1\) then probability theory requires that \(Cr(H_{Tod}) > 1/2\).
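Here's a sketch of that last step. It uses two facts built into the setup: Beauty is only awake on Tuesday if the Monday coin landed tails, so \(H_{Mon}\) entails Mon; and the Tuesday coin is fair and independent, so \(Cr(H_{Tue} / Tue) = 1/2\). Then:

\[ Cr(H_{Tod}) = Cr(Mon \land H_{Mon}) + Cr(Tue \land H_{Tue}) = Cr(H_{Mon}) + Cr(Tue)\cdot\tfrac{1}{2} = \tfrac{1}{2} + \tfrac{1 - Cr(Mon)}{2} > \tfrac{1}{2}, \]

where the final inequality holds because \(Cr(Mon) < 1\).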
Without considering any other explanations, Lando assumes that no explanation can be given for Cr(H_{Tod}) = 1/2 – at least not if we assume that credence is defined over centred worlds. We can only explain why Beauty's credence in H_{Tod} should be 1/2 if we recognize that H_{Tod} is associated with uncentred "truth conditions" – namely, that the Monday coin lands heads. Lando thus reaches the sweeping conclusion that no theory that construes doxastic content in terms of centred worlds "can adequately represent all of the rational constraints on our credences" (p.119).
There is no need to draw this sweeping conclusion. We can easily explain why Beauty's credence in H_{Tod} should be 1/2. In Schwarz (2015) I show that this follows from standard thirder assumptions. Let me give a more direct explanation.
We'll figure out to what extent Beauty's evidence on Monday morning supports H_{Tod}. Beauty is aware of the general setup, and her last memories are from Sunday. This means that one of the following eight (centred) possibilities must obtain.
| | H_{Mon}, H_{Tue} | H_{Mon}, T_{Tue} | T_{Mon}, H_{Tue} | T_{Mon}, T_{Tue} |
|---|---|---|---|---|
| Mon | (a) | (b) | (c) | (d) |
| Tue | (e) | (f) | (g) | (h) |
A priori, these are all equally probable. Here we invoke the Principal Principle to determine that each column has probability 1/4, and a principle of self-locating indifference that divides the probability of each column between its two cells.
Now Beauty also has the information that she is awake. This rules out cells (e) and (f). The remaining possibilities therefore have probability 1/6 each. It follows that \(Cr(H_{Mon}) = Cr((a) \lor (b)) = 1/3\) and \(Cr(H_{Tod}) = Cr((a) \lor (b) \lor (e) \lor (g)) = 1/2\).
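The calculation can be checked mechanically. Here's a minimal sketch in Python, following the table above (the labels and setup are just the eight centred cells; nothing else is assumed):

```python
from fractions import Fraction

# The eight centred possibilities: (day, Monday coin, Tuesday coin).
cells = [(day, mon, tue)
         for day in ("Mon", "Tue")
         for mon in ("H", "T")
         for tue in ("H", "T")]

# Principal Principle: each of the four coin combinations gets 1/4;
# self-locating indifference splits each between Mon and Tue.
prior = {c: Fraction(1, 8) for c in cells}

# Beauty is only awake on Tuesday if the Monday coin landed tails,
# so being awake rules out the Tue/H_Mon cells, (e) and (f).
awake = [c for c in cells if not (c[0] == "Tue" and c[1] == "H")]
total = sum(prior[c] for c in awake)
cr = {c: prior[c] / total for c in awake}  # conditionalize on being awake

cr_h_mon = sum(p for (day, mon, tue), p in cr.items() if mon == "H")
cr_h_tod = sum(p for (day, mon, tue), p in cr.items()
               if (day == "Mon" and mon == "H")
               or (day == "Tue" and tue == "H"))

print(cr_h_mon)  # prints 1/3
print(cr_h_tod)  # prints 1/2
```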
I've always found this puzzling. Why would a decision maker be unable to assign probabilities (even vague or indeterminate ones) to the states? I don't think there are any such situations.
I haven't looked at the history of this distinction, but I suspect it comes from von Neumann, who (I suspect) had no concept of subjective probability. If the only relevant probabilities are objective, then of course it may happen that an agent can't make their choice depend on the probability of the states because these probabilities may not be known.
Anyway, while I don't think there are real situations in which a decision-maker can't assign probabilities to the states, I think it is nonetheless useful to study such situations. It is useful because it helps explain why we – and any minimally rational agent – can always assign probabilities to the states. The reason is that there is no good way to make decisions without them. All rules for decision-making under ignorance are terrible.
For example, consider the most famous such rule: Maximin. It says to choose the option with the best worst-case outcome. Imagine, for example, that you and I are walking in a park, and we come across a discarded plastic bag. It looks like something might be inside. You know that I have no more information about the bag than you. I offer you a deal: if the bag contains a red tetrahedron with the letters 'R.H.S.' inscribed on it in green ink, you have to give me a penny, otherwise I will give you a million pounds. Maximin says that you should reject the offer. It's a terrible rule.
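For concreteness, here's the rule as a few lines of Python, applied to the bag deal (the monetary amounts stand in for utilities; only their ordering matters):

```python
def maximin(options):
    """Pick the option whose worst-case outcome is best.

    options maps option names to lists of outcomes across states."""
    return max(options, key=lambda name: min(options[name]))

# The bag deal. States: [the bag contains the inscribed red
# tetrahedron, it doesn't].
options = {
    "accept": [-0.01, 1_000_000],  # lose a penny, or win a million
    "reject": [0, 0],              # nothing happens either way
}

print(maximin(options))  # prints "reject"
```

The worst case of accepting (losing a penny) is worse than the worst case of rejecting (nothing), so Maximin rejects, ignoring the million pounds entirely.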
OK, but this just shows that Maximin is terrible. There are infinitely many rules for decision-making under ignorance. How do I know that all of them are terrible?
Good question.
Gustafsson (2023) provides some steps towards an answer. He shows that the following five conditions are unsatisfiable. (Maximin violates the third.)
Transitivity: If option x is at least as preferred as option y and y is at least as preferred as option z, then x is at least as preferred as z.
Expansion Consistency: Whether option x is at least as preferred as option y does not change if another option is added to the situation.
Strong Statewise Dominance: If the outcome of option x is at least as preferred as the outcome of option y in every possible state of nature and the outcome of x is preferred to the outcome of y in some possible state of nature, then x is preferred to y.
Pairwise State Symmetry: If x and y are the only available options and the outcome of x is just like the outcome of y except for a permutation of two states of nature, then x is equally preferred as y.
State-Individuation Invariance: Whether option x is at least as preferred as option y does not depend on whether a state of nature is split into two states.
The first three conditions are standard assumptions about rational choice. The last two pertain specifically to decision-making under ignorance. Pairwise State Symmetry, in particular, is obviously absurd if the agent has probabilities over the states. In the absence of any such probabilities, however, it looks plausible.
One might have thought that an adequate rule for decision-making under ignorance should lead to preferences that satisfy all five conditions. Gustafsson shows that this is not possible.
The force of this observation depends on the plausibility of the five conditions. Unfortunately, many of them have been contested. For example, it has been argued that risk-averse agents don't need to comply with Strong Statewise Dominance (Buchak (2013)), and that polite agents don't need to comply with Expansion Consistency (Sen (1993)).
We can strengthen Gustafsson's challenge by noting that his result doesn't depend on the agent's values. Consider a simplistic agent who doesn't care about risk or politeness. This agent only cares about their immediate degree of pleasure (say). One might think that in a case of "ignorance" the agent's preferences should satisfy all five conditions. And that's impossible, no matter which decision rule the agent uses.
Admittedly, this doesn't show that every rule for decision-making under ignorance is terrible. It's not obvious that giving up State-Individuation Invariance, for example, is always terrible.
In fact, giving up this condition is precisely what Gustafsson suggests. His starting point is Laplace's Rule. The rule says that in a situation of ignorance one should give equal probability to each state of nature and then maximize expected utility. This violates State-Individuation Invariance. But that's OK, Gustafsson says, because we can describe a privileged partition of states to which the rule should be applied:
State S should be distinguished from S' iff there is a possible option for which it is permissible to prefer its outcome in S to its outcome in S'.
I've explained in this post why I think it's a bad idea to impose rational constraints on the individuation of outcomes, as Gustafsson here does. But let that pass. The important point is that one can try to identify a privileged partition for applying Laplace's indifference requirement. See, for example, Mikkelson (2004), White (2010), and Weisberg (2011) for proposals that don't involve constraints on the individuation of outcomes.
Depending on how we choose the partition, Laplace's Rule may or may not be terrible. I'm not sure if Gustafsson's version of the rule is coherent, but if it is then I suspect that it is fairly terrible.
To see why, imagine you have seen 1000 ravens, all of which were black. You now have to choose between deal A and deal B. Deal A gives you £1 if the next raven is black, deal B gives you £1 if it is not. The problem is that there are more ways for the next raven to be non-black than for it to be black (informally speaking). We have to treat all these ways as different because it is easy to imagine possible options that lead to relevantly different outcomes depending on the colour distribution over as yet unobserved ravens. Gustafsson's version of Laplace's Rule therefore suggests that you should choose deal B. Having seen 1000 black ravens you would, in effect, bet that the next raven isn't black. That's terrible advice.
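To illustrate with toy numbers (the number of distinguishable colours is a made-up placeholder; the real space of ways is presumably infinite, which is part of the problem):

```python
from fractions import Fraction

# Hypothetical toy version: suppose there are n distinguishable
# colours the next raven could have, each counting as a separate
# state, and the rule gives them equal probability regardless of
# the 1000 observed black ravens. (n = 10 is a placeholder.)
n = 10
p_black = Fraction(1, n)

ev_a = p_black * 1        # deal A: £1 if the next raven is black
ev_b = (1 - p_black) * 1  # deal B: £1 if it is not

print(ev_b > ev_a)  # prints True: the rule recommends deal B
```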
(In fact, all the cardinalities involved are probably infinite and I'm not sure how Gustafsson's rule would then apply. That's why I said that I'm not sure it is coherent.)
But I must admit that other versions of Laplace's Rule look better. Suppose we combine the Rule with Roger White's way of identifying the privileged partition. White (2010) suggests that two states should be given equal probability iff the evidence is neutral between them. White isn't interested in decision rules, so his "states" are just propositions. We're going to need something more fine-grained. But it's possible that this can be made to work.
For example, let's follow Lewis (1981) and identify outcomes with value-level propositions and states with dependency hypotheses. We may hope that each dependency hypothesis contains infinitely many fine-grained possible worlds, so that we can divide the dependency hypotheses until we have a partition over which the evidence is neutral.
If this works, the relevant form of Laplace's Rule looks OK. It's a non-terrible rule for decision-making under ignorance.
But of course it's not really a rule for decision-making under ignorance. The rule tells an agent who doesn't have degrees of belief that they should, first, adopt certain degrees of belief and then act in accordance with them.
I don't know how to properly delineate rules for decision-making under ignorance. When I say that all such rules are terrible, I don't have in mind rules like the Laplace-White rule that effectively introduce degrees of belief.