Answer: It depends on what else the hypothesis says. If, for example, the hypothesis says that 90 percent of all observers have three eyes, and also that we ourselves have two eyes, then the probability that we have three eyes conditional on the hypothesis is zero.

This effect is easy to miss because many hypotheses that appear to be just about the universe as a whole secretly contain special information about us. Consider the following passage from Carroll (2010), cited in Arntzenius and Dorr (2017):

Imagine we have two theories of the universe that are identical in every way, except that one predicts that an Earth-like planet orbiting the star Tau Ceti is home to a race of 10 trillion intelligent lizard beings, while the other theory predicts there are no intelligent beings of any kind in the Tau Ceti system. Most of us would say that we don’t currently have enough information to decide between these two theories. But if we are truly typical observers in the universe, the first theory strongly predicts that we are more likely to be lizards on the planet orbiting Tau Ceti, not humans here on Earth, just because there are so many more lizards than humans. But that prediction is not right, so we have apparently ruled out the existence of that many observers without collecting any data at all about what's actually going on in the Tau Ceti system.

I share Carroll's intuition. The fact that we aren't lizards is no reason to reject a theory according to which there are many lizards at Tau Ceti.

Carroll, Arntzenius, and Dorr assume that this intuition clashes with a principle of "anthropic reasoning" according to which we should regard ourselves as randomly sampled from the observers in the world. Less metaphorically, the principle says that conditional on the hypothesis that N percent of all observers in the world have property P, the prior probability that we have P is N percent.

In Carroll's example, P is the property of being a lizard. The hypothesis that there are 10 trillion intelligent lizards at Tau Ceti entails (let's assume) that a large proportion of observers in the world are lizards. The anthropic principle therefore seems to suggest that conditional on this hypothesis there is a high prior probability that we are lizards. Conditional on the alternative hypothesis (that there aren't any lizards at Tau Ceti) the probability is much lower. By Bayes' Theorem, it would follow that the fact that we are not lizards strongly supports the hypothesis that there aren't any lizards at Tau Ceti.
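To get a feel for how strong this apparent support would be, here is the Bayesian arithmetic with toy figures. The 8 billion humans, the equal priors, and the random-sampling assumption are mine, purely for illustration:

```python
# Toy Bayesian update on "we are not lizards".
# H1: 10 trillion lizards at Tau Ceti (plus the humans); H2: no lizards.
humans = 8e9
lizards = 1e13

# Likelihood of our evidence under each hypothesis, treating ourselves
# as randomly sampled from all observers (the "anthropic" assumption):
p_not_lizard_h1 = humans / (humans + lizards)  # ~0.0008
p_not_lizard_h2 = 1.0                          # no lizards under H2

prior_h1 = prior_h2 = 0.5
posterior_h2 = (prior_h2 * p_not_lizard_h2) / (
    prior_h1 * p_not_lizard_h1 + prior_h2 * p_not_lizard_h2)
print(round(posterior_h2, 4))  # 0.9992
```

On these assumptions, merely noticing that we are not lizards would push the no-lizard hypothesis from 0.5 to above 0.999.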

But the hypothesis that there are 10 trillion intelligent lizards at Tau Ceti doesn't just say that the proportion of lizards in the universe is high. It says more. In particular, it says where most of the lizards can be found – at Tau Ceti. And 'Tau Ceti', like all names, isn't a bare tag that simply picks out a certain star. It picks out the star under a certain mode of presentation, relating it to our own position in the universe.

Imagine we are looking into the night sky. I point at a distant star and say, "let's call this star 'Psi'". While I'm still pointing (awkwardly), we wonder whether there are trillions of intelligent lizards in the Psi system. This hypothesis is epistemically equivalent to the hypothesis that there are trillions of intelligent lizards on a planet orbiting the star that I'm pointing at. Ignoring the far-fetched possibility that we ourselves are in the system of the star at which I'm pointing, the hypothesis essentially says that there are lots of lizards *somewhere else*. And so the anthropic principle doesn't apply. Conditional on the assumption that most observers are lizards and that most of these lizards inhabit a planet far away from ours, it is not especially likely that we are lizards.

Arntzenius, Frank, and Cian Dorr. 2017. “Self-Locating Priors and Cosmological Measures.” In *The Philosophy of Cosmology*, edited by Khalil Chamcham, John Barrow, Simon Saunders, and Joe Silk.

Carroll, Sean. 2010. *From Eternity to Here: The Quest for the Ultimate Theory of Time*. New York: Dutton.

I claim that, in the absence of unusual evidence, a rational agent should be confident that observed patterns continue in the unobserved part of the world, that witnesses tell the truth, that rain experiences are a sign of rain, and so on. In short, they should give low credence to various skeptical scenarios. How low? Arguably, our epistemic norms don't fix a unique and precise answer.

Let's assume, then, that there is a range of evidential probability measures, each of which is eligible as a rational prior credence function.

All eligible priors give low credence to skeptical scenarios, but they don't agree on how low these credences are. As a result, agents who adopt different eligible priors will appear more or less cautious in the lessons they draw from inconclusive evidence.

Suppose we've seen 17 green birds and wonder whether the next bird will be green as well. If you give more prior credence than me to the (moderately skeptical) hypothesis that our initial sample was unrepresentative, then your credence in the next bird being green might be 0.8 while mine is 0.9.
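One crude way to model the two inductive methods is with Beta-Binomial pseudo-counts. The particular priors Beta(1,1) and Beta(5,5) are my own illustrative choices, not anything from the text:

```python
# After g green birds out of n observed, the predictive probability of
# "green next" under a Beta(a, b) prior is (g + a) / (n + a + b).

def predict_green(g, n, a, b):
    return (g + a) / (n + a + b)

n = g = 17  # 17 birds observed, all green

bold = predict_green(g, n, a=1, b=1)      # Laplace's rule: 18/19, about 0.95
cautious = predict_green(g, n, a=5, b=5)  # heavier prior pseudo-counts: 22/27, about 0.81
```

The larger pseudo-counts play the role of greater prior credence in the sample being unrepresentative, yielding roughly the 0.8-vs-0.9 split from the example.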

Or suppose we've heard incriminating witness statements and wonder whether the defendant is guilty. Again, if you're epistemically more cautious than me by giving greater prior probability to scenarios in which witnesses are unreliable, your credence in the defendant's guilt might be 0.8 while mine is 0.9.

Why might you prefer a more cautious prior probability – or "inductive method", as Carnap would put it? Perhaps because you care more about the risk of inaccurate beliefs. You really want to avoid a low accuracy score, even at the cost of forgoing the possibility of a high score. I'm more risk-inclined. I value high accuracy more than I disvalue low accuracy.

There are different ways to model these attitudes, but a natural idea is to assume that we employ different scoring rules.

To illustrate this idea, consider two scoring rules. One is a simple absolute rule S_{a}, on which the inaccuracy of a credence function Cr at a world w is S_{a}(Cr,w) = 1-Cr(w). The other is a cubic rule S_{c} with S_{c}(Cr,w) = |1-Cr(w)|^{3}.

The cubic function doesn't care much about high accuracy. It regards a credence of 0.8 in the truth as very similar to a credence of 1 (because |1-0.8|^{3} is very close to |1-1|^{3}). However, it regards a credence of 0 in the truth as much worse than a credence of 0.2 (|1-0.2|^{3} is about half of |1-0|^{3}). By comparison, the absolute function cares less about low accuracy, and more about high accuracy.
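The comparisons are easy to verify:

```python
def s_abs(c):
    # absolute inaccuracy of credence c in the truth
    return 1 - c

def s_cubic(c):
    # cubic inaccuracy of credence c in the truth
    return abs(1 - c) ** 3

s_cubic(0.8)  # ~0.008: barely worse than s_cubic(1.0) == 0.0
s_cubic(0.2)  # ~0.512: about half of s_cubic(0.0) == 1.0
s_abs(0.8)    # 0.2: the absolute rule still rewards pushing 0.8 up to 1
```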

This shows up if we look at the expected inaccuracy score of different eligible priors, from the perspective of other eligible priors.

The cubic score urges caution. If you start with a highly opinionated prior, it pushes you towards a less opinionated, more egalitarian prior.

The absolute score does the opposite. It pushes you towards a more opinionated, less egalitarian prior.

The fixed point for the absolute score is a maximally opinionated probability measure that assigns probability 0 to any skeptical hypothesis. This measure minimises expected S_{a}-inaccuracy relative to itself. The fixed point for the cubic score is the uniform measure that takes skeptical hypotheses as seriously as non-skeptical hypotheses.
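A two-world grid search illustrates both pulls; the starting prior (0.7, 0.3) is an arbitrary example of mine:

```python
# Expected inaccuracy, from the perspective of a prior (p, 1-p), of
# adopting the credence function (q, 1-q), under each scoring rule.

def s_abs(c):
    return 1 - c

def s_cubic(c):
    return (1 - c) ** 3

def expected_inaccuracy(p, q, s):
    # p: prior probability of world 1; q: candidate credence in world 1
    return p * s(q) + (1 - p) * s(1 - q)

p = 0.7
grid = [i / 1000 for i in range(1001)]
best_abs = min(grid, key=lambda q: expected_inaccuracy(p, q, s_abs))
best_cubic = min(grid, key=lambda q: expected_inaccuracy(p, q, s_cubic))
# best_abs == 1.0: pushed to full opinionation
# best_cubic is about 0.60: pulled from 0.7 back toward the uniform 0.5
```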

Neither of these fixed points is rationally permissible, I think. Perhaps this shows that the two scoring rules I've used aren't rationally permissible either. Plausibly, for any permissible scoring rule there should be a permissible rational prior credence that minimises expected inaccuracy from the perspective of itself. This is the prior you should choose if you employ that scoring rule. Every other prior should drive you towards it.

In this model, we don't want permissible scoring rules to be proper. Proper scoring rules make every credence function ideal from its own perspective. But if you're really averse to getting things wrong then you should feel uneasy about assigning credence 0.01 to a skeptical scenario. By your own lights, you should think it would be better to move towards a more cautious state in which the skeptical scenario has higher credence.

(Since we're talking about scoring rules for the choice of priors, some familiar arguments for propriety don't apply.)

Unlike Richard's model, the model I have in mind only has one layer of permissivism. Once you've settled on a degree of epistemic risk-aversion, and thereby on a scoring rule, the range of eligible prior credence functions should ideally reduce to a single candidate. We could therefore allow your risk attitudes to change over time without allowing for dramatic and incomprehensible changes in belief. If you change your risk attitudes and adopt a new inductive method, applying it to your total evidence, you may become more or less cautious. But you will never switch from a high credence in H to a high credence in ¬H.

2. E_P s(P) < P(A) E_{P_A} s(P_A) + P(A^c) E_{P_{A^c}} s(P_{A^c}) whenever 0<P(A)<1,

where s(P) is a function from worlds to [-infty,M] for some finite M, P_A is P conditioned on A, and E_P is expectation with respect to P.

Is there such a measure?

Strict propriety entails (2). One might guess that (2) is equivalent to strict propriety, but in fact (2) doesn't even entail propriety. [Let s0(P) be 1 if P is zero on some non-empty set and 0 otherwise. Then (2) holds with non-strict inequality for s0. Let s(P) = s0(P) + a Brier(P) for some small positive a. Then (2) holds with strict inequality for Brier. Hence (2) holds with strict inequality for s. But clearly s isn't proper, at least if a is small. For let P be uniform. Then E_P s0(P) will be zero, but if Q is zero on some non-empty set, E_P s0(Q) will be one, and for small a we will have E_P s(P) < E_P s(Q), contrary to propriety.]
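Here is a numerical sanity check of the bracketed argument on three worlds. The particular P, the event A, and the scaling factor a = 0.01 are arbitrary choices of mine (I use the accuracy version of the Brier score, so that higher is better throughout):

```python
def brier(q, w):
    # Brier accuracy (a utility: higher is better) of credence vector q at world w
    return -sum((q[i] - (1 if i == w else 0)) ** 2 for i in range(len(q)))

def s0(q):
    return 1 if any(x == 0 for x in q) else 0

a = 0.01
def s(q, w):
    return s0(q) + a * brier(q, w)

def expect(p, f):
    return sum(p[w] * f(w) for w in range(len(p)))

P = (0.5, 0.3, 0.2)
pA = P[0] + P[1]                    # conditioning event A = {w0, w1}
P_A = (P[0] / pA, P[1] / pA, 0.0)   # P conditioned on A
P_Ac = (0.0, 0.0, 1.0)              # P conditioned on A's complement

lhs = expect(P, lambda w: s(P, w))
rhs = pA * expect(P_A, lambda w: s(P_A, w)) \
      + (1 - pA) * expect(P_Ac, lambda w: s(P_Ac, w))
assert lhs < rhs  # condition (2) holds strictly for s

Q = (1.0, 0.0, 0.0)
assert expect(P, lambda w: s(P, w)) < expect(P, lambda w: s(Q, w))  # propriety fails
```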

One might throw additivity into the mix, but I am sceptical of additivity.

(There is, of course, much more in the book than what I will summarise, including many interesting technical results and some insightful responses to anti-permissivist arguments.)

The central Jamesian idea is that what we should believe depends not just on our evidence but also on our attitude towards epistemic risk.

To understand what this could mean, let's first imagine that there is an all-or-nothing attitude of belief. Let's also assume that there is an evidential probability measure that tells us to what degree a proposition is supported by an agent's evidence. (Richard discusses this setup in chapter 3, drawing on work by Kelly, Easwaran, and Dorst.)

Assume, then, that your evidence supports P to degree 0.8. You want to believe truths and not believe falsehoods. Should you believe P?

If believing a truth has as much positive value for you as believing a falsehood has negative value – if, say, the former has utility 1 and the latter -1 – then the answer is yes: believing P maximises expected utility. But suppose you really care about not believing falsehoods, so that believing a falsehood has utility -10 while believing a truth has utility 1. Then it's better not to believe P.
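The calculation, spelled out:

```python
# Expected epistemic utility of believing P when its evidential probability is p.

def eu_believe(p, u_true, u_false):
    return p * u_true + (1 - p) * u_false

eu_believe(0.8, 1, -1)   # 0.6 > 0: believe
eu_believe(0.8, 1, -10)  # -1.2 < 0: don't believe
# With utilities 1 / -10, belief pays off only when p > 10/11, about 0.91.
```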

The utilities here are not meant to represent your practical desires. We're not interested in what you should believe from a practical perspective, but in what you should believe from an *epistemic* perspective. We're talking about your *epistemic* utilities.

In general, the more you epistemically care about the risk of believing a falsehood, the more reluctant you will be to form beliefs in the presence of supporting evidence. In the limit, if you're extremely risk-averse, you should believe nothing (or only propositions that are entailed by the evidence). At the other extreme, if you don't care at all about the risk of believing falsehoods, you should believe every proposition whatsoever (with the possible exception of propositions that are incompatible with your evidence).

This is not quite how Richard sets up the Jamesian model. Richard assumes that you have a choice between three attitudes: believing P, disbelieving P, and suspending judgement. Even if you don't care about believing falsehoods, you should then only believe a proposition if its evidential probability is greater than 1/2. I think the picture is nicer if we identify disbelief with believing the negation. But this isn't a serious disagreement.

Anyway, three quick critical comments before we move on to credence.

First, what exactly is your "epistemic utility function" supposed to be, and why is it relevant to what you should believe?

The only way I can make sense of this is through a revealed-preference approach. Let's not pretend that we already know what subjective epistemic utility functions are. Instead, we start with the idea that the norms of rationality are permissive. They don't settle whether you should believe a proposition given that it has such-and-such degree of evidential support. Some people believe the proposition on that basis, some don't, and either choice is OK. But it's not OK (let's assume) to believe P and disbelieve Q if Q is better supported than P. In general, there are certain constraints on how one's beliefs should relate to evidential support. If an agent satisfies these constraints then there is an epistemic utility function such that the agent believes a proposition iff believing the proposition maximises expected epistemic utility.

This makes sense to me. It would now be good to know what the relevant constraints are. Richard doesn't tell us. He seems to assume that we have a direct grip on the notion of personal epistemic utility.

Second, whatever the relevant constraints are, they probably won't look all that plausible. The model we've ended up with assumes a version of the Lockean Thesis, on which all-or-nothing belief is only a matter of whether the evidential probability exceeds a certain threshold. This has many well-known problematic consequences.

Third, what does any of this have to do with risk? Above I said that if you really care about the *risk* of believing a falsehood then you should form few beliefs. That's not actually what the Jamesian model says. The model says that you should form few beliefs iff you assign comparatively high disutility not to a risk, but simply to believing a falsehood.

In economics, it is common to model risk attitudes by "utility curves". For example, an agent is said to be risk-averse if they assign decreasing marginal utility to money. But that's an odd conception of risk-aversion. Intuitively, the fact that $1000 means more to you if you're poor than if you're rich doesn't indicate that you have a genuine preference against taking risks. As I argue in chapter 8 of my decision theory notes (and as others have argued before me), genuine risk-aversion should be modelled by assigning low utility to outcomes that were brought about through a risky choice.

Throughout his book, Richard ignores this way of modelling attitudes towards risk. Perhaps that's because of his "accuracy-first" approach on which the only epistemically relevant feature of a belief state is its degree of accuracy. But I'm not sure. Perhaps it's really because the phenomenon he's interested in just isn't what I would call genuine sensitivity to epistemic risk.

If we try to adapt the Jamesian model of all-or-nothing belief to partial belief (credence), we run into a problem.

We now need to ask about the evidential expected utility of certain credence functions, relative to the personal utility the agent assigns to different degrees of accuracy. The problem is that no matter what these personal utilities look like (within reason), the evidential probability function will plausibly assign maximal expected utility to itself.

Formally, this assumes that the eligible personal utility functions should be "strictly proper", and so Richard goes through some arguments for strict propriety in chapter 4, suggesting that only strictly proper utility functions satisfy such-and-such intuitive desiderata.

Personally, I find some of the supposed desiderata rather dubious. But the underlying philosophical point is arguably simpler than Richard makes it appear.

In a nutshell, the problem is this. If there is such a thing as an evidential probability measure, which tells us that some proposition P has probability x based on our total evidence, then it's hard to see how it could be rational – by the lights of that same evidential probability function – for us to assign to P any credence other than x.

So we must drop the assumption that there is an evidential probability measure. (Richard doesn't comment on this move.) We then can't evaluate the choice of a credence function by its evidentially expected epistemic utility. How else might we evaluate the choice?

Richard suggests that we should think of the situation as a decision problem "under uncertainty", where no probabilistic information is available. In chapter 7, he looks at some decision rules for decision-making under uncertainty, and argues that the best of them is the "Generalised Hurwicz Criterion" (GHC).

The GHC is an extension of the original Hurwicz Criterion, which in turn is an extension of the Maximin rule.

The Maximin rule says to choose an option with greatest worst-case utility. The Hurwicz Criterion looks at both the worst-case and the best-case utility. It recommends to maximise the weighted average of these two utilities, relative to some personal weights.

The GHC extends this by looking at all possible outcomes. We assume that you care to a certain degree about the best case. This is your first "Hurwicz weight" λ_{1}. You also care to some degree about the second-best case, giving your second Hurwicz weight λ_{2}. And so on, down to the worst case. An option's choiceworthiness is then the weighted sum of the utilities of its possible outcomes: the best outcome is weighted by λ_{1}, the second-best by λ_{2}, and so on.
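A minimal sketch of the rule, with made-up weights and outcomes:

```python
# Generalised Hurwicz Criterion: rank an option's possible utilities from
# best to worst and take the weighted sum.

def ghc(utilities, weights):
    # weights = (λ1, λ2, ...): λ1 for the best case, λ2 for the second-best,
    # and so on; they should sum to 1.
    ranked = sorted(utilities, reverse=True)
    return sum(l * u for l, u in zip(weights, ranked))

# With weights (0.5, 0.3, 0.2), an option with outcomes 10, 0, 4 scores
# 0.5*10 + 0.3*4 + 0.2*0 = 6.2:
ghc([10, 0, 4], (0.5, 0.3, 0.2))
```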

(I found the discussion of these rules a little confusing. We are told to assume that there is a finite set of worlds that settle everything that matters to the agent. Options are represented as functions from worlds to real numbers, so that f(w) is the utility of f at w. But if w settles everything I care about, then nothing I could choose makes a difference to how good w is: doing f at w must be just as good as doing g at w. We should really understand the "worlds" here as Savage-type "states". These states are compatible with each available act, and they do not settle anything the agent ultimately cares about.)

Why should the GHC be the right way to evaluate options "under uncertainty"? Richard's argument is that the preference relation induced by GHC has some desirable properties. Let's just look at three relevant properties.

"Strong Dominance" says that if an option A is at least as good as B at all worlds (i.e., states), and better at some, then A is preferred to B.

This looks plausible. It rules out both Maximin and the simple Hurwicz Criterion, neither of which satisfies Strong Dominance. The GHC does.

Next we have "Strong Linearity". This says that if A ∼ B, then for any numbers m and k, mA+k ∼ mB+k. For example, if you're indifferent between an option A that scores 1 at w_{1} and 8 at w_{2} and an option B that scores 2 at both w_{1} and w_{2}, then you are also indifferent between these same options with all the outcome utilities multiplied by -1.

Strong Linearity is satisfied by Bayesian accounts on which you first assign a probability to the worlds and then determine choiceworthiness in terms of expected utility.

Richard argues that the condition is implausible if we want agents to care about risk: -8 is a really bad outcome, and you might well prefer the guaranteed -2 over a risk of getting -8. The GHC does not validate Strong Linearity.
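We can verify the violation with the example's numbers. The weight λ_{1} = 1/7 is the (unique) first weight that makes the original options indifferent under GHC:

```python
def ghc(utilities, weights):
    ranked = sorted(utilities, reverse=True)
    return sum(l * u for l, u in zip(weights, ranked))

l = (1/7, 6/7)            # makes A and B indifferent
A, B = [1, 8], [2, 2]
assert abs(ghc(A, l) - ghc(B, l)) < 1e-9  # both score 2

negA = [-x for x in A]    # (-1, -8)
negB = [-x for x in B]    # (-2, -2)
ghc(negA, l)  # (1/7)*(-1) + (6/7)*(-8) = -7: now strictly worse than negB's -2
```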

No argument is given in support of the assumption that rational agents may "care about risk" in the described manner, where riskiness is not reflected in the outcomes.

Finally, there's "Permutation Indifference". This says that it doesn't matter to which worlds an option assigns its utilities. That is, if π is a permutation of W and if for all worlds w, A(w)=B(π(w)), then A is not preferred to B nor B to A.

Richard claims that this is "compelling". To me it looks insane. It means that all worlds are treated as equally relevant. You should care about the outcome of your choice in skeptical scenarios just as much as you should care about its outcome in sensible scenarios. I think rational agents should give less weight to skeptical scenarios, even if they don't have evidence that rules out these scenarios. I will return to this point below.

The GHC satisfies Permutation Indifference. Another rule that satisfies the condition is "Risk-weighted Objective Bayesianism", which assigns uniform credence to all worlds and then computes choiceworthiness in terms of risk-weighted expected utility, as in Buchak (2013). More generally, GHC and Risk-weighted Objective Bayesianism give the same verdict on all the conditions Richard looks at. It therefore remains unclear how these conditions are supposed to single out GHC as the best decision rule. Something must exclude Risk-weighted Objective Bayesianism, but we're not told what. (Or perhaps I've missed the explanation.)

I already mentioned that I'm not convinced by the case against Strong Linearity, and that I find Permutation Indifference highly implausible. So I'm not convinced by the case for GHC. The assumption that skeptical scenarios should be given less weight actually rules out *all* the rules that Richard considers.

I have other worries about GHC in particular.

One is that it violates a special case of a condition Richard calls "Coarse-Grain Indifference", according to which (roughly) if you don't care about the answer to a certain question then it doesn't matter if we individuate outcomes in such a way that they include the answer to that question or not.

Another aspect of GHC that looks odd to me is that the Hurwicz weights don't take into account how good or bad the relevant cases are. Compare two choices. In the first, one option gives you either $10K or $900K while the other option gives you either $-10K or $1M. You prefer the first, because you don't want to take the risk of losing $10K. In the second choice, one option gives you either $10K or $12K while the other option gives you either $8K or $20K. There's no risk of losing anything, and the difference between $10K and $8K isn't great, so you prefer the second option. There are no generalised Hurwicz weights that rationalise these attitudes, assuming dollars are a measure of utility.
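A brute-force search confirms that no weights work, treating dollar amounts (in $K) as utilities, as the example assumes:

```python
# With two outcomes per option, GHC reduces to the original Hurwicz
# Criterion: l1 weights the best case, 1 - l1 the worst.

def ghc(utilities, l1):
    best, worst = max(utilities), min(utilities)
    return l1 * best + (1 - l1) * worst

weights_ok = []
for i in range(1001):
    l1 = i / 1000
    first = ghc([10, 900], l1) > ghc([-10, 1000], l1)  # prefer the safe option
    second = ghc([8, 20], l1) > ghc([10, 12], l1)      # prefer the risky option
    if first and second:
        weights_ok.append(l1)

print(weights_ok)  # []
```

The first preference requires λ_{1} < 1/6, the second λ_{1} > 1/5, so no weight satisfies both.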

Let's return to the Jamesian idea that our credences reflect not just our evidence but also our personal attitudes towards epistemic risk.

In the model for all-or-nothing belief, we implemented this idea by rescaling the basic epistemic utility of a true belief (+1) and a false belief (-1) in accordance with two personal scaling factors, measuring the extent to which the agent cares about true vs false beliefs.

We can now do something similar for credence, by adopting the Generalised Hurwicz Criterion. The basic epistemic utility of a belief state is an accuracy score. We don't assume a fixed scaling factor for specific degrees of accuracy. Rather, we assume a fixed scaling factor for the worst possible accuracy the state could have, whatever that might be. Similarly for the second-worst accuracy score, and so on. Since we've dropped the idea of an evidential probability measure, we'll determine the total value for each credence function by the sum of these scaled accuracy scores.

The relevant Hurwicz weights are meant to represent the agent's personal attitude towards epistemic risk.

How does the agent's evidence enter the picture? We could apply the GHC at each point in time, considering only credence functions that take into account the present evidence. But this raises some problems, both technical and philosophical. Richard instead suggests that the GHC should only be used once to determine an agent's *prior* credence, before they receive any evidence. The later credence should then come from the prior credence simply by conditionalising on the evidence.

As a result, your credences at any point in time only take into account your risk attitudes at the very start of your epistemic journey. It's not clear to me why we should assume this. One could instead adopt a form of "ur-prior conditionalisation", as in Meacham (2016), on which we evaluate your credences at any point in time by applying the GHC to the choice of a prior probability at that time and then require you to adopt a GHC-optimal credence function conditionalised on your total evidence.

In chapter 8, Richard explains how the choice of Hurwicz weights affects the rationally eligible priors.

If you give great weight to bad outcomes (low accuracy) then GHC says you should adopt a uniform prior. If you give great weight to good outcomes (high accuracy) then GHC says you should adopt a prior that matches any permutation of your Hurwicz weights. Without further restrictions on the Hurwicz weights, every probability function is in principle permitted as a rational prior.

We have permissivism due to different attitudes towards epistemic risk, as desired. And we got a second kind of permissivism on top: If you're risk-inclined, there will be many eligible prior credence functions, with no rational grounds for choosing between them.

This might point at a reason against my suggested form of ur-prior conditionalisation. If you can choose new priors at each point in time, and you're epistemically risk-inclined, then Richard's model would allow your credences to fluctuate wildly even without relevant evidence. Richard discusses such fluctuations in chapter 10 and seems to agree that they would be problematic.

As with the scaling factors in the Jamesian model for all-or-nothing belief, I'd say that the Hurwicz weights in the model for credence don't really represent an attitude towards risk. In principle, you might put high value on risky actions but also believe that you're an unlucky person so that whenever you choose a risky action you can expect to get the worst possible outcome.

Also, as before, I'm not really sure what an agent's epistemic utility function is supposed to represent. In the model for all-or-nothing belief, it represented a certain combination of objective alethic status (truth/falsity) and the extent to which the agent cares about true or false beliefs. In the new model, one might think these two components have been separated: epistemic utility is simply a measure of objective accuracy; the extent to which you care about high or low accuracy is represented by your Hurwicz weights.

But that's not actually Richard's picture. In chapter 9, Richard endorses a proposal from Gallow (2019) on which learning involves a change of epistemic utility. You should conditionalise on your evidence, says Richard, because your epistemic utility function no longer cares about accuracy at worlds that are incompatible with the evidence. Epistemic utility therefore doesn't simply measure distance to truth. (We seem to have given up on veritism.) Like the Hurwicz weights, it is a subjective attitude that varies from person to person and from time to time. I have no direct grip on this supposed attitude.

I now come to what I think is the biggest flaw in Richard's model.

Imagine you are stranded alone on a remote island. As you walk around the island, you occasionally see a bird. The first 17 birds you see are all green. This should make you somewhat confident that the next bird will also be green. There are, of course, worlds where the first 17 birds are green and the next one is blue. And there are worlds where it is yellow. Or red. Or white. You should give *some* credence to these possibilities. They are not ruled out by your evidence. But you should give higher credence to worlds in which the next bird is green.

If some worlds compatible with your evidence have higher credence than others, and your credence comes from a prior credence function by conditionalising on your evidence, then you must have given higher prior credence to some worlds than to others.

The example illustrates that we can only learn by induction if we give higher prior credence to "regular" worlds in which patterns that begin with GGGGGGGGGGGGGGGGG continue with G than to "irregular" worlds in which they continue with B, Y, R or W.

Similarly inegalitarian attitudes are required for other aspects of rationality. Rational agents can learn about the world around them through sensory experience. Under normal circumstances, the kind of experience we have when we walk in the rain should make us believe that it is raining and not, say, that Russia has invaded Mongolia. We should give high prior credence to worlds where this kind of experience goes along with rainy weather and not to worlds where it goes along with Russian invasions.

On Richard's model, epistemically risk-averse agents must adopt a uniform prior. They will be radical skeptics, incapable of learning about the world beyond their immediate evidence.

Epistemically risk-seeking agents can adopt sensible priors. But they may equally adopt arbitrary permutations of these priors. They may choose priors on which observing 17 green birds makes it highly likely that the next bird is blue, and on which an ordinary rain experience makes it likely that Russia invaded Mongolia.

Richard discusses a small aspect of this problem in chapter 11. Here he considers a scenario in which you know that a certain urn contains either 1 green ball and 3 purple balls (H1) or 3 green balls and 1 purple ball (H2). Now two balls are drawn with replacement. The possible outcomes are G1-G2, G1-P2, P1-G2, and P1-P2. As Richard points out, if you assign uniform prior credence to the eight combinations of { H1, H2 } with these outcomes, then getting a green ball on the first draw (G1) will not affect your credence in either H1 or G2. That seems wrong.

Richard notes that the problem could be fixed by demanding that your priors should satisfy the Principal Principle. This would imply that Cr(G1/H1) = 1/4 and Cr(G1/H2) = 3/4. More generally, the Principal Principle would settle the rest of your credences once you have assigned credences to H1 and H2.
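A quick check of both priors, with H1 giving a 1/4 chance of green on each draw and H2 a 3/4 chance:

```python
# Posterior credence in H2 after observing a green ball on the first draw.

def posterior_h2_given_g1(cr_h2, likelihood_g1_h1, likelihood_g1_h2):
    cr_h1 = 1 - cr_h2
    return (cr_h2 * likelihood_g1_h2) / (
        cr_h1 * likelihood_g1_h1 + cr_h2 * likelihood_g1_h2)

# Uniform prior over the eight hypothesis/outcome combinations: G1 gets
# probability 1/2 under each hypothesis, so learning G1 changes nothing.
posterior_h2_given_g1(0.5, 0.5, 0.5)   # 0.5

# Principal Principle prior: the likelihoods track the chances.
posterior_h2_given_g1(0.5, 0.25, 0.75)  # 0.75: G1 now supports H2
```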

Some (fairly risk-seeking) Hurwicz weights allow you to adopt a prior that satisfies the Principal Principle. Richard considers the possibility of declaring the other Hurwicz weights irrational, but he doesn't commit to the idea. It would hardly help anyway. The relevant Hurwicz weights would *allow* an observation of G1 to increase your credence in H2, as it should. But the same weights would allow many other credence functions that don't satisfy the Principal Principle, including credence functions for which observing G1 actually *decreases* the probability of H2.

Another response Richard considers is to replace the GHC by a "generalised chance Hurwicz criterion" GCHC which would ensure that any eligible prior satisfies the Principal Principle. This looks somewhat better. But it still doesn't go far enough.

For one thing, the problem of rational learning from experience doesn't just arise in cases where there are well-defined objective chances.

Moreover, even in the urn case the chance-based response only works if we assume that there are only two candidate chance functions: one according to which there's a 25% chance of getting a green ball on each draw, independent of the other draws, and another one according to which that chance is 75%. But why are these the only a priori possibilities? What about chance functions that don't treat the individual draws as i.i.d.? If such chance functions are on the table then you may satisfy the Principal Principle and still take observation of a green ball to be strong evidence that most of the balls in the urn are purple.

It's often useful to distinguish between *structural* and *substantive* norms of rationality. Internal consistency and coherence are structural demands. That rain experiences should be treated as evidence for rain is a substantive demand, as is the norm that 17 green birds should be taken to indicate the presence of further green birds.

Epistemic utility theory has proved useful in clarifying and perhaps justifying structural norms of rationality. And epistemic utility theory is Richard's tool of choice, here and elsewhere. It's no surprise, then, that the account we get is blind to the demands of substantive rationality. But that's a problem. Considerations of epistemic utility do not "determine the rationality of doxastic attitudes" (p.9).

Buchak, Lara. 2013. *Risk and Rationality*. Oxford: Oxford University Press.

Gallow, J Dmitri. 2019. “Learning and Value Change.” *Philosopher’s Imprint* 19: 1–22.

James, William. 1897. “The Will to Believe.” In *The Will to Believe, and Other Essays in Popular Philosophy*. New York: Longmans Green.

Meacham, Christopher J. G. 2016. “Ur-Priors, Conditionalization, and Ur-Prior Conditionalization.” *Ergo* 3. doi.org/10.3998/ergo.12405314.0003.017.

Pettigrew, Richard. 2021. *Epistemic Risk and the Demands of Rationality*. Oxford: Oxford University Press.

I don't think the intuition that learning a true proposition improves one's epistemic state survives once we think about (a) misleading evidence and (b) the fact that some propositions are epistemically much more important than others, independently of any formal framework for measuring epistemic utilities.

Let's say that a billion people have received a medication for a dangerous disease over the last year and another billion have received a placebo. I randomly choose a sample of a hundred thousand from each group. Let E be the proposition that everyone in my random sample of the medicated died within a week of receiving the drug, and that no one in my random sample of the placeboed died within a week of receiving the placebo.

Suppose that in fact the drug is safe and highly beneficial, but nonetheless E is true. (A back-of-the-envelope calculation says that in any given week, out of a billion people, about two hundred thousand will die. So it is nomically possible that the first random sample consists entirely of people who died within a week of receiving the drug, no matter how safe the drug, and it is nomically possible that the second random sample contains no one who died within a week of the placebo.)
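A sketch of that back-of-the-envelope calculation, assuming a crude death rate of about 1 percent per year (roughly the global figure):

```python
# Weekly deaths in a population of a billion, at an assumed crude
# death rate of about 1% per year.
population = 1_000_000_000
annual_death_rate = 0.01  # assumption, roughly the global rate
weekly_deaths = population * annual_death_rate / 52
print(round(weekly_deaths))  # roughly 190,000 deaths per week
```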

After updating on the truth E, I will rationally believe that the drug is extremely deadly. Result: I am worse off epistemically, because getting right whether the drug is safe is more important than getting right the particular facts reported in E.

The obvious thing about this case is that it is astronomically unlikely that E would be true. The *expected* epistemic value of learning about the death numbers in the drug and placebo samples is positive, and that's what a proper scoring rule yields. But on rare occasions things go badly.
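Just how unlikely is easy to estimate (my own rough figures; the tiny corrections for sampling without replacement are ignored):

```python
import math

# Chance that each of the 100,000 sampled medicated people is among
# the ~200,000 (out of a billion) who died that week: about
# 200,000/1,000,000,000 per sampled person.
log10_prob = 100_000 * math.log10(200_000 / 1_000_000_000)
print(round(log10_prob))  # about -370,000: E has probability ~10^-370,000
```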

Of course, in my example above, the implicit scoring rule doesn't have uniform weights across propositions, as your examples do. But scoring rules with uniform weights seem really unrealistic. In scientific cases, I take it that normally, getting right the particular data gathered from an experiment has much lower epistemic value than getting right the theories that the data is supposed to bear on. That's why years later the details of the data are largely forgotten but the theories live on. And sometimes they live on under false pretences, because the data was misleading. (And sometimes the data was false, of course.)

Before I discuss the data, here's a reminder of some differences between epistemic modals and non-epistemic ("metaphysical") modals.

First, and most obviously, the two kinds of modals tend to quantify over different scenarios. Epistemic modals quantify over scenarios that are compatible with the evidence. Non-epistemic modals typically don't.

Second, non-epistemic modals allow for non-trivial back-reference to the actual world, while epistemic modals don't. "I could have been taller than I actually am" makes sense, "I might be taller than I actually am" does not.

Third, indexicals, names, and natural kind terms (and possibly other expressions as well) are interpreted differently when embedded under the two kinds of modals. "I could have been somewhere else now" is fine, "I might be somewhere else now" is not, even if I don't know where I am. "Hesperus and Phosphorus might be different planets" is true in a skeptical context in which we consider that the science textbooks might be full of elaborate fabrications. "Hesperus and Phosphorus could have been different planets" is false.

The problematic data now suggest that there are hybrid modals that behave like epistemic modals in some respects but like non-epistemic modals in others.

To begin, consider the following "epistemic counterfactual" from Edgington (2008).

(1) [We knew that the treasure was either in the attic or the garden. So:] If it hadn’t been in the attic, it would have been in the garden.

Here we evaluate the consequent in worlds that resemble the actual world with respect to what we *knew* at around the antecedent time. The domain of quantification (the "modal base") therefore appears to be epistemic.

On the other hand, we can easily refer back to the actual world in this kind of environment. For example, we might say that "if the treasure hadn't been in the attic then it would not have been where it actually was".

Finally, singular terms in conditionals like (1) do not appear to have the rigid interpretation they normally have under non-epistemic modals. Rather, they appear to undergo the kind of "epistemic shift" that characterises epistemic modals. This can't be seen directly in (1), but consider (2) and (3).

(2) [We knew that Hesperus was either Venus or Jupiter. So:] If Hesperus hadn't been Venus, it would have been Jupiter.

(3) If that [pointing at a suddenly appearing gazelle] had been a tiger, we would be dead now. Vetter (2016)

Here we don't seem to hold fixed the actual reference of 'Hesperus' and 'that' when we evaluate them in the relevant counterfactual scenarios. Intuitively, (3) does not talk about impossible worlds in which a gazelle is a tiger. It rather talks about worlds in which a tiger suddenly appeared where in fact the gazelle appeared.

Mackay infers that there is something wrong with the "two-dimensional" approach to modals. Two-dimensional accounts assume that sentences can be evaluated in two ways relative to a possible world, depending on whether the world is considered "as actual" or "as counterfactual". This is meant to explain the observed differences between the two kinds of modals.

Mackay's own account instead draws on recent "variabilist" treatments of singular terms. On standard variabilist accounts, the shift of names and indexicals under epistemic modals is captured by assuming that epistemic modals don't just shift a world parameter – as non-epistemic modals do – but also the assignment function that interprets the names and indexicals. (We ignore kind terms.) Mackay suggests that all modals shift both a world and an assignment parameter. In effect, the worlds in Kratzer-type accounts of modals are replaced by world-assignment pairs. When we evaluate conditionals like (1)-(3), context selects an accessibility relation that shifts both the world and the assignment function based on epistemic criteria.

What about the interpretation of 'actually', which suggests that (1)-(3) are interpreted "as counterfactual"? Here things get a little complicated. In a nutshell, Mackay assumes that the logical form of modal sentences contains explicit variables for world-assignment pairs. Modals selectively bind world-assignment variables with which they are co-indexed. Material involving 'actually' simply has a different index. As a result, the relevant variables are free and get interpreted in the unshifted actual world.

On this account, the three features mentioned above are in principle independent. Context may supply a domain of worlds that is epistemic or non-epistemic. Either way, it may determine that singular terms are interpreted rigidly or non-rigidly. (These two choices combine to determine the revised modal base whose elements are world-assignment pairs.) And no matter how these choices are made, one can in principle always undo a modal shift with 'actually' type constructions whose index is not bound by the modal quantifier.

One problem with this proposal is that it seems to overgeneralise. If the three distinctions are independent, one would expect to find all eight combinations. But many of them are really hard to find. Why do almost all modal constructions fall into just two of the eight classes?

Mackay holds that conditionals like (1)-(3), at least, are an exception. The modals here quantify over epistemically accessible worlds, singular terms undergo epistemic shift, and yet we can non-trivially refer back to the actual world.

But is that true?

Let's begin with the first issue. What are the worlds to which (1)-(3) direct us? Standard epistemic modals direct us to worlds compatible with our evidence. None of (1)-(3) do that. When we utter (1), we know that the treasure is in the attic. When we utter (3), we know that the relevant animal is not a tiger. Worlds where a tiger suddenly appeared are not epistemically accessible. Perhaps they were accessible relative to our earlier evidence. But even that may be false. (3) can be true even if we knew all along that there are no tigers around.

So I don't think it's true that "epistemic conditionals" like (1)-(3) have an epistemic modal base. They direct us towards scenarios of which we know that they don't obtain. What's special about at least (1) and (2) is how the counterfactual scenarios are selected. In general, the domain of non-epistemic modals is selected by holding fixed some actual circumstances, perhaps determined by a "modal anchor". Which circumstances are held fixed is highly context-dependent. In (1) and (2), the modal anchor happens to be our earlier state of knowledge. As I mentioned above, the conditionals direct us to worlds that resemble the actual world with respect to the facts that we knew before we made the relevant discoveries. That doesn't turn the conditionals into epistemic modals. We're still holding fixed salient circumstances.

What about the second issue? Do "epistemic conditionals" involve an epistemic shift of singular terms? Again, the answer is no. There are strong constraints on how the reference of singular terms normally shifts under epistemic modals. 'I' always refers to the individual at the centre of the scenario under consideration, 'you' (singular) refers to an individual with which the individual at the centre is currently engaging, demonstrative 'that' refers to whatever individual is pointed at in the scenario, and so on. If (3) involves this kind of shift then the worlds we should be considering are worlds at which I (or rather, the individual at the centre) am pointing at a tiger. But I would hardly be pointing at a tiger if I were dead. So (3) would come out as false.

Or consider a variant of (1): 'If the treasure hadn't been in the attic, we would never have met you/Frank/this guy'. In the consequent of the conditional, 'you' clearly refers to the person I'm *actually* talking to, not to whomever I might be talking to in the counterfactual scenario.

In sum, we don't see the kind of shift that characterises epistemic modals.

But we also don't see the kind of rigid interpretation that is often thought to characterise non-epistemic modals. So what's going on with these terms?

Counterpart theory provides an attractive answer. According to counterpart theory, even non-epistemic modals shift the reference of singular terms. The reference is shifted to a counterpart of their original referent. Counterparthood is a flexible and context-sensitive matter. That's why we can make sense of both 'if a billion tons of sand were dropped onto Mt Everest then Mt Everest would be submerged' and 'if a billion tons of sand were dropped onto Mt Everest then Mt Everest would be even taller'. In the first case, we treat the submerged mountain consisting of the original Mt Everest material as a counterpart of Mt Everest. In the second, the larger structure created by adding the sand is treated as a counterpart. Famously, counterpart theory also makes sense of contingent identity and distinctness. "If the two islands had merged they would have been a single island".

Against this background, it is not hard to see what is going on with (2) and (3). Here we simply have an unusual counterpart relation. As Mackay himself points out, the reference of the relevant terms appears to shift by the acquaintance-type counterpart relations discussed in Lewis (1983). In (3), for example, the relevant counterparts of the gazelle are individuals that stand to us in a similar acquaintance relation as the gazelle does in the actual world: We just saw them suddenly appear over there.

I hadn't realised that such counterpart relations are available for non-epistemic modals. And they only appear to be available under very specific conditions. Imagine we're in the context of (3) and I utter, "if a tiger had just appeared then that [pointing at the gazelle] would have killed us". I can't hear a reading of this on which 'that' denotes the counterfactual tiger. Similar puzzles arise in counterpart-theoretic interpretations of attitude ascriptions. I don't fully understand the mechanisms at work here.

But I don't see a good reason here to give up the classical 2D picture of how modals pattern into two distinct classes.

Edgington, Dorothy. 2008. “Counterfactuals.” *Proceedings of the Aristotelian Society* 108: 1–21.

Lewis, David. 1983. “Individuation by Acquaintance and by Stipulation.” *The Philosophical Review* 92: 3–32.

Mackay, John. 2022. “Counterfactual Epistemic Scenarios.” *Noûs*. doi.org/10.1111/nous.12403.

Vetter, Barbara. 2016. “Williamsonian Modal Epistemology, Possibility-Based.” *Canadian Journal of Philosophy* 46 (4-5): 766–95. doi.org/10.1080/00455091.2016.1170652.

The problem is that conditionalising on a true proposition typically increases the probability of true propositions as well as false propositions. If we measure the inaccuracy of a credence function by adding up an inaccuracy score for each proposition, the net effect is sensitive to how exactly that score is computed.

Here is a toy example, adapted from Fallis and Lewis (2016), where, as far as I can tell, this point was first made.

Suppose there are only three worlds, w_{1}, w_{2}, and w_{3}, with credence 0.1, 0.4, and 0.5, respectively. w_{1} is the actual world. Suppose we measure the inaccuracy of this credence function with respect to any proposition A by |Cr(A)-w_{1}(A)|^{2}, so that the total inaccuracy of the credence function is ∑_{A} |Cr(A)-w_{1}(A)|^{2}. (Here w_{1}(A) is the truth-value of A at w_{1}.) As you can check, the inaccuracy of the credence function is then 2.44.

Now conditionalise the credence function on { w_{1}, w_{2} }, so that the new credence function assigns 0.2 to w_{1} and 0.8 to w_{2}. Note that { w_{1} } is true but { w_{2} } is false, and the credence in the false proposition increased a lot from 0.4 to 0.8. If you add up the inaccuracy scores for the 6 non-trivial propositions, you now get 2.56. Learning a true proposition has increased the inaccuracy of the credence function from 2.44 to 2.56.
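The arithmetic can be checked mechanically. Here is a small script that reproduces the two totals by summing the squared distances over the six non-trivial propositions:

```python
from itertools import chain, combinations

worlds = ("w1", "w2", "w3")
actual = "w1"  # the actual world

def brier(cr):
    """Squared-distance inaccuracy, summed over the six non-trivial propositions."""
    props = chain.from_iterable(combinations(worlds, k) for k in (1, 2))
    return sum((sum(cr[w] for w in a) - (actual in a)) ** 2 for a in props)

prior = {"w1": 0.1, "w2": 0.4, "w3": 0.5}
posterior = {"w1": 0.2, "w2": 0.8, "w3": 0.0}  # after conditionalising on {w1, w2}

print(round(brier(prior), 2))      # 2.44
print(round(brier(posterior), 2))  # 2.56
```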

There are other ways of measuring inaccuracy. For example, we could use the absolute distance |Cr(A)-w_{1}(A)| instead of the squared distance |Cr(A)-w_{1}(A)|^{2}. I *think* this would get around the problem. (It certainly does in the example.) More simply, we could measure the inaccuracy of your credence function in terms of the credence you assign to the actual world: the lower that credence, the higher the inaccuracy. Then it's trivial that conditionalising on a true proposition never increases inaccuracy.
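For what it's worth, in the toy example the absolute measure does go the right way; here is the check:

```python
from itertools import chain, combinations

worlds = ("w1", "w2", "w3")
actual = "w1"  # the actual world

def absolute_inaccuracy(cr):
    """Absolute-distance inaccuracy, summed over the six non-trivial propositions."""
    props = chain.from_iterable(combinations(worlds, k) for k in (1, 2))
    return sum(abs(sum(cr[w] for w in a) - (actual in a)) for a in props)

prior = {"w1": 0.1, "w2": 0.4, "w3": 0.5}
posterior = {"w1": 0.2, "w2": 0.8, "w3": 0.0}  # after conditionalising on {w1, w2}

print(round(absolute_inaccuracy(prior), 2))      # 3.6
print(round(absolute_inaccuracy(posterior), 2))  # 3.2
```

So on the absolute measure, learning the true proposition decreases inaccuracy in this case.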

However, as Lewis and Fallis (2021) point out (with respect to the second alternative measure, but the point also holds for the first), these measures can't be used to justify probabilism. The absolute measure isn't "proper". And any measure that only looks at individual worlds can't tell apart a probabilistic credence function from a non-probabilistic function that assigns the same values to all worlds. Friends of accuracy-based epistemology therefore won't like the alternative measures. It looks like they have to accept the strange conclusion that learning true propositions sometimes (in fact, often) decreases the accuracy of one's belief state.

There might be a way around this conclusion. In the example, we assumed that w_{2} has greater prior probability than w_{1}. This is not a coincidence. If the worlds that are compatible with the evidence have equal prior probability then I *think* conditionalising never increases inaccuracy, under some plausible assumptions about the inaccuracy measure. (Which assumptions? I don't know.) If that is right, we could avoid the problem by stipulating that rational priors should be uniform.
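A brute-force search supports the conjecture, at least for the squared-distance measure and small numbers of worlds:

```python
from itertools import chain, combinations

def brier(cr, worlds, actual):
    """Squared-distance inaccuracy over all non-trivial propositions."""
    props = chain.from_iterable(
        combinations(worlds, k) for k in range(1, len(worlds)))
    return sum((sum(cr[w] for w in a) - (actual in a)) ** 2 for a in props)

# Search for a counterexample: a uniform prior over n worlds,
# conditionalised on a true proposition E, whose inaccuracy goes up.
counterexamples = []
for n in range(2, 6):
    worlds = tuple(range(n))
    actual = 0
    prior = {w: 1 / n for w in worlds}
    before = brier(prior, worlds, actual)
    for m in range(1, n):  # E a non-empty proper subset
        for e in combinations(worlds, m):
            if actual not in e:
                continue  # only conditionalise on true propositions
            post = {w: (1 / m if w in e else 0.0) for w in worlds}
            if brier(post, worlds, actual) > before + 1e-12:
                counterexamples.append((n, e))

print(counterexamples)  # no counterexamples turn up: []
```

Of course, a finite search is no proof, but it suggests that the problem really does depend on a non-uniform prior.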

Lewis and Fallis (2021) mention this response (without proving that it would actually work), but reply that even if you start out with a uniform prior you could end up with an intermediate credence function that is skewed towards non-actual possibilities, from where the problem can again arise.

But if you only ever change your beliefs by conditionalisation then the worlds compatible with your total history of evidence will always have equal probability. Your intermediate credence function can't be skewed towards non-actual possibilities.

Lewis and Fallis, in effect, intuit that conditionalisation should decrease inaccuracy relative to any coarse-graining of the agent's probability space. Their "Elimination" requirement says that if { …X_{i}… } is any partition of propositions and we compute credal inaccuracy by summing only over the propositions in this partition (or in the algebra generated by the partition) then conditionalising on the negation of one member of the partition should decrease inaccuracy. I don't find this requirement especially appealing. Anyway, I'm interested in the effect of conditionalisation on the accuracy of the agent's entire credence function, where no propositions are ignored.

So I think the "uniform prior" response would work. The problem is that rational priors should not be uniform – on any sensible way of parameterising logical space. Uniform priors are the high road to skepticism. A rational credence function should favour worlds where random samples are representative, where experiences as of a red cube correlate with the presence of a red cube, where best explanations tend to be true, and so on. An agent whose priors aren't biased in these ways will not be able to learn from experience in the way rational agents can.

So the problem remains. I'm somewhat inclined to agree with Lewis and Fallis that this reveals a flaw with the popular inaccuracy measures, and therefore with popular accuracy-based arguments. From a veritist perspective, conditionalising on a true proposition surely makes a credence function better. A measure of accuracy on which the credence function has become less accurate therefore doesn't capture the veritist sense of epistemic betterness.

One more thought on this.

If I'm right about the shape of rational priors, and if an agent only ever changes their mind by conditionalisation, then conditionalising only decreases accuracy (relative to the popular measures) if the actual world is a skeptical scenario. In the example, the actual world w_{1} has lower prior probability than w_{2}. If the agent only ever changed their mind by conditionalisation then w_{1} has lower ultimate prior probability. And I claim that w_{1} should have lower ultimate prior probability than some other world only if w_{1} is a skeptical scenario. It needn't be a radical skeptical scenario, but it should have more of a skeptical flavour than w_{2}.

So even if we hold on to classical measures of accuracy, we can at least say that if the world is as we should mostly think it is, then conditionalising never increases inaccuracy. The counterexamples deserve low a priori credence.

Fallis, Don, and Peter J. Lewis. 2016. “The Brier Rule Is Not a Good Measure of Epistemic Utility (and Other Useful Facts about Epistemic Betterness).” *Australasian Journal of Philosophy* 94 (3): 576–90. doi.org/10.1080/00048402.2015.1123741.

Lewis, Peter J., and Don Fallis. 2021. “Accuracy, Conditionalization, and Probabilism.” *Synthese* 198 (5): 4017–33. doi.org/10.1007/s11229-019-02298-3.