## Decreasing accuracy through learning

Last week I gave a talk in which I claimed (as an aside) that if you update your credences by conditionalising on a true proposition then your credences never become more inaccurate. That seemed obviously true to me. Today I tried to quickly prove it. I couldn't. Instead I found that the claim is false, at least on popular measures of accuracy.

The problem is that conditionalising on a true proposition typically increases the probability of true propositions as well as false propositions. If we measure the inaccuracy of a credence function by adding up an inaccuracy score for each proposition, the net effect is sensitive to how exactly that score is computed.

Here is a toy example, adapted from Fallis and Lewis (2016), where, as far as I can tell, this point was first made.

Suppose there are only three worlds, w1, w2, and w3, with credence 0.1, 0.4, and 0.5, respectively. w1 is the actual world. Suppose we measure the inaccuracy of this credence function with respect to any proposition A by |Cr(A)-w1(A)|², so that the total inaccuracy of the credence function is ∑_A |Cr(A)-w1(A)|². (Here w1(A) is the truth-value of A at w1.) As you can check, the inaccuracy of the credence function is then 2.44.

Now conditionalise the credence function on { w1, w2 }, so that the new credence function assigns 0.2 to w1 and 0.8 to w2. Note that { w1 } is true but { w2 } is false, and the credence in the false proposition increased a lot from 0.4 to 0.8. If you add up the inaccuracy scores for the 6 non-trivial propositions, you now get 2.56. Learning a true proposition has increased the inaccuracy of the credence function from 2.44 to 2.56.
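These numbers are easy to verify by brute force. Here is a minimal Python sketch (my own; the names are illustrative, not from any package) that sums the squared-distance scores over the six non-trivial propositions before and after the update:

```python
from itertools import combinations

# Toy example: three worlds, w1 actual, prior credences 0.1, 0.4, 0.5.
prior = {"w1": 0.1, "w2": 0.4, "w3": 0.5}

def brier_inaccuracy(cr, actual):
    """Sum |Cr(A) - w(A)|^2 over the non-trivial propositions,
    i.e. the proper non-empty subsets of the set of worlds."""
    total = 0.0
    for size in range(1, len(cr)):
        for A in combinations(cr, size):
            credence = sum(cr[w] for w in A)
            truth = 1.0 if actual in A else 0.0
            total += (credence - truth) ** 2
    return total

def conditionalise(cr, evidence):
    """Conditionalise cr on the proposition `evidence` (a set of worlds)."""
    z = sum(cr[w] for w in evidence)
    return {w: (cr[w] / z if w in evidence else 0.0) for w in cr}

print(round(brier_inaccuracy(prior, "w1"), 2))      # 2.44
posterior = conditionalise(prior, {"w1", "w2"})
print(round(brier_inaccuracy(posterior, "w1"), 2))  # 2.56
```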

There are other ways of measuring inaccuracy. For example, we could use the absolute distance |Cr(A)-w1(A)| instead of the squared distance |Cr(A)-w1(A)|². I think this would get around the problem. (It certainly does in the example.) More simply, we could measure the inaccuracy of your credence function in terms of the credence you assign to the actual world: the lower that credence, the higher the inaccuracy. Then it's trivial that conditionalising on a true proposition never increases inaccuracy.
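The parenthetical claim can be confirmed directly. Here is a self-contained sketch (my own illustration) comparing the absolute-distance totals, and the credence assigned to the actual world, before and after the update:

```python
from itertools import combinations

prior = {"w1": 0.1, "w2": 0.4, "w3": 0.5}      # w1 is actual
posterior = {"w1": 0.2, "w2": 0.8, "w3": 0.0}  # after conditionalising on {w1, w2}

def absolute_inaccuracy(cr, actual):
    # Sum |Cr(A) - w(A)| over the six non-trivial propositions.
    total = 0.0
    for size in (1, 2):
        for A in combinations(cr, size):
            credence = sum(cr[w] for w in A)
            truth = 1.0 if actual in A else 0.0
            total += abs(credence - truth)
    return total

print(round(absolute_inaccuracy(prior, "w1"), 2))      # 3.6
print(round(absolute_inaccuracy(posterior, "w1"), 2))  # 3.2

# The "actual-world" measure also improves: Cr(w1) rises from 0.1 to 0.2.
print(prior["w1"], posterior["w1"])
```

So on the absolute measure the update lowers total inaccuracy from 3.6 to 3.2, even though it raises the credence in the false proposition { w2 }.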

However, as Lewis and Fallis (2021) point out (with respect to the second alternative measure, but the point also holds for the first), these measures can't be used to justify probabilism. The absolute measure isn't "proper". And any measure that only looks at individual worlds can't tell a probabilistic credence function apart from a non-probabilistic function that assigns the same values to all worlds. Friends of accuracy-based epistemology therefore won't like the alternative measures. It looks like they have to accept the strange conclusion that learning true propositions sometimes (in fact, often) decreases the accuracy of one's belief state.

There might be a way around this conclusion. In the example, we assumed that w2 has greater prior probability than w1. This is not a coincidence. If the worlds that are compatible with the evidence have equal prior probability then I think conditionalising never increases inaccuracy, under some plausible assumptions about the inaccuracy measure. (Which assumptions? I don't know.) If that is right, we could avoid the problem by stipulating that rational priors should be uniform.
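I can at least confirm the conjecture for the squared-distance measure on small spaces. The following brute-force sketch (my own, using exact rational arithmetic) conditionalises a uniform prior on every true proposition, for up to six worlds, and counts the cases in which total inaccuracy goes up:

```python
from fractions import Fraction
from itertools import combinations

def brier_inaccuracy(cr, actual):
    # Total squared-distance inaccuracy over all non-trivial propositions.
    ws = list(cr)
    total = Fraction(0)
    for size in range(1, len(ws)):
        for A in combinations(ws, size):
            credence = sum(cr[w] for w in A)
            truth = 1 if actual in A else 0
            total += (credence - truth) ** 2
    return total

violations = 0
for n in range(2, 7):
    worlds = range(n)
    uniform = {w: Fraction(1, n) for w in worlds}
    actual = 0
    # Conditionalise on every proposition that is true at the actual world.
    for size in range(1, n):
        for E in combinations(worlds, size):
            if actual not in E:
                continue
            # Conditioning a uniform prior on E yields 1/|E| on each E-world.
            posterior = {w: (Fraction(1, size) if w in E else Fraction(0))
                         for w in worlds}
            if brier_inaccuracy(posterior, actual) > brier_inaccuracy(uniform, actual):
                violations += 1

print(violations)  # 0: inaccuracy never increased
```

An exhaustive check of course falls short of a proof, but it covers every case with at most six worlds.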

Lewis and Fallis (2021) mention this response (without proving that it would actually work), but reply that even if you start out with a uniform prior, you could end up with an intermediate credence function that is skewed towards non-actual possibilities, from where the problem can again arise.

But if you start with a uniform prior and only ever change your beliefs by conditionalisation, then the worlds compatible with your total history of evidence will always have equal probability. Your intermediate credence function can't be skewed towards non-actual possibilities.

Lewis and Fallis, in effect, intuit that conditionalisation should decrease inaccuracy relative to any coarse-graining of the agent's probability space. Their "Elimination" requirement says that if { …Xi… } is any partition of propositions and we compute credal inaccuracy by summing only over the propositions in this partition (or in the algebra generated by the partition) then conditionalising on the negation of one member of the partition should decrease inaccuracy. I don't find this requirement especially appealing. Anyway, I'm interested in the effect of conditionalisation on the accuracy of the agent's entire credence function, where no propositions are ignored.

So I think the "uniform prior" response would work. The problem is that rational priors should not be uniform – on any sensible way of parameterizing logical space. Uniform priors are the high road to skepticism. A rational credence function should favour worlds where random samples are representative, where experiences as of a red cube correlate with the presence of a red cube, where best explanations tend to be true, and so on. An agent whose priors aren't biased in these ways will not be able to learn from experience in the way rational agents can.

So the problem remains. I'm somewhat inclined to agree with Lewis and Fallis that this reveals a flaw with the popular inaccuracy measures, and therefore with popular accuracy-based arguments. From a veritist perspective, conditionalising on a true proposition surely makes a credence function better. A measure of accuracy on which the credence function has become less accurate therefore doesn't capture the veritist sense of epistemic betterness.

One more thought on this.

If I'm right about the shape of rational priors, and if an agent only ever changes their mind by conditionalisation, then conditionalising decreases accuracy (relative to the popular measures) only if the actual world is a skeptical scenario. In the example, the actual world w1 has lower prior probability than w2. If the agent only ever changed their mind by conditionalisation, then w1 also has lower ultimate prior probability. And I claim that w1 should have lower ultimate prior probability than some other world only if w1 is a skeptical scenario. It needn't be a radical skeptical scenario, but it should have more of a skeptical flavour than w2.

So even if we hold on to classical measures of accuracy, we can at least say that if the world is as we should mostly think it is, then conditionalising never increases inaccuracy. The counterexamples deserve low a priori credence.

Fallis, Don, and Peter J. Lewis. 2016. “The Brier Rule Is Not a Good Measure of Epistemic Utility (and Other Useful Facts about Epistemic Betterness).” Australasian Journal of Philosophy 94 (3): 576–90. doi.org/10.1080/00048402.2015.1123741.
Lewis, Peter J., and Don Fallis. 2021. “Accuracy, Conditionalization, and Probabilism.” Synthese 198 (5): 4017–33. doi.org/10.1007/s11229-019-02298-3.

# on 30 November 2021, 16:24

"From a veritist perspective, conditionalising on a true proposition surely makes a credence function better."

I don't think the intuition that learning a true proposition improves one's epistemic state survives once we think about (a) misleading evidence and (b) the fact that some propositions are epistemically much more important than others, independently of any formal framework for measuring epistemic utilities.

Let's say that a billion people have received a medication for a dangerous disease over the last year and another billion have received a placebo. I randomly choose a sample of a hundred thousand from each group. Let E be the proposition that in my random sample of the medicated, each person died within a week of reception, and in my random sample of the placeboed, no person died within a week of reception.

Suppose that in fact the drug is safe and highly beneficial, but nonetheless E is true. (A back-of-the-envelope calculation says that in any given week, given a billion people, about two hundred thousand will die. So it is nomically possible that the first random sample consists only of people who die within a week of receiving the drug, no matter how safe the drug is, and it is nomically possible that the second random sample contains no one who died within a week of receiving the placebo.)
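The back-of-the-envelope figure checks out under a crude death rate of roughly 1% per year (the rate is my assumption, in the vicinity of the global figure):

```python
# Rough check of the weekly death count among a billion people,
# assuming a crude death rate of about 1% per year (my assumption).
population = 1_000_000_000
annual_death_rate = 0.01
deaths_per_week = population * annual_death_rate / 52
print(round(deaths_per_week))  # roughly two hundred thousand
```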

After updating on the truth E, I will rationally believe that the drug is extremely deadly. Result: I am worse off epistemically, because getting right whether the drug is safe is more important than getting right the particular facts reported in E.

The obvious thing about this case is that it is astronomically unlikely that E would be true. The *expected* epistemic value of learning about the death numbers in the drug and placebo samples is positive, and that's what a proper scoring rule yields. But on rare occasions things go badly.

Of course, in my example above, the implicit scoring rule doesn't have uniform weights across propositions, as in your examples. But scoring rules with uniform weights seem really unrealistic. In scientific cases, I take it that normally, getting right the particular data gathered from an experiment has much lower epistemic value than getting right the theories that the data is supposed to bear on. That's why, years later, the details of the data are largely forgotten but the theories live on. And sometimes they live on under false pretences, because the data was misleading. (And sometimes the data was false, of course.)

# on 30 November 2021, 16:28

I should amend my case to suppose that in my placebo sampling, we observe something reasonably close to the expected death rate in the population. For if I observed no deaths in my placebo group, the rational thing to believe would probably be that there was something badly wrong with my ability to observe death. :-)

# on 30 November 2021, 20:16

Thanks. I think I'm warming up to the idea that conditionalisation can reduce actual accuracy, at least under unfavourable conditions like the ones you describe. I'm not sure scientific theories should be given greater epistemic weight than arbitrary contingent truths. That does sound plausible, but maybe it can be explained by the fact that the theories imply lots of other contingent truths. (Toy example: if we know that there are 100 ravens, then 'all ravens are black' counts more than 'raven 17 is black' because it is equivalent to the conjunction of 100 atomic statements about the ravens.) Of course it's not at all obvious how to make this idea precise.

# on 01 December 2021, 16:38

It would be nice to have an accuracy measure s such that:
1. s(P_A)(w) >= s(P)(w) at every world w at which A is true, provided P is consistent
2. E_P s(P) < P(A) E_{P_A} s(P_A) + P(A^c) E_{P_{A^c}} s(P_{A^c}) whenever 0<P(A)<1,
where s(P) is a function from worlds to [-infty,M] for some finite M, P_A is P conditioned on A, and E_P is expectation with respect to P.

Is there such a measure?

Strict propriety entails (2). One might guess that (2) is equivalent to strict propriety, but in fact (2) doesn't even entail propriety. [Let s0(P) be 1 if P is zero on some non-empty set and 0 otherwise. Then (2) holds with non-strict inequality for s0. Let s(P) = s0(P) + a Brier(P) for some small positive a. Then (2) holds with strict inequality for Brier. Hence (2) holds with strict inequality for s. But clearly s isn't proper, at least if a is small. For let P be uniform. Then E_P s0(P) will be zero, but if Q is zero on some non-empty set, E_P s0(Q) will be one, and for small a we will have E_P s(P) < E_P s(Q), contrary to propriety.]
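The bracketed construction can be checked numerically. In the sketch below (my reconstruction of the details: I read "Brier(P)" as Brier accuracy, i.e. the negative of the summed squared distances over non-trivial propositions, and set a = 0.01 on a three-world space), the uniform P expects the zero-assigning function Q to score strictly better than itself, so s is not proper:

```python
from itertools import combinations

worlds = ["w1", "w2", "w3"]

def brier_accuracy(cr, actual):
    # Negative of the summed squared distances over non-trivial propositions.
    total = 0.0
    for size in (1, 2):
        for A in combinations(worlds, size):
            credence = sum(cr[w] for w in A)
            truth = 1.0 if actual in A else 0.0
            total -= (credence - truth) ** 2
    return total

def s0(cr):
    # 1 if cr assigns zero to some non-empty set (here: to some world), else 0.
    return 1.0 if any(cr[w] == 0 for w in worlds) else 0.0

a = 0.01  # small positive weight on the Brier component (my choice)

def s(cr, actual):
    return s0(cr) + a * brier_accuracy(cr, actual)

P = {"w1": 1/3, "w2": 1/3, "w3": 1/3}  # uniform
Q = {"w1": 0.5, "w2": 0.5, "w3": 0.0}  # zero on the non-empty set {w3}

exp_P_of_P = sum(P[w] * s(P, w) for w in worlds)
exp_P_of_Q = sum(P[w] * s(Q, w) for w in worlds)
print(exp_P_of_P < exp_P_of_Q)  # True: P expects Q to do better than P itself
```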

One might throw additivity into the mix, but I am sceptical of additivity.

# on 01 December 2021, 20:23

It was dumb of me to ask if there is such a measure, since 1 implies 2.

# on 02 December 2021, 20:44

I *think* the following is true. Consider any scoring rule that can be written in the form score(p,w) = sum_A s(p(A),w(A)), where the sum is taken over all events A, and where s is a fixed strictly proper scoring rule for single-event forecasts. Then there are a prior p, an event E, and a world w such that E is true at w and accuracy at w decreases upon conditioning on E.