The subjective Bayesian answer to the problem of induction

Some people – important people, like Richard Jeffrey or Brian Skyrms – seem to believe that Laplace and de Finetti have solved the problem of induction, assuming nothing more than probabilism. I don't think that's true.

I'll try to explain what the alleged solution is, and why I'm not convinced. I'll pick Skyrms as my adversary, mainly because I've just read Skyrms and Diaconis's Ten Great Ideas about Chance, in which Skyrms presents the alleged solution in a somewhat accessible form.

In the end, I suspect our disagreement boils down to a disagreement over what "the problem of induction" is. So let's not define the problem, but simply start with the usual toy example.

You have seen 100 ravens, all of which have been black. Normally, you should be confident that the next raven you'll see will also be black. The problem is to explain why. Why, for example, would it be wrong to believe that the next raven will be white?

I said "normally", because the matter really depends on what other information you have. If, for example, you knew in advance that you were going to see 100 black ravens followed by one white raven, then you shouldn't be confident that the next raven will be black. Any simple and general rule that allows extrapolating past regularities into the future is subject to this kind of counterexample.

The counterexample also indicates why subjective Bayesianism – roughly, probabilism + conditionalisation – can't solve the problem of induction. Suppose you start out with a high prior credence in the counter-inductive hypothesis that you will see 100 black ravens followed by a white raven, without any positive reason to think that this is true. You don't violate the norms of probabilism or conditionalisation. After seeing 100 black ravens, you will be confident that the next raven is white.
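Here is a minimal sketch of that reasoner, under assumptions of my own. The run is scaled down from 100 ravens to 5 so that every sequence can be enumerated. The "90% on the counter-inductive sequence, the rest spread uniformly" prior is a hypothetical choice. Conditionalisation is just restriction to the sequences consistent with the evidence, followed by renormalisation:

```python
from fractions import Fraction
from itertools import product

N_OBSERVED = 5  # scaled down from 100 so all sequences can be enumerated
N_TOTAL = N_OBSERVED + 1

def counter_inductive_prior(seq):
    """Hypothetical prior: 9/10 on 'N_OBSERVED blacks, then a white', 1/10 spread uniformly."""
    target = ("B",) * N_OBSERVED + ("W",)
    uniform_part = Fraction(1, 10) / 2 ** N_TOTAL
    return (Fraction(9, 10) if seq == target else 0) + uniform_part

def credence_next(prior, colour):
    """Conditionalise on N_OBSERVED black ravens; return credence that the next raven has `colour`."""
    evidence = ("B",) * N_OBSERVED
    consistent = [s for s in product("BW", repeat=N_TOTAL) if s[:N_OBSERVED] == evidence]
    total = sum(prior(s) for s in consistent)
    return sum(prior(s) for s in consistent if s[-1] == colour) / total

print(credence_next(counter_inductive_prior, "W"))  # 577/578
```

After five black ravens this reasoner gives credence 577/578, about 0.998, to the next raven being white. Nothing in the calculation violates probabilism or conditionalisation.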

Indeed, suppose you start out giving equal credence to all possible ways of distributing colours (black and white, say) over the first 101 ravens you may encounter. Unlike the counter-inductive prior we've just considered, this uniform prior doesn't seem obviously crazy. But if you now observe 100 black ravens (and you conditionalise on the observations), you'll give equal credence to the 101st raven being black and it being white.
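The uniform-prior case can be checked by brute force. The sketch below scales the example down from 101 ravens to 11, because \( 2^{101} \) sequences are too many to enumerate. The scaling doesn't matter here, since the result is the same for any number of observed ravens:

```python
from fractions import Fraction
from itertools import product

N_OBSERVED = 10  # scaled down from 100; 2**101 sequences cannot be enumerated
N_TOTAL = N_OBSERVED + 1

def uniform_prior(seq):
    """Equal credence in every Black/White sequence of length N_TOTAL."""
    return Fraction(1, 2 ** N_TOTAL)

def credence_next_black(prior):
    """Conditionalise on N_OBSERVED black ravens; return credence that the next one is black."""
    evidence = ("B",) * N_OBSERVED
    consistent = [s for s in product("BW", repeat=N_TOTAL) if s[:N_OBSERVED] == evidence]
    total = sum(prior(s) for s in consistent)
    return sum(prior(s) for s in consistent if s[-1] == "B") / total

print(credence_next_black(uniform_prior))  # 1/2
```

Only two sequences survive the evidence, all-Black and all-Black-then-White, and the uniform prior gives them equal weight. The observations therefore teach you nothing about the next raven.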

In order to learn by induction, your priors must be biased towards some sequences of Black and White, and against others. The all-Black sequence must have greater prior probability than the 100-Black-then-1-White sequence. This is evidently not a consequence of the probability calculus.

Sure, says Skyrms. But look what Laplace figured out.

Suppose you regard the colour of each raven as a matter of chance. You think that each raven has a certain chance of being black and of being white. The chance might be 1, or 0, or something in between. Laplace showed that if you start with a uniform prior over the possible chance values (0, 1, anything in between), and you obey the Principal Principle, then after seeing 100 black ravens, your credence in the next raven being black will be around 0.99.
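Laplace's number comes from what is usually called the rule of succession. With a uniform prior over the chance \( p \) of Black and \( n \) black ravens observed, the credence that the next raven is black is \( \int_0^1 p^{n+1}\,dp \big/ \int_0^1 p^n\,dp = (n+1)/(n+2) \). The sketch below computes this exactly. It also checks the result with a crude numerical integration. The function names are my own:

```python
from fractions import Fraction

def laplace_next_black(n_black, n_white):
    """Rule of succession: uniform prior over the chance of Black, ravens i.i.d. given that chance."""
    return Fraction(n_black + 1, n_black + n_white + 2)

def numeric_check(n_black, n_white, steps=100_000):
    # Midpoint-rule approximation of integral(p * L(p)) / integral(L(p)) over [0, 1],
    # where L(p) = p^n_black * (1 - p)^n_white is the likelihood of the observations.
    num = den = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        likelihood = p ** n_black * (1 - p) ** n_white
        num += p * likelihood
        den += likelihood
    return num / den

print(laplace_next_black(100, 0), float(laplace_next_black(100, 0)))
print(numeric_check(100, 0))
```

For 100 black ravens both routes should give 101/102, which is about 0.990.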

We can generalise Laplace's observation. You don't need to start with a uniform prior. If you think that raven colour is a matter of chance, and you start with any regular prior over that chance, then you will be confident that observing more and more ravens will make your credences converge to the true chance. For example, you will be confident that if the true chance of Black is 0.9, then after enough raven observations you will become confident that the chance of Black is near 0.9, whereupon you'll expect the next raven to be black with credence around 0.9.
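To illustrate the convergence claim, here is a sketch that uses Beta priors. Beta priors are only a convenient family of regular priors whose posteriors have a closed form; the general claim doesn't depend on them. Two quite different, hypothetical priors are updated on the same simulated ravens, whose true chance of Black is 0.9:

```python
import random

def posterior_mean(a, b, n_black, n_white):
    """Mean of the Beta(a + n_black, b + n_white) posterior = credence that the next raven is black."""
    return (a + n_black) / (a + b + n_black + n_white)

def simulate(true_chance=0.9, n=10_000, seed=0):
    """Simulated raven observations, each black with probability true_chance (illustrative data)."""
    rng = random.Random(seed)
    n_black = sum(rng.random() < true_chance for _ in range(n))
    return n_black, n - n_black

n_black, n_white = simulate()
# Two hypothetical regular priors: one tilted strongly towards White, one towards Black.
for a, b in [(1, 20), (20, 1)]:
    print(a, b, round(posterior_mean(a, b, 0, 0), 3), round(posterior_mean(a, b, n_black, n_white), 3))
```

Before any observations the two priors disagree sharply (about 0.05 vs 0.95). After 10,000 simulated ravens both give a credence close to 0.9.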

That's neat. But why assume that you regard the colour of ravens as independent chance events? That's not a consequence of the probability calculus. Even if you do regard raven colour as a matter of chance, how do you know that earlier raven observations don't provide "inadmissible evidence" for the relevant application of the Principal Principle? Indeed, why can we assume the Principal Principle to begin with, if we're restricting ourselves to the confines of subjective Bayesianism?

Enter de Finetti.

Consider again your prior credence over possible sequences of raven colours (Black, Black, White, Black, etc.). Your credences are exchangeable if they are sensitive only to the number of Black and White occurrences within a given sequence, and insensitive to the order of these occurrences. De Finetti showed that if your credences are exchangeable then you are reasoning as if you regard each position in the sequence as the outcome of an independent chance event. You will also reason as if you align your beliefs to these chances, as demanded by the Principal Principle. Exchangeability is all we need to get the Laplacean reasoning going.
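The sketch below shows only the easy direction of the link: a mixture of i.i.d. chances yields exchangeable credences. De Finetti's theorem is the harder converse, and the code does not prove it. The mixing distribution here is a Beta(2, 3) distribution, a hypothetical choice. The credence in a sequence then depends only on how many Blacks and Whites it contains. The "Laplacean" predictive credence can be read off from the sequence credences alone:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def beta_fn(x, y):
    """Beta function B(x, y) for positive integers, computed exactly."""
    return Fraction(factorial(x - 1) * factorial(y - 1), factorial(x + y - 1))

def seq_prob(seq, a=2, b=3):
    """Credence in a Black/White string under a Beta(a, b) mixture of i.i.d. chances (a, b hypothetical)."""
    k = seq.count("B")
    return beta_fn(a + k, b + len(seq) - k) / beta_fn(a, b)

orderings = set(permutations("BBBWW"))
probs = {seq_prob("".join(s)) for s in orderings}
print(len(orderings), probs)  # 10 orderings, a single shared credence

# Predictive credence recovered from sequence credences alone:
past = "BBBWW"
p_next_black = seq_prob(past + "B") / seq_prob(past)
print(p_next_black)  # (a + k) / (a + b + n) = (2 + 3) / (2 + 3 + 5) = 1/2
```
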

For example, if your credences are exchangeable, then you must be confident that after sufficiently many raven observations, your credence in the next raven being black will be close to the actual relative frequency of black ravens in the sequence.

(De Finetti assumes the sequence is infinite, but Skyrms and Diaconis show that this isn't essential.)

Again, that's a neat result. It tells us a little about what "inductive" and "counter-inductive" priors look like, at least for our toy example.

Note that if you start out confident that you're going to see 100 black ravens followed by 1 white raven, then your credences aren't exchangeable.

Similarly if you start out with exchangeable credences over sequences of grue-like "colours". Let 'blight' mean 'is one of the first 100 ravens and black or a later raven and white'. Let 'whack' mean 'is one of the first 100 ravens and white or a later raven and black'. If you start out with exchangeable credences over Blight/Whack sequences, your credences over Black/White sequences are not exchangeable.
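This can be checked on a scaled-down version of the example. In the sketch, "the first 100 ravens" becomes "the first 2 ravens", and the exchangeable prior over Blight/Whack sequences is taken to be Laplace's uniform mixture. Both choices are mine, for illustration. The credence over Black/White sequences is whatever that prior induces via the translation:

```python
from fractions import Fraction
from math import factorial

N_EARLY = 2  # scaled down from 100: positions 0..N_EARLY-1 count as the "first" ravens

def to_grue(seq):
    """Translate a Black/White string into Blight ('L') / Whack ('K') terms."""
    out = []
    for i, colour in enumerate(seq):
        is_blight = (colour == "B") if i < N_EARLY else (colour == "W")
        out.append("L" if is_blight else "K")
    return "".join(out)

def exchangeable_grue_prior(grue_seq):
    """Uniform-mixture (Laplace-style) exchangeable prior over Blight/Whack strings."""
    n, k = len(grue_seq), grue_seq.count("L")
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

def bw_prior(seq):
    """The credence over Black/White strings induced by the Blight/Whack prior."""
    return exchangeable_grue_prior(to_grue(seq))

# Same counts (two Black, one White), different credences: not exchangeable over Black/White.
print(bw_prior("BBW"), bw_prior("BWB"))
# After two black ravens, credence that the next raven is white:
print(bw_prior("BBW") / (bw_prior("BBW") + bw_prior("BBB")))
```

BBW and BWB contain the same number of Blacks and Whites but get credence 1/4 and 1/12. After two black ravens, the credence that the next one is white is 3/4.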

Are all counter-inductive priors non-exchangeable? Are all non-exchangeable priors counter-inductive? Interesting questions. (I'm sure someone has looked into them.)

At any rate, the problem with all this is obvious. Why should we assume that your priors are exchangeable over the Black/White sequences? For all the probability calculus is concerned, you might just as well start with one of the counter-inductive priors, in which case seeing 100 black ravens will make you confident that the next raven is white.

At this stage, Skyrms makes two points.

First, he points out that de Finetti's theorem can be generalised. Your credences don't have to be exchangeable. It's enough that they are "partially exchangeable". We don't need to get into what exactly that means. Roughly speaking, your credences are partially exchangeable if they are invariant under certain transformations.

Second, says Skyrms, as long as your credences are exchangeable over some sequences – be it the Black/White sequences or the Blight/Whack sequences – probabilism ensures that you will reason inductively, in the sense that you will expect past frequencies to continue into the future.

We can put these two points together. As long as your credences display certain symmetries, as long as they are invariant under certain transformations, probabilism ensures that you will take the past to be a predictor of the future.

Here one might add another point that Skyrms doesn't mention: if you are a cognitively limited creature, your credences had better display some symmetries. Suppose you think you're going to observe 1000 ravens, each of which will be either black or white. There are \( 2^{1000} \) relevant sequences of Black and White. That's roughly \( 10^{301} \), a number with 302 digits. You don't want to store and update a separate probability for each of these sequences. If your credences are exchangeable, you only need to store and update a probability distribution over the "chance" of Black and White, which, if the distribution is chosen sensibly, requires maybe three or four memory registers, and allows for efficient updating algorithms.
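A sketch of the storage point, assuming the "sensibly chosen" distribution is a Beta distribution. That is one common choice, not the only one. The whole credal state is two counters, and conditionalising on a raven increments one of them. The class name is hypothetical:

```python
from fractions import Fraction

n_sequences = 2 ** 1000
print(len(str(n_sequences)))  # number of decimal digits: 302

class BetaBernoulliCredence:
    """Exchangeable credence summarised by two numbers, the Beta(a, b) parameters."""

    def __init__(self, a=1, b=1):  # a = b = 1 is Laplace's uniform prior
        self.a, self.b = a, b

    def update(self, colour):
        # Conditionalising on one observation just increments one counter.
        if colour == "B":
            self.a += 1
        else:
            self.b += 1

    def credence_next_black(self):
        return Fraction(self.a, self.a + self.b)

cred = BetaBernoulliCredence()
for _ in range(100):
    cred.update("B")
print(cred.credence_next_black())  # 101/102, Laplace's rule of succession
```
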

From this perspective, Laplace and de Finetti have found an explanation for the human tendency to believe that the future will resemble the past.

Which is neat. As I said. But it doesn't solve the problem of induction.

The problem of induction is to explain why, under normal conditions, observing 100 black ravens (and no non-black raven) should make you confident that the next raven is black. Probabilism, or probabilism + conditionalisation, cannot provide an answer, because it can't rule out counter-inductive priors.

Here, I think, Skyrms and I simply disagree over what needs to be explained.

I believe that there are substantive norms constraining how confident one may be in a hypothesis in light of some evidence, even if the evidence entails neither the hypothesis nor its negation.

Our evidence does not conclusively settle whether greenhouse gas emissions contribute to climate change, whether Trump lost the 2020 election, whether spacetime is curved by massive bodies, or whether the sun will rise tomorrow. Still, it strongly supports these hypotheses. It would be irrational to be confident in their negation, given our evidence.

But why would it be irrational? What if someone disagrees? What if someone looks at all the evidence and becomes confident, on this basis, that greenhouse gas emissions don't contribute to climate change, or that the sun will explode before tomorrow? They need not be probabilistically incoherent. In its most general form, the problem of induction is to explain why they are irrational.

I don't know what Skyrms thinks of this problem. I suspect he simply disagrees that the counter-inductive reasoner is irrational. I suspect he would say that the counter-inductive reasoner is only irrational from our perspective: given the symmetries in our credences, we expect the counter-inductive reasoner to form wildly inaccurate beliefs. The counter-inductive reasoner might think the same about us. There's no objective standard for who is right.

To me, this "solution" to the problem of induction amounts to capitulation. It is inductive skepticism.


# on 29 September 2022, 05:05

The solution to this given by Solomonoff induction, if I understand things correctly, is that if your prior is suitably Universal, then this will guarantee that the Bayesian approach will converge to the correct answer given sufficient data (and a classically computable world etc etc). It is irrational to choose priors such that your beliefs will not move to the true state of nature with enough updates (and where your priors are not invariant to reparameterization and regrouping).

# on 29 September 2022, 05:37

@David Duffy: Yes, Solomonoff induction might be a candidate for what I would consider a solution. I'm not fully convinced, but this might just be because I've never seen a careful presentation of the proposal (that, for example, doesn't take for granted that the world can be modeled as a string of projectible properties).

In the blog post, I only wanted to comment on an alternative proposal.
