Simplicity and indifference

According to the Principle of Indifference, alternative propositions that are similar in a certain respect should be given equal prior probability. The tricky part is to explain what should count as similarity here.

Van Fraassen's cube factory nicely illustrates the problem. A factory produces cubes with side lengths between 0 and 2 cm, and consequently with volumes between 0 and 8 cm^3. Given this information, what is the probability that the next cube that will be produced has a side length between 0 and 1 cm? Is it 1/2, because the interval from 0 to 1 is half of the interval from 0 to 2? Or is it 1/8, because a side length of 1 cm means a volume of 1 cm^3, which is 1/8 of the range from 0 to 8?
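The two answers can be reproduced with a little arithmetic. Here is a minimal sketch, assuming that "indifference" means a uniform distribution over side length in the one case and over volume in the other; the function names are made up for illustration.

```python
from fractions import Fraction

MAX_SIDE = 2  # cm, as in the cube factory example


def prob_uniform_over_side(s, max_side=MAX_SIDE):
    """P(side <= s) if we are indifferent over side lengths."""
    return Fraction(s, max_side)


def prob_uniform_over_volume(s, max_side=MAX_SIDE):
    """P(side <= s) if we are indifferent over volumes instead."""
    return Fraction(s ** 3, max_side ** 3)


print(prob_uniform_over_side(1))    # 1/2
print(prob_uniform_over_volume(1))  # 1/8
```

Both functions describe the very same event, a side length of at most 1 cm, and yet they return different values. Nothing in the description of the factory tells us which parametrisation to be indifferent over.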

Some try to dodge the problem by saying that the alternative propositions that should be given equal probability, by the Principle of Indifference, must be "equally supported by the evidence". The problem then re-arises as the question of what it takes for alternative propositions to be equally supported. I think it is best anyway to express the Principle as a constraint on prior probabilities, which don't factor in any relevant evidence.

Many philosophers seem to have given up on the Principle of Indifference, suggesting that there simply is no such norm on rational credence.

That seems wrong to me. Suppose a murder has been committed, and exactly one of the gardener, the butler, and the cook could have done it. In the absence of further evidence, surely one should give roughly equal probability to the three possibilities. What norm of rationality do you violate if you are confident that the gardener did it, if not the Principle of Indifference?

So we're stuck with the problem of saying when two propositions should count as "similar" so as to deserve equal prior probability. Here's an answer that looks promising to me and that I haven't seen in the literature: propositions should get equal prior probability if they are equally simple and equally specific, in a sense I am going to explain.

This criterion is motivated by a certain approach to the problem of induction.

Suppose we know that there are 1000 ravens in the world. We've seen 100 ravens, all black. Without further relevant information, we should then be reasonably confident that all the ravens are black. Why so? Why should the hypothesis that all ravens are black be preferred over the hypothesis that the first 100 ravens are black and all others white? An attractive answer is that the first hypothesis is simpler.

Let's note one fairly obvious connection between induction and Indifference. Consider the class of all hypotheses about the colour of the 1000 ravens that are compatible with our observation of 100 black ravens. That class is huge. If we gave equal probability to every colour distribution compatible with our data, we would be practically certain that some ravens are non-black. (Even with just two possible colours, white and black, there are 2^900 colour distributions compatible with our data, so the probability that all ravens are black given our data would be roughly 10^-271: a decimal point followed by 270 zeros before the first non-zero digit.)
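The size of that class, and the resulting probability, can be checked directly. The sketch below assumes exactly two colours and a flat distribution over all colour assignments to the 900 unobserved ravens; the variable names are mine.

```python
from fractions import Fraction
import math

TOTAL_RAVENS = 1000
OBSERVED_BLACK = 100
COLOURS = 2  # black or white

unobserved = TOTAL_RAVENS - OBSERVED_BLACK

# Number of colour distributions compatible with the 100 black observations.
compatible = COLOURS ** unobserved

# Under a flat prior over those distributions, exactly one is "all black".
p_all_black = Fraction(1, compatible)
log10_p = -unobserved * math.log10(COLOURS)

print(f"compatible distributions: about 10^{math.log10(compatible):.1f}")  # 10^270.9
print(f"P(all black | data): about 10^{log10_p:.1f}")                      # 10^-270.9
```

Adding more colours only makes things worse, since `compatible` grows as the number of colours raised to the power of 900.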

The lesson is that a simple-minded application of the Principle of Indifference is incompatible with inductive reasoning. If we want to spell out a plausible Principle of Indifference, we should make sure it doesn't get in the way of induction.

So perhaps it makes sense to start with models of induction. Return to the attractive idea that induction is based on a preference for simpler hypotheses. Roy Solomonoff found a way to render this more precise. Let's think of scientific hypotheses as algorithms for producing data. Any algorithm can be specified by a string of binary symbols fed into a Universal Turing Machine. Define the complexity of a hypothesis as the length of the shortest input string to a Universal Turing Machine that computes the algorithm. Now we can understand simplicity as the reciprocal of complexity. That is, if we want to privilege simpler hypotheses, we can give higher prior probability to hypotheses with lower complexity. Under some further modelling assumptions, it turns out that there is only one natural probability measure that achieves this: Solomonoff's universal prior. (See Rathmanner & Hutter 2011 for more details and pointers to even more details.)

The assumption that simpler theories should get greater probability is meaningless without some criteria for simplicity. Any theory and any data whatsoever can be expressed by a single letter in a suitably cooked-up language. So we either have to make the rational prior language-relative or we have to assume that there is a privileged language in terms of which simplicity is measured. The second option is more daunting, but I think it's clearly the way to go. I don't know how the privileged language should be defined. It is tempting to stipulate that all non-logical terms in the language must express what Lewis calls "perfectly natural" properties and relations, but I'm not sure.

Of course, this is a problem for everyone. If seeing lots of green emeralds makes it reasonable to believe that all emeralds are green and not that they are grue, and if we think that this is not a language-relative fact – what makes it irrational to conclude that all emeralds are grue is not a fact about our language or psychology – then there must be something objective that favours hypotheses expressed with 'green' over hypotheses expressed with 'grue'. There is no special problem here for Solomonoff.

I do have a few other reservations about Solomonoff's approach. Some of the modelling assumptions used to derive the measure look problematic to me. I'm also not sure that we should always favour simpler hypotheses. For example, I think the Copenhagen interpretation of quantum mechanics deserves practically zero credence, but not because it is so complicated. (Although perhaps it would come out complicated if we tried to translate the concept of an observation that figures in the theory into the language of perfectly natural properties?)

So I'm not convinced that Solomonoff's prior is the uniquely ideal prior probability. But we don't need to go all the way with Solomonoff. It seems plausible to me that simpler theories should generally be given greater prior probability, and that this is what vindicates inductive reasoning, however exactly the idea is spelled out.

Now notice that if the probability of any hypothesis with complexity k is greater than the probability of any hypothesis with complexity k+1, and smaller than the probability of any hypothesis with complexity k-1, then there can't be much variability in the probability of hypotheses with complexity k. Solomonoff's universal prior, for example, assigns exactly the same probability to hypotheses with equal complexity.
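To make the point concrete, here is a toy sketch of a length-based prior. It is not Solomonoff's actual measure: the four code words are hypothetical stand-ins for "shortest programs", chosen by hand to form a prefix-free set, and no Universal Turing Machine is involved. Each hypothesis simply gets weight 2^-k, where k is the length of its code.

```python
from fractions import Fraction

# Hypothetical shortest descriptions of four maximally specific hypotheses.
# The set is prefix-free, so the weights 2^-length sum to at most 1.
shortest_code = {
    "h1": "0",
    "h2": "10",
    "h3": "110",
    "h4": "111",
}


def length_prior(codes):
    """Weight each hypothesis by 2^-(code length), then normalise."""
    weights = {h: Fraction(1, 2 ** len(c)) for h, c in codes.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


prior = length_prior(shortest_code)
for h, p in prior.items():
    print(h, len(shortest_code[h]), p)
# h1 1 1/2
# h2 2 1/4
# h3 3 1/8
# h4 3 1/8
```

In this toy case the prior strictly decreases with code length, and the two hypotheses of length 3 are forced to share the same probability.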

That may seem odd. Intuitively, the hypothesis that it is windy and sunny is just as simple as the hypothesis that it is windy or sunny, but surely the two should not be given equal prior probability.

In response, recall that the relevant "hypotheses" in Solomonoff's account are algorithms that generate present and future data. The idea is that the entire world can be modelled as a big stream of data. Only recursive streams deserve positive probability, and their probability is supposed to be determined by the length of the shortest computer program that produces the stream. (There is a way to allow for stochastic programs, as Hutter discusses, but let's ignore that.)

So Solomonoff's account entails a Principle of Indifference for maximally specific hypotheses – hypotheses that determine a unique stream of data. The Principle says that maximally specific hypotheses that are equally simple should get equal prior probability. Let's call this Solomonoff's Principle of Indifference.

Can we generalise the Principle to less specific hypotheses? Yes, but the generalisation isn't obvious. The problem is that some disjunctions of maximally specific hypotheses are equivalent to simpler sentences in the privileged language, while others are not. But if the disjunctions have the same number of disjuncts, they will have equal (universal prior) probability. So it looks like we'll need an unusual measure of simplicity or complexity on which the complexity of an unspecific proposition is defined as the sum of the complexities of the maximally specific hypotheses that entail the proposition. If we then stipulate that two propositions are equally specific iff they are entailed by the same number of maximally specific hypotheses with positive probability, it follows from Solomonoff's account that equally specific propositions that are equally simple should have equal prior probability.
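The definitions can be tried out on a toy model. The sketch below treats a proposition as the set of maximally specific hypotheses that entail it, and gives each such hypothesis weight 2^-k for a code of length k. The six code words are hypothetical and hand-picked to be prefix-free. The two propositions compared at the end have disjuncts of matching lengths; the sketch does not settle how the summed measure behaves in other cases.

```python
from fractions import Fraction

# Hypothetical shortest codes for six maximally specific hypotheses.
codes = {"a": "00", "b": "01", "c": "100", "d": "101", "e": "110", "f": "111"}
prior = {h: Fraction(1, 2 ** len(c)) for h, c in codes.items()}  # sums to 1


def specificity(prop):
    """Number of positive-probability specific hypotheses entailing prop."""
    return sum(1 for h in prop if prior[h] > 0)


def summed_complexity(prop):
    """Sum of the code lengths of the hypotheses entailing prop."""
    return sum(len(codes[h]) for h in prop)


def probability(prop):
    """Prior probability of prop: the total weight of its disjuncts."""
    return sum(prior[h] for h in prop)


P = {"a", "c"}  # code lengths 2 and 3
Q = {"b", "d"}  # code lengths 2 and 3
R = {"c", "d"}  # code lengths 3 and 3

for name, prop in [("P", P), ("Q", Q), ("R", R)]:
    print(name, specificity(prop), summed_complexity(prop), probability(prop))
# P 2 5 3/8
# Q 2 5 3/8
# R 2 6 1/4
```

P and Q are equally specific and equally simple on the summed measure, and they come out equally probable. R is just as specific but more complex, and gets a lower probability.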

Again, we don't need to go all the way with Solomonoff. The general point is that a promising approach to induction implies something like Solomonoff's Principle of Indifference.

What does that Principle say about the murder example? It's hard to prove, but it seems plausible that the gardener, butler, and cook hypotheses are equally specific and equally simple in the relevant sense. So they should get equal prior probability.

What about the cube factory? No surprises: it's completely unclear what Solomonoff's Principle will say here. It might be instructive to work through the details to see if, for example, the answer depends on the choice of the "privileged" language.

Anyway, here's the upshot: if Solomonoff's approach to induction is on the right track, then anyone who isn't a radical subjectivist about induction should endorse a Principle of Indifference.

