## Simplicity and indifference

According to the Principle of Indifference, alternative propositions that are similar in a certain respect should be given equal prior probability. The tricky part is to explain what should count as similarity here.

Van Fraassen's cube factory nicely illustrates the problem. A factory produces cubes with side lengths between 0 and 2 cm, and consequently with volumes between 0 and 8 cm^3. Given this information, what is the probability that the next cube that will be produced has a side length between 0 and 1 cm? Is it 1/2, because the interval from 0 to 1 is half of the interval from 0 to 2? Or is it 1/8, because a side length of 1 cm means a volume of 1 cm^3, which is 1/8 of the range from 0 to 8?
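The two answers correspond to two ways of putting a uniform distribution on the factory's output. A quick Monte Carlo sketch (in Python, with a made-up sample size) makes the divergence concrete:

```python
import random

random.seed(0)
N = 100_000

# Reading 1: the side length is uniform on (0, 2) cm.
p_side = sum(random.uniform(0, 2) <= 1 for _ in range(N)) / N

# Reading 2: the volume is uniform on (0, 8) cm^3; a side length
# of at most 1 cm corresponds to a volume of at most 1 cm^3.
p_volume = sum(random.uniform(0, 8) <= 1 for _ in range(N)) / N

print(round(p_side, 3), round(p_volume, 3))
```

The first estimate comes out near 1/2, the second near 1/8: the same evidence, two incompatible "uniform" priors.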

Some try to dodge the problem by saying that the alternative
propositions that should be given equal probability, by the Principle
of Indifference, must be "equally supported by the evidence". The
problem now re-arises as the question of what it takes for alternative
propositions to be equally supported. I think it is best anyway to
express the Principle as a constraint on *prior* probabilities,
which don't factor in any relevant evidence.

Many philosophers seem to have given up on the Principle of Indifference, suggesting that there simply is no such norm on rational credence.

That seems wrong to me. Suppose a murder has been committed, and exactly one of the gardener, the butler, and the cook could have done it. In the absence of further evidence, surely one should give roughly equal probability to the three possibilities. What norm of rationality do you violate if you are confident that the gardener did it, if not the Principle of Indifference?

So we're stuck with the problem of saying when two propositions
should count as "similar" so as to deserve equal prior
probability. Here's an answer that looks promising to me and that I
haven't seen in the literature: propositions should get equal prior
probability if they are *equally simple* and *equally
specific*, in a sense I am going to explain.

This criterion is motivated by a certain approach to the problem of induction.

Suppose we know that there are 1000 ravens in the world. We've seen
100 ravens, all black. Without further relevant information, we should then
be reasonably confident that all the ravens are black. Why so? Why
should the hypothesis that all ravens are black be preferred over the
hypothesis that the first 100 ravens are black and all others white?
An attractive answer is that the first hypothesis is
*simpler*.

Let's note one fairly obvious connection between induction and Indifference. Consider the class of all hypotheses about the colour of the 1000 ravens that are compatible with our observation of 100 black ravens. That class is huge. If we gave equal probability to every colour distribution compatible with our data, we would be practically certain that some ravens are non-black. (Even with just two possible colours, white and black, the probability that all ravens are black given our data would be 2^-900, roughly 10^-271: a decimal point followed by 270 zeros before the first significant digit.)
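The figure can be checked with a few lines of Python, assuming a uniform prior over all 2^1000 black/white colourings: conditioning on 100 observed black ravens leaves 2^900 equally likely colourings of the remaining 900 ravens, exactly one of which is all-black.

```python
import math

# P(all 1000 black | first 100 black) under a uniform prior over
# the 2^1000 black/white colourings: 1 of the 2^900 equally likely
# remaining colourings, i.e. 2^-900.
log10_p = -900 * math.log10(2)

# Zeros between the decimal point and the first significant digit:
leading_zeros = -math.floor(log10_p) - 1

print(log10_p)        # about -270.9
print(leading_zeros)  # 270
```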

The lesson is that a simple-minded application of the Principle of Indifference is incompatible with inductive reasoning. If we want to spell out a plausible Principle of Indifference, we should make sure it doesn't get in the way of induction.

So perhaps it makes sense to start with models of induction.
Return to the attractive idea that induction is based on a preference
for simpler hypotheses. Ray Solomonoff found a way to render this more
precise. Let's think of scientific hypotheses as algorithms for
producing data. Any algorithm can be specified by a string of binary
symbols fed into a Universal Turing Machine. Define the
*complexity* of a hypothesis as the length of the shortest input
string to a Universal Turing Machine that computes the algorithm. Now
we can understand simplicity as the reciprocal of complexity. That is,
if we want to privilege simpler hypotheses, we can give higher prior
probability to hypotheses with lower complexity. Under some further
modelling assumptions, it turns out that there is only one natural
probability measure that achieves this: Solomonoff's *universal
prior*. (See Rathmanner & Hutter
2011 for more details and pointers to even more details.)
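The flavour of the construction can be sketched in a few lines of Python. This is only a toy stand-in, not the real universal prior: the hypothesis names and their description lengths below are stipulated for illustration, whereas genuine Kolmogorov complexity is uncomputable.

```python
def complexity_weighted_prior(description_lengths):
    """Assign each hypothesis weight 2^-length (length in bits of
    its shortest description), then normalise so the weights sum
    to 1. Shorter descriptions get strictly higher prior."""
    weights = {h: 2.0 ** -l for h, l in description_lengths.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Hypothetical description lengths for three raven hypotheses:
prior = complexity_weighted_prior({
    "all 1000 ravens black": 10,
    "first 100 black, rest white": 18,
    "some other mixed colouring": 18,
})
```

Note two features that the discussion below relies on: lower complexity means higher prior probability, and hypotheses of equal complexity get exactly equal prior probability.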

The assumption that simpler theories should get greater probability is meaningless without some criteria for simplicity. Any theory and any data whatsoever can be expressed by a single letter in a suitably cooked-up language. So we either have to make the rational prior language-relative or we have to assume that there is a privileged language in terms of which simplicity is measured. The second option is more daunting, but I think it's clearly the way to go. I don't know how the privileged language should be defined. It is tempting to stipulate that all non-logical terms in the language must express what Lewis calls "perfectly natural" properties and relations, but I'm not sure.

Of course, this is a problem for everyone. If seeing lots of green emeralds makes it reasonable to believe that all emeralds are green and not that they are grue, and if we think that this is not a language-relative fact – what makes it irrational to conclude that all emeralds are grue is not a fact about our language or psychology – then there must be something objective that favours hypotheses expressed with 'green' over hypotheses expressed with 'grue'. There is no special problem here for Solomonoff.

I do have a few other reservations about Solomonoff's
approach. Some of the modelling assumptions used to derive the measure
look problematic to me. I'm also not sure that we should always favour
simpler hypotheses. For example, I think the Copenhagen interpretation
of quantum mechanics deserves practically zero credence, but not
because it is so complicated. (Although perhaps it *does* come
out complicated if we try to translate the concept of an observation
that figures in the theory into the language of perfectly natural
properties?)

So I'm not convinced that Solomonoff's prior is the uniquely ideal
prior probability. But we don't need to go all the way with
Solomonoff. It seems plausible to me that simpler theories should
*generally* be given greater prior probability, and that this is
what vindicates inductive reasoning, however exactly the idea is
spelled out.

Now notice that if the probability of any hypothesis with complexity k is greater than the probability of any hypothesis with complexity k+1, and smaller than the probability of any hypothesis with complexity k-1, then there can't be much variability in the probability of hypotheses with complexity k. Solomonoff's universal prior, for example, assigns exactly the same probability to hypotheses with equal complexity.

That may seem odd. Intuitively, the hypothesis that *it is windy
and sunny* is just as simple as the hypothesis that *it is windy
or sunny*, but surely the two should not be given equal prior
probability.

In response, recall that the relevant "hypotheses" in Solomonoff's account are algorithms that generate present and future data. The idea is that the entire world can be modelled as a big stream of data. Only recursive streams deserve positive probability, and their probability is supposed to be determined by the length of the shortest computer program that produces the stream. (There is a way to allow for stochastic programs, as Hutter discusses, but let's ignore that.)

So Solomonoff's account entails a Principle of Indifference for
*maximally specific* hypotheses – hypotheses that determine
a unique stream of data. The Principle says that maximally specific
hypotheses that are equally simple should get equal prior
probability. Let's call this *Solomonoff's Principle of
Indifference*.

Can we generalise the Principle to less specific hypotheses? Yes,
but the generalisation isn't obvious. The problem is that some
disjunctions of maximally specific hypotheses are equivalent to
simpler sentences in the privileged language, while others are
not. But if the disjunctions have the same number of disjuncts, they
will have equal (universal prior) probability. So it looks like
we'll need an unusual measure of simplicity or complexity on which the
complexity of an unspecific proposition is defined as the sum of the
complexities of the maximally specific hypotheses that entail the
proposition. If we then stipulate that two propositions are *equally
specific* iff they are entailed by the same number of maximally
specific hypotheses with positive probability, it follows from
Solomonoff's account that equally specific propositions that are
equally simple should have equal prior probability.
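Here is a self-contained toy illustration in Python. The four hypothesis names and their complexities are made up; the point is only that the probability of a disjunction of mutually exclusive maximally specific hypotheses is the sum of their 2^-complexity weights, so two disjunctions with the same number of disjuncts and matching complexity profiles come out equally probable.

```python
# Made-up complexities for four maximally specific hypotheses;
# each gets weight 2^-complexity, renormalised to sum to 1.
complexities = {"h1": 2, "h2": 2, "h3": 3, "h4": 3}
weights = {h: 2.0 ** -c for h, c in complexities.items()}
total = sum(weights.values())
prior = {h: w / total for h, w in weights.items()}

def disjunction_probability(disjuncts):
    # Maximally specific hypotheses are mutually exclusive, so the
    # disjunction's probability is the sum of its disjuncts' priors.
    return sum(prior[h] for h in disjuncts)

# Equally specific (two disjuncts each) with matching complexity
# profiles {2, 3}: the two disjunctions get equal probability.
p_a = disjunction_probability(["h1", "h3"])
p_b = disjunction_probability(["h2", "h4"])
print(p_a == p_b)  # True
```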

Again, we don't need to go all the way with Solomonoff. The general
point is that a promising approach to induction implies *something
like* Solomonoff's Principle of Indifference.

What does that Principle say about the murder example? It's hard to prove, but it seems plausible that the gardener, butler, and cook hypotheses are equally specific and equally simple in the relevant sense. So they should get equal prior probability.

What about the cube factory? No surprises: it's completely unclear what Solomonoff's Principle will say here. It might be instructive to work through the details to see if, for example, the answer depends on the choice of the "privileged" language.

Anyway, here's the upshot: if Solomonoff's approach to induction is on the right track, then anyone who isn't a radical subjectivist about induction should endorse a Principle of Indifference.