Posts on: Chance
Bruno de Finetti (1970) suggested that chance is objectified credence. The suggestion is explained and defended in Jeffrey (1983, ch.12), Skyrms (1980, ch.I), Skyrms (1984, ch.3), and Diaconis and Skyrms (2017, ch.7), but I still find it hard to understand. It seems to assume that rational credence functions are symmetrical in a way in which I think they shouldn't be.
The Best-Systems Account of chance promises to explain why beliefs about chance should affect our beliefs about ordinary events, as formalized by the Principal Principle. In this post, I want to discuss a challenge to any such explanation.
First, some background.
For any candidate chance function f, let [f] be the set of worlds of which f is (part of) the best system. According to the Best-Systems Account (BSA), the hypothesis "Ch=f" that f is the true chance function expresses the proposition [f]. In what follows, I'll assume that a world is simply a history of "outcomes", and that the candidate systems can be compressed into a single (possibly parameterized) chance function.
Wilhelm (2021) and Lando (2022) argue that the Sleeping Beauty problem reveals a flaw in standard accounts of credence and chance. The alleged flaw is that these accounts can't explain how attitudes towards centred propositions are constrained by information about chance.
I assume you remember the Sleeping Beauty problem. (If not, look it up: it's fun.) Wilhelm makes the following assumptions about Beauty's beliefs on Monday morning.
First, Beauty can't be sure that it is Monday:
A lot of rather technical papers on conditionals have come out in recent years. Let's have a look at one of them: Kocurek (2022).
The paper investigates Alan Hájek's argument (e.g. in Hájek (2021)) that "chance undermines would". It begins with a neat observation.
In this
2018 paper, J. Dmitri Gallow shows that it is difficult to combine
multiple deference principles. The argument is a little complicated,
but the basic idea is surprisingly simple.
Suppose A and B are two weather forecasters. Let r be the
proposition that it will rain tomorrow, let A=x be the proposition
that A assigns probability x to r; similarly for B=x. Here are two
deference principles you might like to follow:
You observe a process that generates two kinds of outcomes, 'heads'
and 'tails'. The outcomes appear in seemingly random order, with
roughly the same number of heads as tails. These observations support
a probabilistic model of the process, according to which the
probability of heads and of tails on each trial is 1/2, independently
of the other outcomes.
How observations about frequencies confirm or disconfirm
probabilistic models is well understood in Bayesian epistemology. The
central assumption that does most of the work is the Principal
Principle, which states that if a model assigns (objective)
probability x to some outcomes, then conditional on the model, the
outcomes have (subjective) probability x. It follows that models that
assign higher probability to the observed outcomes receive a greater
boost of subjective probability than models that assign lower
probability to the outcomes.
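To make the last step concrete, here is a minimal sketch of the update. It assumes just two candidate models (heads-probability 1/2 and 1/4), equal priors, and an invented sequence of ten outcomes; the function names are made up for illustration. The Principal Principle enters only in `likelihood`, where the credence in the outcomes conditional on a model is set equal to the probability the model assigns to them.

```python
from fractions import Fraction

def likelihood(p_heads, outcomes):
    """Probability that a model with independent heads-probability p_heads
    assigns to the observed sequence. By the Principal Principle, this is
    also the credence in the sequence conditional on the model."""
    prob = Fraction(1)
    for o in outcomes:
        prob *= p_heads if o == "H" else 1 - p_heads
    return prob

def posterior(priors, outcomes):
    """Bayes' theorem over a finite set of models, keyed by heads-probability."""
    joint = {p: prior * likelihood(p, outcomes) for p, prior in priors.items()}
    total = sum(joint.values())
    return {p: j / total for p, j in joint.items()}

# Hypothetical setup: two models with equal priors, 5 heads in 10 trials.
priors = {Fraction(1, 2): Fraction(1, 2), Fraction(1, 4): Fraction(1, 2)}
outcomes = list("HTTHHTHTTH")

post = posterior(priors, outcomes)
for p, cr in post.items():
    print(f"P(heads)={p}: prior {priors[p]}, posterior {cr}")
```

The model that assigns the higher probability to the observed sequence (here the 1/2 model) ends up with the higher posterior, although both started with the same prior.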
The following principles have something in common.
Conditional Coordination Principle.
A rational person's credence in a conditional A->B should equal the
ratio of her credence in the corresponding propositions A&B and A;
that is, Cr(A->B) = Cr(B/A) = Cr(A&B)/Cr(A).
Normative Coordination Principle.
On the supposition that A is what should be done, a rational agent
should be motivated to do A; that is, very roughly, Des(A/Ought(A))
> 0.5.
Probability Coordination Principle.
On the supposition that the chance of A is x, a rational agent
should assign credence x to A; that is, roughly, Cr(A/Ch(A)=x) = x.
Nomic Coordination Principle.
On the supposition that it is a law of nature that A, a rational agent
should assign credence 1 to A; that is, Cr(A/L(A)) = 1.
All these principles claim that an agent's attitudes towards a certain
kind of proposition rationally constrain their attitudes towards other
propositions.
Let's look at the third type of case in which credences can come apart from known chances. Consider the following variation of the Sleeping Beauty problem (a.k.a. "The Absentminded
Driver"):
Before Sleeping Beauty awakens on Monday, a coin is
tossed. If the coin lands tails, Beauty's memories of Monday will be
erased the following night, and the coin will be tossed again on
Tuesday. If the Monday toss lands heads, no memory erasure or further
tosses take place. Beauty is aware of all these facts.
When Beauty awakens on Monday morning and learns that today's toss
has landed tails (alternatively: that the Monday toss has landed
tails), how should that affect her credence in the hypothesis that the
coin is fair?
Next, undermining. Suppose we are testing a model H according to
which the probability that a certain type of coin toss results in
heads is 1/2. On some accounts of physical probability, including
frequency accounts and "best system" accounts, the truth of H is
incompatible with the hypothesis that all tosses of the relevant type
in fact result in heads. So we get a counterexample to simple
formulations of the Principal Principle: on the assumption that H is
true, we know that the outcomes can't be all-heads, even though H
assigns positive probability to all-heads. In such a case, we say that
all-heads is undermining for H.
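A toy illustration of undermining, under loudly simplified assumptions: a world is a sequence of four tosses, and H counts as true at a world iff exactly half of the tosses land heads (a crude stand-in for a frequency account, not anyone's considered view). The names `h_probability` and `h_true_at` are invented for this sketch.

```python
from fractions import Fraction
from itertools import product

N = 4                      # hypothetical number of tosses in a world
P_HEADS = Fraction(1, 2)   # the heads-probability that H postulates

def h_probability(world):
    """Probability that H assigns to a complete sequence of outcomes."""
    prob = Fraction(1)
    for outcome in world:
        prob *= P_HEADS if outcome == "H" else 1 - P_HEADS
    return prob

def h_true_at(world):
    """Toy frequency account (an assumption): H is true iff the relative
    frequency of heads in the world is exactly 1/2."""
    return 2 * world.count("H") == len(world)

worlds = ["".join(w) for w in product("HT", repeat=N)]
all_heads = "H" * N

# Undermining sequences: H gives them positive probability, yet H is false
# at any world in which they occur.
undermining = [w for w in worlds if h_probability(w) > 0 and not h_true_at(w)]

print("H's probability for all-heads:", h_probability(all_heads))
print("all-heads compatible with H:", h_true_at(all_heads))
print("undermining sequences:", len(undermining), "of", len(worlds))
```

Since H is false at every all-heads world, any credence function gives all-heads probability 0 conditional on H, whereas H itself gives it 1/16. That mismatch is the counterexample to the simple Principal Principle.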
Suppose we are testing statistical models of some physical process
-- a certain type of coin toss, say. One of the models in question
holds that the probability of heads on each toss is 1/2; another holds
that the probability is 1/4. We set up a long run of trials and
observe about 50 percent heads. One would hope that this confirms the
model according to which the probability of heads is 1/2 over the
alternative.
(Subjective) Bayesian confirmation theory says that some evidence E
supports some hypothesis H for some agent to the extent that the
agent's rational credence C in the hypothesis is increased by the
evidence, so that C(H/E) > C(H). We can now verify that observation of
500 heads strongly confirms that the coin is fair, as follows.
Decision theoretic representation theorems show that one can read
off an agent's probability and utility functions from their
preferences, provided the latter satisfy certain minimal rationality
constraints. More substantive rationality constraints should therefore
translate into further constraints on preference. What do these
constraints look like?
Here are a few steps towards an answer for one particular
constraint: a simple form of the Principal Principle. The Principle
states that if cr is a rational credence function and ch=p is the
hypothesis that p is the chance function, then for any E in the domain
of p,
If you spin a wheel of fortune, the outcome -- red or black -- depends
on the speed with which you spin. As you increase the speed,
the outcome quickly cycles through the two possibilities red and
black. As a consequence, any reasonably smooth probability distribution
(or frequency distribution) over initial speed determines an
approximately equal probability (frequency) for red and black. Here is
an example of such a distribution, taken from Strevens.
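Separately from Strevens's own example, the claim can be checked numerically. The following is a minimal sketch, assuming a toy wheel whose colour alternates with every 0.05 units of initial speed; the band width, the two densities and all function names are invented for illustration and are not taken from Strevens.

```python
import math

def red(speed, band_width):
    """Toy wheel (an assumption): the outcome alternates between red and
    black with each successive band of initial speed."""
    return int(speed // band_width) % 2 == 0

def prob_red(density, band_width=0.05, v_max=40.0, steps=400_000):
    """Midpoint-rule estimate of the probability of red under an
    (unnormalised) density over initial speeds in [0, v_max]."""
    dv = v_max / steps
    red_mass = 0.0
    total_mass = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        mass = density(v) * dv
        total_mass += mass
        if red(v, band_width):
            red_mass += mass
    return red_mass / total_mass

def skewed(v):
    """A smooth, right-skewed density over speed."""
    return v * math.exp(-v)

def bump(v):
    """A smooth bell-shaped density centred on speed 10."""
    return math.exp(-((v - 10.0) ** 2) / 8.0)

for name, density in [("skewed", skewed), ("bump", bump)]:
    print(name, round(prob_red(density), 4))
```

Both densities are quite different in shape, but because each varies slowly relative to the width of a colour band, the estimated probability of red comes out very close to 1/2 in either case.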

I've been asked to review Michael Strevens's new book,
Tychomancy. This motivated me to have another look at his
earlier book Bigger than Chaos.
The aim of Bigger than Chaos is to explain how apparently
chaotic interactions in highly complex systems often give rise to
simple large-scale regularities, such as the laws of thermodynamics,
the stability of predator/prey population levels, or the economic
cycle. The basic explanatory strategy, which Strevens calls enion
probability analysis (EPA), consists in aggregating the
probabilistic dynamics for the individual components of a complex
system into a probabilistic dynamics for macro-level features of the
system.
In her 2012 paper "Subjunctive
Credences and Semantic Humility", Sarah Moss presents an
interesting case due to John Hawthorne.
Suppose that it is unlikely that you perform a certain physical
movement M tomorrow, though in the unlikely event that you
contract a rare disease D, the chance of your performing M is
high. Suppose also that the combination of contracting D and
performing M causes death. Then many judge that the objective
chance of 'if you were to perform M tomorrow, you would die' is low,
but the conditional objective chance of this subjunctive given that
you perform M is high.
The intuitive judgments Moss reports are
Here is a coin. What would have happened if I had just tossed it?
It might have landed heads, and it might have landed tails. If the
coin is biased towards tails, it is more likely that it would have
landed tails. If it's a fair coin, both outcomes are equally
likely. That is, they are equally likely on the supposition that
the coin had been tossed. Let's write this as P(Heads // Toss) =
1/2, where the double slash indicates that the supposition in question
is "subjunctive" rather than "indicative".
Two initially plausible claims:
- Sometimes, a possible chance function conditionalized on a proposition A yields another possible chance function.
- Any rational prior credence function Cr conditional on the hypothesis Ch=f
that f is the (actual, present) chance function should coincide with
f; i.e., Cr(A / Ch=f) = f(A) for all A (provided that Cr(Ch=f)>0).
Claim 1 is supported by the popular idea that chances evolve by
conditionalizing on history, so that the chance at time t2 equals the
chance at t1 conditional on the history of events between t1 and
t2. Claim 2 is a weak form of the Principal Principle and often taken
to be a defining feature of chance.
I'll begin with a strange consequence of the best system
account. Imagine that the basic laws of quantum physics are
stochastic: for each state of the universe, the laws assign
probabilities to possible future states. What do these probability
statements mean?
The best system account identifies chance with the probability
function that figures in whatever fundamental physical theory best
combines the virtues of simplicity, strength and fit, where fit is a
matter of assigning high probability to actual events. So when we say
that the chance of some radium atom decaying within the next 1600
years is 1/2, what we claim is true iff whatever fundamental theory
best combines the virtues of simplicity, strength and fit assigns
probability 1/2 to the mentioned outcome. As a piece of ordinary
language philosophy, this is not very plausible. For one thing, people
speak of chances even when it is assumed that the fundamental dynamics
is deterministic. Moreover, by ordinary usage, chances are logically
independent of actual frequencies, which is incompatible with the best
system account. Nevertheless, the account may be plausible as a
somewhat revisionary explication of one strand in the mess that is our
ordinary conception of chance.
Many of our best scientific theories make only probabilistic
predictions. How can such theories be confirmed or disconfirmed by
empirical tests?
The answer depends on how we interpret the
probabilistic predictions. If a theory T says 'P(A)=x', and we
interpret this as meaning that Heidi Klum is disposed to bet on A at
odds x : 1-x, then the best way to test T is by offering bets to Heidi
Klum.
Nobody thinks this is the right interpretation of probabilistic
statements in physical theories. Some hold that these statements are
rather statements about a fundamental physical quantity called
chance. Unlike other quantities such as volume, mass or charge,
chance pertains not to physical systems, but to pairs of a time and a
proposition (or perhaps to pairs of two propositions, or to triples of
a physical system and two propositions). The chance quantity is
independent of other quantities. So if T says that in a certain type
of experiment there's a 90 percent probability of finding a particle
in such-and-such region, then T entails nothing at all about particle
positions. Instead it says that whenever the experiment is carried
out, then some entirely different quantity has value 0.9 for a certain
proposition. In general, on this interpretation our best theories say
nothing about the dynamics of physical systems. They only make
speculative claims about a hidden magnitude independent of the
observable physical world.
Expressions like 'P(A/B)', or 'the probability of A given B', seem
to be used in various different ways. On one usage, P(A/B) equals
P(AB)/P(B), at least if P(B) > 0. Call this the ratio
usage. Simple versions of the ratio usage define P(A/B) as
P(AB)/P(B), and so entail that P(A/B) is undefined whenever
P(B)=0. But I would like to admit views into the family on which
P(A/B) is taken as a primitive binary probability, governed by
something like the Popper-Rényi conditions.
This paper (recently
featured on the
physics arXiv blog) argues that if the universe never comes to an
end, then the universe will probably come to an end within the next 5 billion
years. The reasoning, as far as I can tell, goes roughly like
this.
First, define the probability of an event of type A given an event
of type B as the total number of A events over the number of B
events. If the universe is infinite, then the total number of A events
and B events will often be infinite. But infinity over infinity isn't
well-defined. So to have well-defined probabilities, the relevant
counts of A and B events must be restricted, e.g. to a finite initial
segment of the universe.
Rational credence should match the expectation of objective
chance. Here I will have a brief look at what happens
to this connection between credence and chance on the assumption that
credence is centered and chance is not.
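As a small worked example of the opening claim: if an agent divides her credence between finitely many hypotheses about the chance of A, then, given the Principal Principle and the law of total probability, her credence in A is the credence-weighted average of the candidate chances. The numbers and the function name below are hypothetical.

```python
from fractions import Fraction

def credence_from_chances(cr_over_hypotheses):
    """Cr(A) = sum over x of Cr(ch(A)=x) * x, where the dict maps each
    candidate chance x to the agent's credence that ch(A)=x."""
    assert sum(cr_over_hypotheses.values()) == 1, "credences must sum to 1"
    return sum(cr * x for x, cr in cr_over_hypotheses.items())

# Hypothetical agent: credence 3/4 that the chance of A is 1/2,
# credence 1/4 that the chance of A is 1/4.
cr = {Fraction(1, 2): Fraction(3, 4), Fraction(1, 4): Fraction(1, 4)}
credence_in_A = credence_from_chances(cr)
print(credence_in_A)
```

The sketch ignores centering altogether; it only fixes what "matching the expectation of chance" amounts to in the uncentered case.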
1. Fixing the time. Both credences and chances evolve over time. When a
coin is tossed twice, the chance of two heads may initially be 1/4;
after the first toss has come up heads, it is 1/2. So when your
beliefs should match the assumed chance, they can only match the chance
you assume to obtain at some particular time. At what time?
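The two-toss numbers can be reproduced in a few lines. This sketch assumes that the tosses are fair and independent, and that chances evolve by conditionalizing the earlier chance function on the intervening history; the helper names are invented for illustration.

```python
from fractions import Fraction
from itertools import product

# Initial chance function over the four possible two-toss histories
# (assumption: fair, independent tosses).
initial = {"".join(w): Fraction(1, 4) for w in product("HT", repeat=2)}

def chance(dist, event):
    """Chance of an event, represented as a predicate on histories."""
    return sum(p for w, p in dist.items() if event(w))

def conditionalize(dist, history):
    """Later chance function: the earlier one conditional on the history."""
    norm = chance(dist, history)
    return {w: (p / norm if history(w) else Fraction(0)) for w, p in dist.items()}

def two_heads(w):
    return w == "HH"

def first_toss_heads(w):
    return w[0] == "H"

before = chance(initial, two_heads)
after = chance(conditionalize(initial, first_toss_heads), two_heads)
print("chance of two heads before the first toss:", before)
print("chance of two heads after the first toss landed heads:", after)
```

The same proposition receives two different chances, so a principle that ties credence to "the" chance has to say which time's chance function is meant.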
First, a quick reminder of history. David Lewis once proposed a principle (the 'Principal Principle') linking rational credence and objective chance. It says (or rather, entails) that your rational credence in
any proposition A, on the assumption that the objective chance of A is x, should also be x, no matter what (further) evidence E you have:
OP: P(A | ch(A)=x & E) = x.
This principle, the 'Old Principle', is widely taken to suffer from two defects. First,
suppose your evidence E includes ~A. Then probability theory
ensures that P(A | ch(A)=x & E) = 0, irrespective of x. Lewis
responded by restricting OP to cases where E is 'admissible'. He suggested that a
(true) proposition is admissible iff it is entailed by the history of the world up to now
together with the laws of nature.
To my surprise, there are quite a few people here at ANU who believe that probabilities of various kinds can be modeled in terms of relative size of propositions: something has probability 1 if it is true in all (or 100%) of the relevant worlds, probability 0 if it is true in none (or 0%), and probability 0.5 if it is true in half of the worlds (or 50%). I also find it surprisingly hard to explain why I think that's wrong. Here are two arguments I've come up with so far (apart from obvious worries about making sense of these fractions in infinite and proper-class cases).
Let's say that something X is nomologically possible if it is true at some world where the actual laws of nature are true. The actual laws may or may not be laws at this world. All we require is that they are true there.
Now consider a chancy law according to which a coin tossed in some standard way has a 50 percent chance of landing heads. For this to be a law at some world w means that it is part of the best theory of w, or that it represents the actual propensities in w, or something like that. What does it mean for it to be merely true at a world?
Lewis argues that any theory of chance must explain the Principal Principle, which says that if you know that the objective chance for a certain proposition is x, then you should give that proposition a credence close to x. Anyone who proposes to reduce chance to some feature X, say primitive propensities, must explain why knowledge of X constrains rational expectations in this particular way.
How does Lewis's own theory explain that?
On Lewis's theory, the chance of an event (or proposition) is the
probability-value assigned to the event by the best theory. Those
'probability-values' are just numerical values: they are not
hypothetical values for some fundamental property; they need not even
deserve the name "probability". However, one requirement for good
theories is that they assign high probability-values to true
propositions. Other requirements for good theories are simplicity and strength. The best theory is the one that strikes the best compromise between all three requirements. So the question becomes: why should information that the best theory assigns probability-value x to a proposition constrain rational expectations in the way the Principal Principle says?