Posts on: Bayesianism
The standard dynamic norm of Bayesianism,
conditionalization, is clearly inadequate if credences are
defined over self-locating propositions. How should it be adjusted?
This question was popular around 2005-2015. Chris Meacham and I
came up with the same answer, which we published in Meacham (2010),
Schwarz (2012), and Schwarz (2015). I showed that the
replacement norm that we proposed has all the traditional virtues of
conditionalization. For example, (under the usual idealized conditions)
following the norm uniquely maximizes expected accuracy, and an agent is
invulnerable to diachronic Dutch books iff they follow the norm.
Bruno de Finetti (1970) suggested that chance is objectified credence. The suggestion is explained and defended in Jeffrey (1983, ch. 12), Skyrms (1980, ch. I), Skyrms (1984, ch. 3), and Diaconis and Skyrms (2017, ch. 7), but I still find it hard to understand. It seems to assume that rational credence functions are symmetrical in a way in which I think they shouldn't be.
The Best-Systems Account of chance promises to explain why beliefs about chance should affect our beliefs about ordinary events, as formalized by the Principal Principle. In this post, I want to discuss a challenge to any such explanation.
First, some background.
For any candidate chance function f, let [f] be the set of worlds of which f is (part of) the best system. According to the Best-Systems Account (BSA), the hypothesis "Ch=f" that f is the true chance function expresses the proposition [f]. In what follows, I'll assume that a world is simply a history of "outcomes", and that the candidate systems can be compressed into a single (possibly parameterized) chance function.
In 2009, at the ANU, Mike Titelbaum organized a small workshop on the Sleeping Beauty problem. I gave a talk in which I argued that the answer to the problem depends on whether we accept genuinely diachronic norms on rational belief: if yes, halfing is the most plausible answer; if no, we get thirding. A successor of this talk is now forthcoming in Noûs. Here's a PDF. In this post, I want to discuss a surprisingly hard question Kenny Easwaran raised in the Q&A after my talk:
How confident should Beauty be on Wednesday that the coin has landed heads?
I occasionally teach the doomsday argument in my philosophy classes, with the hope of raising some general questions about self-locating priors. Unfortunately, the usual formulations of the argument are problematic in so many ways that it's hard to get to these questions.
Let's look at Nick Bostrom's version of the argument, as presented for example in Bostrom (2008).
Often there are many reasons for and against a certain act or belief. How do these reasons combine into an overall reason? Nair (2021) tries to give an answer.
Nair's starting point is a little more specific. Nair intuits that there are cases in which two equally strong reasons combine into a reason that is twice as strong as the individual reasons. In other cases, however, the combined reason is just as strong as the individual reasons, or even weaker.
To make sense of this, we need to explain (1) how strengths of reason can be represented numerically, and (2) under what conditions the strengths of different reasons add up.
Isaacs and Russell (2023) propose a new way of thinking about evidence and updating.
The standard Bayesian picture of updating assumes that an agent has some ("prior") credence function Cr and then receives some (total) new evidence E. The agent then needs to update Cr in light of E, perhaps by conditionalizing on E. There is no room, in this picture, for doubts about E. The evidence is taken on board with absolute certainty.
The standard picture thereby assumes that the agent's cognitive system is perfectly sensitive to a certain aspect of the world: if E is true, the agent is certain to update on E; if E is false, the agent is certain to not update on E.
Some people – important people, like Richard Jeffrey or Brian Skyrms – seem to believe that Laplace and de Finetti have solved the problem of induction, assuming nothing more than probabilism. I don't think that's true.
I'll try to explain what the alleged solution is, and why I'm not convinced. I'll pick Skyrms as my adversary, mainly because I've just read Diaconis and Skyrms's Ten Great Ideas about Chance, in which Skyrms presents the alleged solution in a somewhat accessible form.
Greaves (2013) describes a case in which adopting a single false belief would (supposedly) be rewarded by many true beliefs.
Emily is taking a walk through the Garden of Epistemic Imps. A child plays on the grass in front of her. In a nearby summerhouse are n further children, each of whom may or may not come out to play in a minute. They are able to read Emily's mind, and their algorithm for deciding whether to play outdoors is as follows. If she forms degree of belief 0 that there is now a child before her, they will come out to play. If she forms degree of belief 1 that there is a child before her, they will roll a fair die, and come out to play iff the outcome is an even number. […]
There are two paths to Shangri La. One goes by the sea, the other by the mountains. You are on the mountain path and about to enter Shangri La. You can choose how your belief state will change as you enter through the gate, in response to whatever evidence you may receive. At the moment, you are (rationally) confident that you have travelled by the mountains. You know that you will not receive any surprising new evidence as you step through the gate. You want to maximize the expected accuracy of your future belief state – at least with respect to the path you took. How should you plan to change your credence in the hypothesis that you have travelled by the mountains?
I've read around a bit in the literature on higher-order evidence. Two different ideas seem to go with this label. One concerns the possibility of inadequately responding to one's evidence. The other concerns the possibility of having imperfect information about one's evidence. I have a similar reaction to both issues. I haven't seen it in the papers I've looked at. Pointers very welcome.
I'll begin with the first issue.
Let's assume that a rational agent proportions her beliefs to her evidence. This can be hard. For example, it's often hard to properly evaluate statistical data. Suppose you have evaluated the data and reached the correct conclusion, but then receive misleading evidence that you've made a mistake. How should you react?
Some (e.g. Christensen (2010)) say you should reduce your confidence in the conclusion you've reached. Others (e.g. Tal (2021)) say you should remain steadfast and not reduce your confidence.
If a certain hypothesis entails that N percent of all observers in the universe have a certain property, how likely is it that we have that property – conditional on the hypothesis, and assuming we have no other relevant information?
Answer: It depends on what else the hypothesis says. If, for example, the hypothesis says that 90 percent of all observers have three eyes, and also that we ourselves have two eyes, then the probability that we have three eyes conditional on the hypothesis is zero.
This effect is easy to miss because many hypotheses that appear to be just about the universe as a whole secretly contain special information about us. Consider the following passage from Carroll (2010), cited in Arntzenius and Dorr (2017):
In the previous post I argued that rational priors must favour some possibilities over others, and that this is a problem for Richard Pettigrew's model of Jamesian permissivism. It also points towards an alternative model that might be worth exploring.
I claim that, in the absence of unusual evidence, a rational agent should be confident that observed patterns continue in the unobserved part of the world, that witnesses tell the truth, that rain experiences indicate rain, and so on. In short, they should give low credence to various skeptical scenarios. How low? Arguably, our epistemic norms don't fix a unique and precise answer.
Pettigrew (2021) defends a type of permissivism about rational credence inspired by James (1897), on which different rational priors reflect different attitudes towards epistemic risk. I'll summarise the main ideas and raise some worries.
(There is, of course, much more in the book than what I will summarise, including many interesting technical results and some insightful responses to anti-permissivist arguments.)
Last week I gave a talk in which I claimed (as an aside) that if you update your credences by conditionalising on a true proposition then your credences never become more inaccurate. That seemed obviously true to me. Today I tried to quickly prove it. I couldn't. Instead I found that the claim is false, at least on popular measures of accuracy.
The problem is that conditionalising on a true proposition typically increases the probability of true propositions as well as false propositions. If we measure the inaccuracy of a credence function by adding up an inaccuracy score for each proposition, the net effect is sensitive to how exactly that score is computed.
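Here is a minimal sketch of a counterexample, assuming the Brier score (squared distance from the truth value) summed over every proposition in the algebra generated by three worlds. The worlds w1, w2, w3, the prior, and the helper names are hypothetical choices for illustration, not taken from the talk.

```python
from fractions import Fraction
from itertools import combinations

def propositions(worlds):
    """Yield every subset of the given worlds, i.e. the full algebra of propositions."""
    ws = list(worlds)
    for r in range(len(ws) + 1):
        for combo in combinations(ws, r):
            yield frozenset(combo)

def prob(cr, prop):
    """Credence in a proposition = sum of the credences of the worlds it contains."""
    return sum((cr[w] for w in prop), Fraction(0))

def brier(cr, actual):
    """Inaccuracy: squared distance from the truth value, summed over all propositions."""
    return sum(
        ((prob(cr, p) - (1 if actual in p else 0)) ** 2 for p in propositions(cr)),
        Fraction(0),
    )

def conditionalize(cr, evidence):
    """Standard conditionalization on a set of worlds with positive credence."""
    p_e = prob(cr, evidence)
    return {w: (cr[w] / p_e if w in evidence else Fraction(0)) for w in cr}

# Hypothetical example: w1 is actual, so the evidence {w1, w2} is true.
prior = {"w1": Fraction(1, 10), "w2": Fraction(8, 10), "w3": Fraction(1, 10)}
actual = "w1"
evidence = frozenset({"w1", "w2"})
posterior = conditionalize(prior, evidence)

print(brier(prior, actual))      # 73/25, i.e. 2.92
print(brier(posterior, actual))  # 256/81, about 3.16
```

Inaccuracy goes up because most of the credence freed up from w3 goes to the false world w2, which already had most of the credence; the true world w1 only moves from 1/10 to 1/9.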
In this
2018 paper, J. Dmitri Gallow shows that it is difficult to combine
multiple deference principles. The argument is a little complicated,
but the basic idea is surprisingly simple.
Suppose A and B are two weather forecasters. Let r be the
proposition that it will rain tomorrow, let A=x be the proposition
that A assigns probability x to r; similarly for B=x. Here are two
deference principles you might like to follow:
Dutch Book arguments are often used to justify various epistemic
norms – in particular, that credences should obey the
probability axioms and that they should evolve by
conditionalization. Roughly speaking, the argument is that if someone
were to violate these norms, then they would be prepared to accept
bets which amount to a guaranteed loss, and that seems
irrational.
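For the simplest synchronic case, here is a minimal sketch, assuming (as the standard argument does) that the agent regards a price of credence times stake as fair for a ticket that pays the stake if a proposition is true. The credences of 0.6 in A and 0.6 in not-A are a hypothetical violation of the probability axioms; the diachronic argument for conditionalization needs a more elaborate setup.

```python
from fractions import Fraction

# Two possible worlds; each proposition is the set of worlds at which it is true.
WORLDS = ["A-world", "not-A-world"]
PROPOSITIONS = {"A": {"A-world"}, "not-A": {"not-A-world"}}

# Hypothetical incoherent credences: they sum to 6/5 rather than 1.
credence = {"A": Fraction(6, 10), "not-A": Fraction(6, 10)}

def net_gain(world, stake=1):
    """Agent buys a ticket on each proposition at price credence * stake;
    each ticket pays `stake` if its proposition is true at `world`."""
    gain = Fraction(0)
    for name, true_at in PROPOSITIONS.items():
        gain -= credence[name] * stake
        if world in true_at:
            gain += stake
    return gain

for w in WORLDS:
    print(w, net_gain(w))  # -1/5 in each world: a guaranteed loss
```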
But it's hard to spell out how exactly the argument is meant to go. In
fact, I'm not aware of any satisfactory statement. Here's my
attempt.
My paper "Imaginary
Foundations" has been accepted at Ergo (after rejections from
Phil Review, Mind, Phil Studies, PPR, Nous, AJP, and Phil
Imprint). The paper has been in the making since 2005, and I'm quite
fond of it.
The question I address is simple: how should we model the impact of
perceptual experience on rational belief? That is, consider a
particular type of experience – individuated either by its
phenomenology (what it's like to have the experience) or by its
physical features (excitation of receptor cells, or whatever). How
should an agent's beliefs change in response to this type of
experience?
According to the Principle of Indifference, alternative
propositions that are similar in a certain respect should be given
equal prior probability. The tricky part is to explain what should
count as similarity here.
Van Fraassen's cube factory nicely illustrates the problem. A
factory produces cubes with side lengths between 0 and 2 cm, and
consequently with volumes between 0 and 8 cm³. Given this
information, what is the probability that the next cube that will be
produced has a side length between 0 and 1 cm? Is it 1/2, because the
interval from 0 to 1 is half of the interval from 0 to 2? Or is it
1/8, because a side length of 1 cm means a volume of 1 cm³, which is
1/8 of the range from 0 to 8?
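The competing answers can be computed side by side. This sketch assumes that indifference spreads probability uniformly over side^k for some chosen k (k = 1 for side length, k = 3 for volume); since side ≤ s exactly when side^k ≤ s^k, the probability of a side length of at most s is (s/2)^k. The face-area option (k = 2) is an extra illustration, not part of the original example.

```python
from fractions import Fraction

MAX_SIDE = 2  # cm

def prob_side_at_most(s, k):
    """P(side <= s) if probability is spread uniformly over side**k on [0, MAX_SIDE**k].

    k = 1: uniform over side length; k = 2: over face area; k = 3: over volume.
    """
    return Fraction(s, MAX_SIDE) ** k

for k, label in [(1, "side length"), (2, "face area"), (3, "volume")]:
    print(f"uniform over {label}: {prob_side_at_most(1, k)}")
```

Each choice of k is equally "indifferent" on its own terms, which is exactly why the principle needs an account of which similarity counts.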
You observe a process that generates two kinds of outcomes, 'heads'
and 'tails'. The outcomes appear in seemingly random order, with
roughly the same amount of heads as tails. These observations support
a probabilistic model of the process, according to which the
probability of heads and of tails on each trial is 1/2, independently
of the other outcomes.
How observations about frequencies confirm or disconfirm
probabilistic models is well understood in Bayesian epistemology. The
central assumption that does most of the work is the Principal
Principle, which states that if a model assigns (objective)
probability x to some outcomes, then conditional on the model, the
outcomes have (subjective) probability x. It follows that models that
assign higher probability to the observed outcomes receive a greater
boost of subjective probability than models that assign lower
probability to the outcomes.
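As a minimal sketch, assume three hypothetical models (probability of heads 1/4, 1/2, 3/4) with equal prior credence, and a made-up sequence of eight outcomes with four heads. The Principal Principle fixes each model's likelihood as the probability it assigns to the observed sequence; Bayes' theorem does the rest.

```python
from fractions import Fraction

def likelihood(p_heads, outcomes):
    """Probability a model assigns to an exact sequence of independent tosses."""
    result = Fraction(1)
    for outcome in outcomes:
        result *= p_heads if outcome == "H" else 1 - p_heads
    return result

def update(prior, outcomes):
    """Cr(model | outcomes), taking Cr(outcomes | model) from the model itself
    (the Principal Principle) and then applying Bayes' theorem."""
    joint = {m: cr * likelihood(m, outcomes) for m, cr in prior.items()}
    total = sum(joint.values())
    return {m: j / total for m, j in joint.items()}

models = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {m: Fraction(1, 3) for m in models}
posterior = update(prior, "HTTHHTHT")  # hypothetical data: 4 heads, 4 tails

for m in models:
    print(f"P(heads) = {m}: {prior[m]} -> {posterior[m]}")
```

The fair model, which assigns the highest probability to the sequence, rises from 1/3 to 128/209 (roughly 0.61), while each biased model falls to 81/418 (roughly 0.19).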
Imagine you and I are walking down a long path. You are ahead,
but we can communicate on the phone. If you say, "there are strawberries here" and I trust you, I should not come to believe that there
are strawberries where I am, but that there are strawberries wherever
you are. If I also know that you are 2 km ahead, I should come to
believe that there are strawberries 2 km down the path. But what's the
general rule for deferring to somebody with self-locating beliefs?
What makes the Sleeping Beauty problem non-trivial is Beauty's
potential memory loss on Monday night. In my view, this means that
Sleeping Beauty should be modeled as a case of potential epistemic
fission: if the coin lands tails, any update Beauty makes to her
beliefs in the transition from Sunday to Monday will also fix her
beliefs on Tuesday, and so the Sunday state effectively has two
epistemic successors, one on Monday and one on Tuesday. All accounts of
epistemic fission that I'm aware of then entail halfing.
There has been a lively debate in recent years about the
relationship between graded belief and ungraded belief. The debate
presupposes something we should regard with suspicion: that there is
such a thing as ungraded belief.
Compare earthquakes. I'm not an expert on earthquakes, but I know
that they vary in strength. How exactly to measure an earthquake's
strength is to some extent a matter of convention: we could have used
a non-logarithmic scale; we could have counted duration as an aspect
of strength, and so on. So when we say that an earthquake has
magnitude 6.4, we characterize a central aspect of an earthquake's
strength by locating it on a conventional scale.
The following principles have something in common.
Conditional Coordination Principle.
A rational person's credence in a conditional A->B should equal the
ratio of her credence in the corresponding propositions A&B and A;
that is, Cr(A->B) = Cr(B/A) = Cr(A&B)/Cr(A).
Normative Coordination Principle.
On the supposition that A is what should be done, a rational agent
should be motivated to do A; that is, very roughly, Des(A/Ought(A))
> 0.5.
Probability Coordination Principle.
On the supposition that the chance of A is x, a rational agent
should assign credence x to A; that is, roughly, Cr(A/Ch(A)=x) = x.
Nomic Coordination Principle.
On the supposition that it is a law of nature that A, a rational agent
should assign credence 1 to A; that is, Cr(A/L(A)) = 1.
All these principles claim that an agent's attitudes towards a certain
kind of proposition rationally constrain their attitudes towards other
propositions.
In discussions of the raven paradox,
it is generally assumed that the (relevant) information gathered from an
observation of a black raven can be regimented into a statement of the
form Ra & Ba ('a is a raven and a is
black'). This is in line with what a lot of "anti-individualist" or
"externalist" philosophers say about the information we acquire
through experience: when we see a black raven, they claim, what we
learn is not a descriptive or general proposition to the effect that
whatever object satisfies such-and-such conditions is a black raven,
but rather a "singular" proposition about a particular object --
we learn that this very object is black and a raven. It seems
to me that this singularist doctrine makes it hard to account for many
aspects of confirmation.
It is widely agreed that conditionalization is not an adequate norm
for the dynamics of self-locating beliefs. There is no agreement on
what the right norms should look like. Many hold that there are no
dynamic norms on self-locating beliefs at all. On that view, an
agent's self-locating beliefs at any time are determined on the basis
of the agent's evidence at that time, irrespective of the earlier
self-locating belief. I want to talk about an alternative approach
that assumes a non-trivial dynamics for self-locating beliefs. The
rough idea is that as time goes by, a belief that it is Sunday should
somehow turn into a belief that it is Monday.
Next, undermining. Suppose we are testing a model H according to
which the probability that a certain type of coin toss results in
heads is 1/2. On some accounts of physical probability, including
frequency accounts and "best system" accounts, the truth of H is
incompatible with the hypothesis that all tosses of the relevant type
in fact result in heads. So we get a counterexample to simple
formulations of the Principal Principle: on the assumption that H is
true, we know that the outcomes can't be all-heads, even though H
assigns positive probability to all-heads. In such a case, we say that
all-heads is undermining for H.
Suppose we are testing statistical models of some physical process
-- a certain type of coin toss, say. One of the models in question
holds that the probability of heads on each toss is 1/2; another holds
that the probability is 1/4. We set up a long run of trials and
observe about 50 percent heads. One would hope that this confirms the
model according to which the probability of heads is 1/2 over the
alternative.
(Subjective) Bayesian confirmation theory says that some evidence E
supports some hypothesis H for some agent to the extent that the
agent's rational credence C in the hypothesis is increased by the
evidence, so that C(H/E) > C(H). We can now verify that observation of
500 heads strongly confirms that the coin is fair, as follows.
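Here is a minimal numerical check, assuming 1000 tosses in total (so that 500 heads matches the "about 50 percent heads" mentioned above), the two models p = 1/2 and p = 1/4, and a hypothetical prior credence C(H) in the fair model. Because likelihoods like 0.5^1000 are tiny, the sketch works with logarithms; the binomial coefficient is the same for both models and cancels, so it is left out.

```python
import math

def log_likelihood(p_heads, heads, tosses):
    """Log-probability of one particular sequence with the given number of heads."""
    return heads * math.log(p_heads) + (tosses - heads) * math.log(1 - p_heads)

def credence_fair(prior_fair, heads, tosses, alternative=0.25):
    """C(H/E) where H says p = 1/2 and the only rival says p = `alternative`."""
    log_fair = math.log(prior_fair) + log_likelihood(0.5, heads, tosses)
    log_alt = math.log(1 - prior_fair) + log_likelihood(alternative, heads, tosses)
    top = max(log_fair, log_alt)  # subtract the max before exponentiating (log-sum-exp)
    fair, alt = math.exp(log_fair - top), math.exp(log_alt - top)
    return fair / (fair + alt)

log_bayes_factor = log_likelihood(0.5, 500, 1000) - log_likelihood(0.25, 500, 1000)
print(log_bayes_factor)  # about 143.8 (natural log)
for prior in (0.5, 0.01):
    print(prior, credence_fair(prior, 500, 1000))  # far above the prior in both cases
```

Even a prior of 0.01 in the fair model is swamped: C(H/E) > C(H), as the confirmation criterion requires.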
Fred has bought a duplication machine at a discount from a series
in which 50 percent of all machines are broken. If Fred's machine
works, it will turn Fred into two identical copies of himself, one
emerging on the left, the other on the right. If Fred's machine is
broken, he will emerge unchanged and unduplicated either on the left
or on the right, but he can't predict where. Fred enters his machine,
briefly loses consciousness and then finds himself emerge on the
left. In fact, his machine is broken and no duplication event has
occurred, but Fred's experiences do not reveal this to him.
Given some evidence E and some proposition P, we can ask to what
extent E supports P, and thus to what extent an agent should believe P
if their only relevant evidence is E. The question may not always have
a precise answer, but there are both intuitive and theoretical reasons
to assume that the question is meaningful – that there is a kind
of (imprecise) "evidential probability" conferred by evidence on
propositions. That's why it makes sense to say, for example, that one
should proportion one's beliefs to one's evidence.
There's an exciting new theory in cognitive science. The theory began
as an account of message-passing in the visual cortex, but it quickly
expanded into a unified explanation of perception, action, attention,
learning, homeostasis, and the very possibility of life. In its most
general and ambitious form, the theory was mainly developed by Karl
Friston -- see e.g. Friston 2006, Friston and Stephan 2007, Friston
2009, Friston 2010, or the Wikipedia page on the free-energy principle.
Imagine the universe has a centre that regularly produces new stars
which then drift away at a constant speed. This has been going on
forever, so there are infinitely many stars. We can label them by age,
or equivalently by their distance from the centre: star 1 is the
youngest, then comes star 2, then star 3, and so on, without end. The
stars in turn produce planets at regular intervals. So the older a
star, the more planets surround it. Today, something happened to one
(and only one) of the planets. Let's say it exploded. Given all this,
what is your credence that the unfortunate planet belonged to the
first 100 stars? What about the second 100? It would be odd to think
that the event is more likely to have happened at one of the first 100
stars than at one of the next 100, since the latter have far
more planets. Similarly if we compare the first 1000 stars with the
next 1000, or the first million with the next million, and so on. But
there is no countably additive (real-valued) probability measure that
satisfies this constraint.
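The last step can be spelled out. A minimal derivation, assuming the constraint holds for every block size n (the text states it for 100, 1000, a million, "and so on"), where S_n is the proposition that the exploded planet belonged to one of the first n stars:

```latex
Constraint: $P(S_n) \le P(S_{2n} \setminus S_n)$ for every $n$. Then
\[
P(S_{2n}) = P(S_n) + P(S_{2n} \setminus S_n) \ge 2\,P(S_n),
\qquad\text{hence}\qquad
P(S_{2^k n}) \ge 2^k\,P(S_n) \quad\text{for all } k \ge 0 .
\]
Since $P(S_{2^k n}) \le 1$ for every $k$, we get $P(S_n) = 0$ for every $n$.
The $S_n$ increase and their union is certain, so countable additivity gives
\[
1 = P\Big(\bigcup_{n} S_n\Big) = \lim_{n \to \infty} P(S_n) = 0,
\]
a contradiction.
```

The planet counts only motivate the constraint; the contradiction follows from the constraint and countable additivity alone.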
Two initially plausible claims:
- Sometimes, a possible chance function conditionalized on a proposition A yields another possible chance function.
- Any rational prior credence function Cr conditional on the hypothesis Ch=f
that f is the (actual, present) chance function should coincide with
f; i.e., Cr(A / Ch=f) = f(A) for all A (provided that Cr(Ch=f)>0).
Claim 1 is supported by the popular idea that chances evolve by
conditionalizing on history, so that the chance at time t2 equals the
chance at t1 conditional on the history of events between t1 and
t2. Claim 2 is a weak form of the Principal Principle and often taken
to be a defining feature of chance.
You can't predict the stock market by looking at tea leaves. If an
episode of looking at tea leaves makes you believe that the stock
market will soon collapse, then -- assuming your previous beliefs did
not support the collapse hypothesis, nor the hypothesis that tea
leaves predict the stock market -- your new belief is unjustified and
irrational. So there are epistemic norms for how one's opinions may
change through perceptual experience.
Such norms are easily accounted for in the traditional Bayesian
picture where each perceptual experience is associated with an
evidence proposition E on which any rational agent should condition
when they have the experience. But what if perceptual experiences
don't confer absolute certainty on anything? Jeffrey pointed out that
if there is a partition of propositions { E_i } = E_1,...,E_n such
that (1) an experience changes their probabilities to some values {
p_i } = p_1,...,p_n, and (2) the experience does not affect the
probabilities conditional on any member of the partition, then the new
probability assigned to any proposition A is the weighted average of
the old probabilities of A conditional on the members of the partition,
weighted by the new probabilities of those members. This rule is often
called "Jeffrey conditioning" and sometimes "generalised
conditioning", but unlike standard conditioning it isn't a dynamical
rule at all: it is a simple consequence of the probability
calculus. To get genuine epistemic norms on the dynamics of belief
through perceptual experience, Jeffrey's rule must be supplemented
with a story about how a given experience, perhaps together with an
agent's previous belief state, may fix the partition { E_i } and
values { p_i } that determine a Jeffrey update. This is the "input
problem" for Jeffrey conditioning.
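Here is a minimal sketch of the arithmetic of a Jeffrey update, assuming a finite set of worlds with made-up prior probabilities. The partition and the new values p_i are simply passed in as arguments: the sketch illustrates the rule and says nothing about the input problem of where they come from.

```python
from fractions import Fraction

def jeffrey_update(prior, partition, new_probs):
    """Jeffrey conditioning on a partition {E_i} with new probabilities {p_i}.

    Within each cell E_i, probabilities keep their old proportions (rigidity),
    so P_new(w) = p_i * P_old(w) / P_old(E_i) for every world w in E_i.
    """
    assert sum(new_probs) == 1
    posterior = {}
    for cell, p_new in zip(partition, new_probs):
        p_old = sum(prior[w] for w in cell)
        for w in cell:
            posterior[w] = p_new * prior[w] / p_old
    return posterior

def prob(dist, prop):
    return sum(dist[w] for w in prop)

# Hypothetical example: four worlds, partition {E1, E2}, proposition A.
prior = {"w1": Fraction(3, 10), "w2": Fraction(1, 10),
         "w3": Fraction(2, 10), "w4": Fraction(4, 10)}
E1, E2 = {"w1", "w2"}, {"w3", "w4"}
A = {"w1", "w3"}
posterior = jeffrey_update(prior, [E1, E2], [Fraction(7, 10), Fraction(3, 10)])

# New P(A) = sum_i p_i * P_old(A | E_i) = 7/10 * 3/4 + 3/10 * 1/3
print(prob(posterior, A))  # 5/8
```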
Suppose a rational agent makes an observation, which changes the
subjective probability she assigns to a hypothesis H. In this case,
the new probability of H is usually sensitive to both the observation
and the prior probability. Can we factor out the prior probability to
get a measure of how the experience bears on the probability of H,
independently of the prior probability?
A common answer, going back to Alan Turing and I. J. Good, is to use
Bayes factors. The Bayes factor B(H) for H is the ratio
(P'(H)/P'(not-H))/(P(H)/P(not-H)) of new odds on H to old odds. Thus
the new odds on H are the old odds multiplied by the Bayes factor. For
example, if the prior credence in H was 0.25 and the posterior is 0.5,
then the odds on H changed from 1:3 to 1:1, and so the Bayes factor of
the update is 3. The same Bayes factor would characterise an update
from probability 0.01 to about 0.03 (odds 1:99 to 1:33) or from 0.9 to
about 0.96 (odds 9:1 to 27:1).
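The arithmetic in this paragraph can be checked with a few helper functions (the names are illustrative), using exact fractions:

```python
from fractions import Fraction

def odds(p):
    """Odds on a proposition with probability p."""
    return p / (1 - p)

def prob_from_odds(o):
    return o / (1 + o)

def bayes_factor(prior, posterior):
    """Ratio of new odds to old odds."""
    return odds(posterior) / odds(prior)

def apply_factor(prior, factor):
    """New probability after multiplying the old odds by a Bayes factor."""
    return prob_from_odds(odds(prior) * factor)

b = bayes_factor(Fraction(1, 4), Fraction(1, 2))
print(b)                                  # 3
print(apply_factor(Fraction(1, 100), b))  # 1/34, about 0.029
print(apply_factor(Fraction(9, 10), b))   # 27/28, about 0.964
```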
Luc Bovens and Wlodek Rabinowicz (2010
and 2011)
present the following puzzle:
Three people are each given a hat to put on in the
dark. The hats' colours, either black or white, has been decided by
three independent tosses of a fair coin. Then the light goes on and
everyone can see the hats of the two others, but not their own. All of
this is common knowledge in the group.
Let's call the three players X, Y and Z. There are eight possible
distributions of hat colours, each with probability 1/8:
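For concreteness, here is a sketch that enumerates the eight equiprobable distributions and computes what each player can infer about her own hat from the two hats she sees. The representation is a hypothetical choice; the sketch only sets up the scenario and does not address the puzzle itself.

```python
from fractions import Fraction
from itertools import product

PLAYERS = ("X", "Y", "Z")
COLOURS = ("black", "white")

# Three independent fair tosses: all 2**3 colour assignments are equally likely.
DISTRIBUTIONS = [dict(zip(PLAYERS, cs)) for cs in product(COLOURS, repeat=3)]
PROB = Fraction(1, len(DISTRIBUTIONS))

def seen_by(player, dist):
    """What a player observes: the hat colours of the two others."""
    return tuple(dist[q] for q in PLAYERS if q != player)

def credence_own_black(player, observation):
    """Credence that one's own hat is black, given what one sees.
    Counting works because all distributions are equiprobable."""
    live = [d for d in DISTRIBUTIONS if seen_by(player, d) == observation]
    black = [d for d in live if d[player] == "black"]
    return Fraction(len(black), len(live))

for d in DISTRIBUTIONS:
    print(d, PROB)
print(credence_own_black("X", ("black", "white")))  # 1/2
```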
If beliefs are modeled by a probability distribution over centered
worlds, belief update cannot work simply by conditionalisation. How
then does it work? The most popular answer in philosophy goes as
follows.
Let P be an agent's credence function at time t1, P' the credence function
at t2, and E the evidence received at t2. Since E is a centered
proposition, it can be true at multiple points within a world.
Suppose, however, that the agent assigns probability 0 to worlds at
which E is true more than once. Then to compute P', first
conditionalise P on the uncentered fragment of E -- i.e. the strongest
uncentered proposition entailed by E. This rules out all worlds at
which E is true nowhere. Second, move the center of each remaining
world to the (unique) point at which E is true.
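Here is a minimal sketch of this two-step rule under some simplifying assumptions: a world is a finite list of points, each point is a set of labels true there, the centered evidence E is represented by a label, and credence is a dictionary over (world, point) pairs. The world names and labels in the example are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def update_centered(prior, worlds, evidence):
    """Two-step update on a centered proposition E.

    prior:    dict mapping (world, point index) -> credence at t1
    worlds:   dict mapping world -> list of points, each a set of labels true there
    evidence: label standing for E; E is true at a point iff the label is in its set
    """
    # Credence in each uncentered world (sum over its possible centers).
    world_cr = defaultdict(Fraction)
    for (w, _), p in prior.items():
        world_cr[w] += p

    # Points at which E is true, per world.
    e_points = {w: [i for i, labels in enumerate(pts) if evidence in labels]
                for w, pts in worlds.items()}
    for w, p in world_cr.items():
        assert p == 0 or len(e_points[w]) <= 1, "assumes E is true at most once per world"

    # Step 1: conditionalize on the uncentered fragment of E (drop worlds where E never holds).
    surviving = {w: p for w, p in world_cr.items() if p > 0 and e_points[w]}
    total = sum(surviving.values())

    # Step 2: move each remaining world's center to the unique point where E is true.
    return {(w, e_points[w][0]): p / total for w, p in surviving.items()}

worlds = {
    "A": [{"start"}, {"E"}],
    "B": [{"start"}, {"E"}],
    "C": [{"start"}, {"F"}],
}
prior = {("A", 0): Fraction(1, 2), ("B", 0): Fraction(1, 4), ("C", 0): Fraction(1, 4)}
print(update_centered(prior, worlds, "E"))  # {('A', 1): Fraction(2, 3), ('B', 1): Fraction(1, 3)}
```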
Alice is randomly selected from her population to be tested for a
rare genetic disorder that affects about one in 10,000 people. The
test is accurate 99 percent of the time, both among subjects that have
the disorder and among subjects that don't. Alice's test comes back
positive.
Call the information in the previous paragraph E, and suppose it's
all you know about the situation. How confident are you that Alice has
the disorder?
Letting our subjective probabilities be guided by the stated
frequencies, we can use Bayes' Theorem to figure out that P(disorder |
positive) = P(positive | disorder) * P(disorder) / (P(positive |
disorder) * P(disorder) + P(positive | ~disorder) * P(~disorder)) =
0.99 * 0.0001 / (0.99 * 0.0001 + 0.01 * 0.9999) = 0.0098. Assume then
that your degree of belief is about 0.01.
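The same calculation can be done with exact fractions (the function and parameter names are illustrative); the posterior turns out to be exactly 1/102, which is the 0.0098 above.

```python
from fractions import Fraction

def posterior_given_positive(base_rate, sensitivity, specificity):
    """Bayes' Theorem: P(disorder | positive)."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

p = posterior_given_positive(Fraction(1, 10000), Fraction(99, 100), Fraction(99, 100))
print(p)  # 1/102, about 0.0098
```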
Expressions like 'P(A/B)', or 'the probability of A given B', seem
to be used in various different ways. On one usage, P(A/B) equals
P(AB)/P(B), at least if P(B) > 0. Call this the ratio
usage. Simple versions of the ratio usage define P(A/B) as
P(AB)/P(B), and so entail that P(A/B) is undefined whenever
P(B)=0. But I would like to admit views into the family on which
P(A/B) is taken as a primitive binary probability, governed by
something like the Popper-Renyi conditions.
This paper (recently
featured on the
physics arXiv blog) argues that if the universe never comes to an
end, then the universe will probably come to an end within the next 5 billion
years. The reasoning, as far as I can tell, goes roughly like
this.
First, define the probability of an event of type A given an event
of type B as the total number of A events over the number of B
events. If the universe is infinite, then the total number of A events
and B events will often be infinite. But infinity over infinity isn't
well-defined. So to have well-defined probabilities, the relevant
counts of A and B events must be restricted, e.g. to a finite initial
segment of the universe.
Compare the following two ways of responding to the weather report's
"probability of rain" announcement.
Good: Upon hearing that the probability of rain is x,
you come to believe to degree x that it will rain.
Bad: Upon hearing that the probability of rain is x, you
become certain that it will rain if x > 0.5, otherwise certain that
it won't rain.
The Bad process seems bad, not just because it may lead to bad
decisions. It seems epistemically bad to respond to a "70%
probability of rain" announcement by becoming absolutely certain that
it will rain. The resulting attitude would be unjustified and irrational.
First, a quick reminder of history. David Lewis once proposed a principle (the 'Principal Principle') linking rational credence and objective chance. It says (or rather, entails) that your rational credence in
any proposition A, on the assumption that the objective chance of A is x, should also be x, no matter what (further) evidence E you have:
OP: P(A | ch(A)=x & E) = x.
This principle, the 'Old Principle', is widely taken to suffer from two defects. First,
suppose your evidence E includes ~A. Then probability theory
ensures that P(A | ch(A)=x & E) = 0, irrespective of x. Lewis
responded by restricting OP to cases where E is 'admissible'. He suggested that a
(true) proposition is admissible iff it is entailed by the history of the world up to now
together with the laws of nature.
In the last entry, I have suggested that
(EEP) P_2(A) = P_1(+A | +E)
is a sensible rule for updating self-locating beliefs. Here, E is the
total evidence received at time 2 (the time of P_2), and '+' denotes a
function that shifts the evaluation index of propositions, much like
'in 5 minutes': '+A' is true at a centered world w iff A is true at
the next point from w where new information is received. (EEP) therefore
says that upon learning E, your new credence in any proposition A
should equal your previous conditional credence that A will obtain at the next
point when information comes in, given that this
information is E.
I've been participating in a couple of workshops here at ANU lately,
and I thought I'd share some notes. First, we had a little Sleeping Beauty workshop where Terry Horgan
and Mike Titelbaum defended thirding, and me halfing. Unfortunately, I
think we didn't quite get to the heart of our disagreement. Each of us
said their own thing, without saying enough about what's wrong with
the reasoning of the other sides. So I'll do that here. I start with
Terry's account.
We Bayesians are sometimes bugged about ultimate priors: what
probability function would suit a rational agent before the
incorporation of any evidence? The question matters not because anyone
cares about what someone should believe if they popped into existence
in a state of ideal rationality and complete empirical ignorance. It
matters because the answer also determines what conclusions rational
agents should draw from their evidence at any later point in their
life. Take the total evidence you have had up to now. Given this
evidence, is it more likely that Obama won the 2008 election or that
McCain won it? There are distributions of priors on which your
evidence is a strong indicator that McCain won. Nevertheless, this
doesn't seem like it's a rational conclusion to draw. So there must be
something wrong with those priors.
Dark clouds are gathering. Soon it will be raining. When it does, I will
believe that it is raining. I do not yet believe that it
is raining even though I do believe that my well-informed future self
will believe that it is raining. I thereby violate the 'Principle of
Reflection'. Once we allow for centered propositions that change their truth-value between times
and places, Reflection, like its close cousin Conditioning, becomes a very implausible
norm of rationality.
A curious aspect of the Sleeping Beauty debate is the role of Dutch Books. At first sight, it looks as if Dutch Book considerations support thirding (see e.g. Hitchcock 2004). However, as Halpern 2006 shows, Beauty can also be Dutch Booked if she is a thirder. Some have argued that these arguments might fail because in Sleeping Beauty type cases, credences and betting odds can come apart (see e.g. Bradley and Leitgeb 2006). I disagree. Instead, I will argue that her vulnerability to Dutch Books doesn't show that Beauty is irrational -- at least not if she is a halfer.
Bas van Fraassen's Reflection Principle says that your current beliefs should be in line with your current beliefs about your future beliefs. More precisely,
PRB: P_1(A | P_2(A)=x) = x.
P_1 is your credence at time 1, P_2 your credence at time 2. PRB says that conditional on the assumption that at time 2 you believe A to degree x, you should already believe A to degree x at time 1. For agents who believe that they will (or might) change their beliefs in irrational ways between the two times, PRB is not a reasonable demand: if you know that you will be hit on the head tomorrow and consequently believe that the Earth is flat, you shouldn't believe that the Earth is flat now. On the other hand, if you're certain you will not change your beliefs in any such irrational way between now and tomorrow, then PRB is reasonable: suppose tomorrow you will believe that the Earth is flat by rationally responding to some very surprising new information; then you can infer that there exists some such information strongly supporting that the Earth is flat. But the fact that there is evidence for a proposition is of course itself evidence for that proposition. Hence you should already believe today that the Earth is probably flat.
Suppose beliefs locate us in centered logical space: to believe something is to rule out not only ways a universe might be, but ways things might be for an individual at a time. Then there will be two kinds of rational belief change: we can learn something new about our present situation, and we can change our situation and adjust our beliefs to this change. The rule for changes of the first kind is conditionalization. The rule for changes of the second kind doesn't have an official name yet, as far as I know. (In the AGM/KM framework, it is called "update", but we Bayesians often use "update" for conditioning.) In practice, the two rules always go hand in hand: you never learn something new without changing your situation, and you hardly ever change your situation without learning anything new.
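One simple way to model the two kinds of change is sketched below in Python. Credence is spread over centered possibilities, here hypothetical (world, day) pairs. Learning is conditionalization; the change of situation is modeled, as a simplifying assumption, by moving every center one day forward. This is an illustrative simplification, not a full statement of either rule.

```python
from fractions import Fraction


def credence(cred, prop):
    """Total credence in a centered proposition (a predicate on centers)."""
    return sum((p for c, p in cred.items() if prop(c)), Fraction(0))


def conditionalize(cred, evidence):
    """Learning: discard centers incompatible with the evidence and renormalize."""
    kept = {c: p for c, p in cred.items() if evidence(c)}
    total = sum(kept.values(), Fraction(0))
    if total == 0:
        raise ValueError("cannot conditionalize on evidence with zero credence")
    return {c: p / total for c, p in kept.items()}


def shift(cred, days=1):
    """Changing situation (simplified assumption): move every center forward in time."""
    return {(world, day + days): p for (world, day), p in cred.items()}


def raining_now(center):
    """Hypothetical centered proposition: true only at day 2 of the rainy world."""
    world, day = center
    return world == "rain-on-day-2" and day == 2


day1 = {("rain-on-day-2", 1): Fraction(1, 2), ("dry-on-day-2", 1): Fraction(1, 2)}
day2 = shift(day1)                              # time passes, no new evidence
after_look = conditionalize(day2, raining_now)  # then you see that it is raining
print(credence(day1, raining_now), credence(day2, raining_now), credence(after_look, raining_now))
```

Credence in the centered proposition that it is raining now goes from 0 to 1/2 merely because time passes, and only then to 1 by conditionalizing on seeing rain. The first step is the kind of change that conditionalization alone cannot describe.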
In this paper, I try to spell out the two rules, and their combination: Believing in afterlife: conditionalization in a changing world (PDF).
I'm a bit unhappy with some parts of the story, and I should probably say more about alternative accounts in the literature, and why I don't like them. So hopefully there will be an update soon. In the meantime, comments are as always very welcome!
This is a follow-up to the previous post on Shangri La. As before, the story is that a fair coin decides which path you take to Shangri La: on heads, you travel by the Mountains, on tails, by the Sea. If you arrive at Shangri La via the Sea, the guardians will replace your Sea memories with Mountain memories.
In the other post, I said that if you actually traveled by the Mountains, you should remain confident that you traveled by the Mountains, even though you would have ended up with the same evidence had you traveled by the Sea.
(This is more or less the talk I gave at the "Epistemology at the Beach" workshop last Sunday.)
"A wise man proportions his belief to the evidence", says Hume. But to what evidence? Should you proportion your belief to the evidence you have right now, or does it matter what evidence you had before? Frank Arntzenius ("Some problems for conditionalization and reflection", JoP, 2003) tells a story that illustrates the difference:
...there is an ancient law about entry into Shangri La:
you are only allowed to enter, if, once you have entered, you no
longer know by what path you entered. Together with the guardians you
have devised a plan that satisfies this law. There are two paths to
Shangri La, the Path by the Mountains, and the Path by the Sea. A fair
coin will be tossed by the guardians to determine which path you
will take: if heads you go by the Mountains, if tails you go by the
Sea. If you go by the Mountains, nothing strange will happen: while
traveling you will see the glorious Mountains, and even after you
enter Shangri La you will for ever retain your memories of that
Magnificent Journey. If you go by the Sea, you will revel in the
Beauty of the Misty Ocean. But just as you enter Shangri La, your
memory of this Beauteous Journey will be erased and replaced by a
memory of the Journey by the Mountains.
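The arithmetic behind the story can be sketched in Python. As a simplifying assumption, "proportioning belief to current evidence" is modeled as conditionalizing the fair-coin prior only on the Mountain memories you have after entering; the likelihoods 1 and 0 below just encode the setup of the story.

```python
from fractions import Fraction

PRIOR_HEADS = Fraction(1, 2)  # fair coin: heads = Mountains, tails = Sea


def posterior_heads(lik_heads, lik_tails, prior=PRIOR_HEADS):
    """Bayes' theorem for the binary heads/tails hypothesis."""
    joint_heads = prior * lik_heads
    return joint_heads / (joint_heads + (1 - prior) * lik_tails)


# While travelling: seeing the Mountains happens only on heads.
during_journey = posterior_heads(lik_heads=Fraction(1), lik_tails=Fraction(0))
# After entering, judged by current evidence only: Mountain memories occur on both paths.
current_evidence_only = posterior_heads(lik_heads=Fraction(1), lik_tails=Fraction(1))
print(during_journey, current_evidence_only)  # 1 1/2
```

Seeing the Mountains on the way makes you certain of heads, but the memories you have after entering are equally likely on both paths, so if only current evidence counts, credence in heads drops back to 1/2. The calculation does not decide which of the two attitudes is rational; that is the question the story raises.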
A coin is to be tossed. Expert A tells you that it will land heads with probability 0.9; expert B says the probability is 0.1. What should you make of that?
Answer: if you trust expert A to degree a and expert B to degree b (with a + b = 1) and have no other relevant information, your new credence in heads should be a*0.9 + b*0.1. So if you give equal trust to both of them, your credence in heads should be 0.5. You should be neither confident that the coin will land heads, nor that it will land tails. -- Obviously, you shouldn't take the objective chance of heads to be 0.5, contradicting both experts. Your credence of 0.5 is compatible with being certain that the chance is either 0.1 or 0.9. Credences are not opinions about objective chances.
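The calculation can be written out in a few lines of Python. The function and variable names are made up for illustration; exact fractions are used so that the equal-trust result comes out as exactly 1/2.

```python
from fractions import Fraction


def pooled_credence(trust, forecasts):
    """Trust-weighted average of the experts' probabilities for heads."""
    if sum(trust) != 1:
        raise ValueError("trust weights must sum to 1")
    return sum(t * f for t, f in zip(trust, forecasts))


# Equal trust in expert A (0.9) and expert B (0.1).
trust = [Fraction(1, 2), Fraction(1, 2)]
forecasts = [Fraction(9, 10), Fraction(1, 10)]
credence_heads = pooled_credence(trust, forecasts)

# Credence over chance hypotheses: all of it goes to 0.9 or 0.1, none to 0.5.
chance_credence = dict(zip(forecasts, trust))
credence_chance_is_half = chance_credence.get(Fraction(1, 2), Fraction(0))

print(credence_heads, credence_chance_is_half)  # 1/2 0
```

The two printed numbers make the point of the last paragraph concrete: the credence in heads is 1/2, while the credence that the chance of heads is 1/2 is 0.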
What about this much simpler argument for halfing:
As usual, Sleeping Beauty wakes up on Monday, knowing that she will have an indistinguishable waking experience on Tuesday iff a certain fair coin has landed tails. Thirders say her credence in the coin landing heads should be 1/3; halfers say it should be 1/2.
Now suppose before falling asleep each day, Beauty manages to write down her present credence in heads on a small piece of paper. Since that credence was 1/2 on Sunday evening, she now (on Monday) finds a note saying "1/2".
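For readers new to the puzzle, the two numbers can be traced to two ways of counting, sketched here in Python: the proportion of experiments in which the coin lands heads, and the proportion of awakenings that are heads-awakenings. Computing these frequencies does not by itself settle which of them should be Beauty's credence upon waking.

```python
from fractions import Fraction

# Fair coin; Beauty is woken once on heads, twice on tails (Monday and Tuesday).
AWAKENINGS = {"heads": 1, "tails": 2}
PRIOR = {"heads": Fraction(1, 2), "tails": Fraction(1, 2)}

# Fraction of experiments in which the coin lands heads.
per_run = PRIOR["heads"]
# Expected number of awakenings per experiment.
expected_awakenings = sum(PRIOR[o] * AWAKENINGS[o] for o in PRIOR)
# Long-run fraction of awakenings that occur in heads-experiments.
per_awakening = PRIOR["heads"] * AWAKENINGS["heads"] / expected_awakenings

print(per_run, per_awakening)  # 1/2 1/3
```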
I've thought a bit about belief update recently. One thing I noticed is that it is often assumed in the literature (usually without argument) that if you know that there are two situations in your world that are evidentially indistinguishable from your current situation, then you should give them roughly the same credence. Although I agree with some of the applications, the principle in general strikes me as very implausible. Here is a somewhat roundabout counter-example that has a few other interesting features as well.
Following up on Weng-Hong (1, 2, 3), here are a few thoughts on thresholds for belief.
If beliefs come in different degrees or strengths, what do we mean when we say not that Fred believes that P with strength x, but simply that Fred believes that P? Perhaps we mean that Fred believes that P with sufficient strength, where context may help determine what counts as sufficient. However, on this account, the following principles should be obviously invalid (both descriptively and normatively):
To my surprise, there are quite a few people here at ANU who believe that probabilities of various kinds can be modeled in terms of relative size of propositions: something has probability 1 if it is true in all (or 100%) of the relevant worlds, probability 0 if it is true in none (or 0%), and probability 0.5 if it is true in half of the worlds (or 50%). I also find it surprisingly hard to explain why I think that's wrong. Here are two arguments I've come up with so far (apart from obvious worries about making sense of these fractions in infinite and proper-class cases).
I've been assigned some boring administrative work, but that's finished now, I hope. Here are some rough thoughts on indifference and Adam Elga's Dr. Evil paper (PDF).
There are many possible individuals whose mental state is subjectively indistinguishable from my current mental state insofar as they all share my current phenomenal experiences and my (real or quasi-) memories. Some of them inhabit worlds that are exactly as I believe the actual world is, and are located in that world exactly where I believe I am located in the actual world. Others occupy very different places in very different worlds: they are brains in vats or inhabitants of gruesome counterinductive worlds. How should I distribute my credence among all these possibilities?
Eliezer Yudkowsky, in his Intuitive Explanation of Bayesian Reasoning, argues that it is irrational to justify the belief that if a biological war will break out it won't wipe out humanity by pointing out that one is an optimist:
p(you are currently an optimist | biological war occurs within ten years and wipes out humanity) =
p(you are currently an optimist | biological war occurs within ten years and does not wipe out humanity)
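A quick numerical check of this in Python, with made-up numbers: if being an optimist is equally likely whether or not the war wipes out humanity, Bayes' theorem returns the prior unchanged, so one's optimism provides no evidence either way.

```python
from fractions import Fraction


def posterior(prior_h, lik_h, lik_not_h):
    """Bayes' theorem for a hypothesis h against its negation."""
    joint_h = prior_h * lik_h
    return joint_h / (joint_h + (1 - prior_h) * lik_not_h)


# Hypothetical numbers: prior that a war (if it occurs) wipes out humanity,
# and the probability of currently being an optimist, equal on both hypotheses.
prior_wipeout = Fraction(1, 10)
p_optimist = Fraction(3, 5)
print(posterior(prior_wipeout, p_optimist, p_optimist))  # 1/10
```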
Let P be a proposition of which you neither believe that it's true nor that it's false, say Goldbach's Conjecture. Since you know that you don't believe P (otherwise you couldn't have chosen it), your conditional subjective probability for [P and I don't believe P] given P should be close to 1. However, if you were to learn that P, your subjective probability for [P and I don't believe P] shouldn't be close to 1, but close to 0. So is this a case where you shouldn't conditionalize?