## Long-run arguments for maximizing expected utility

Why maximize expected utility? One supporting consideration that is occasionally mentioned (although rarely spelled out or properly discussed) is that maximizing expected utility tends to produce desirable results in the long run. More specifically, the claim is something like this:

(*) If you always maximize *expected* utility, then over time you're likely to maximize *actual* utility.

Since "utility" is (by definition) something you'd rather have more of than less, (*) does look like a decent consideration in favour of maximizing expected utility. But is (*) true?

Not in full generality. A well-known counterexample is the "gambler's ruin". Suppose your utility is measured in pounds sterling. Initially you have £1. Now a fair coin is tossed over and over. On each toss, you have the opportunity to bet your total wealth. If the coin lands heads, you get back three times what you bet. If the coin lands tails, you lose everything. Since each bet returns 1.5 times the stake in expectation, as an expected utility maximizer you would accept the bet each time. You are then practically certain to end up with £0 over time. So maximizing expected utility does not make it likely that in the long run you'll have a lot of actual utility.
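A quick exact calculation makes the tension concrete. This is a minimal sketch using Python's standard `fractions` module; the function name `always_bet` is just an illustrative label. If you bet everything on every toss, you still have money after n tosses only if all n tosses land heads, so expected wealth grows like (3/2)^n while the probability of ruin tends to 1.

```python
from fractions import Fraction


def always_bet(n_tosses: int) -> tuple[Fraction, Fraction]:
    """Return (expected wealth, probability of ruin) after betting everything n times.

    Starting wealth is £1; heads triples the stake, tails loses it.
    """
    p_survive = Fraction(1, 2) ** n_tosses           # must win every single toss
    wealth_if_survive = Fraction(3) ** n_tosses      # £1 tripled n times
    expected_wealth = p_survive * wealth_if_survive  # equals (3/2)^n
    return expected_wealth, 1 - p_survive


for n in (1, 10, 50):
    ev, ruin = always_bet(n)
    print(f"n={n:>2}: expected wealth £{float(ev):.3g}, P(ruin) = {float(ruin):.6f}")
```

Exact rationals are used so that the probabilities near 1 are not blurred by floating-point rounding.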

So the long-run argument must be a little more complicated. Perhaps
(*) holds in a lot of normal cases. Then we could argue that in
*those* cases, one should maximize expected utility. And perhaps
we could cover the "non-normal" cases by arguing that the same
principle should be used for all cases.

So under what conditions is (*) true?

The only answer I've come across in conversation and in the literature refers to repeated decision problems and the Laws of Large Numbers. (This is one of two arguments for the expected utility norm discussed by Ray Briggs in their Stanford Encyclopedia article on the norm.) The argument is simple.

Suppose you face the very same decision problem again and again, with the same options, same outcomes, same probabilities, and same utilities. Focus on a particular option, and assume it is chosen over and over. The Law of Large Numbers implies that the relative frequency of every possible outcome is likely to converge to the probability of that outcome. Consequently, the average actual utility of the option is likely to converge to its expected utility. Which is just what (*) says.
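A small simulation illustrates the repeated-problem argument. The option below, with its three outcomes, utilities and probabilities, is made up purely for illustration (its expected utility is 1.4). The sketch assumes independent repetitions and draws outcomes with the standard library's `random.Random.choices`.

```python
import random

# Hypothetical option: (utility, probability) for each possible outcome.
OPTION = [(10.0, 0.2), (0.0, 0.5), (-2.0, 0.3)]


def expected_utility(option):
    return sum(u * p for u, p in option)


def average_utility(option, n, seed=0):
    """Mean realised utility when the option is chosen in n independent repetitions."""
    rng = random.Random(seed)
    utilities = [u for u, _ in option]
    probs = [p for _, p in option]
    draws = rng.choices(utilities, weights=probs, k=n)
    return sum(draws) / n


print("expected utility:", expected_utility(OPTION))
for n in (10, 1_000, 100_000):
    print(n, round(average_utility(OPTION, n), 3))
```

For small n the average can be far from 1.4; for large n it typically settles close to it.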

As Ray points out, the argument is not very convincing, because the conditions for (*) are so unusual. In real life, we practically never face the very same decision problem again and again.

In addition, the Laws of Large Numbers only tell us what happens *in the limit*. So the argument does not actually favour expected utility maximization over, say, the alternative strategy of *minimizing* expected utility in the first 10^100 decisions and thereafter maximizing expected utility. In the infinite limit, this strategy converges to the same average utility as maximizing expected utility.
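A one-line calculation shows why the limit cannot tell the two strategies apart. Write $u^+$ and $u^-$ (notation introduced here) for the expected utilities of the best and the worst option in the repeated problem, and $k = 10^{100}$. Over $n > k$ rounds, the expected average utility of the "minimize first, then maximize" strategy is

$$\frac{k\,u^- + (n-k)\,u^+}{n} \;=\; u^+ - \frac{k}{n}\,\bigl(u^+ - u^-\bigr) \;\longrightarrow\; u^+ \qquad (n \to \infty).$$

The shortfall vanishes in the limit, yet at, say, $n = 2 \cdot 10^{100}$ it is still half the gap $u^+ - u^-$.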

But these problems can be fixed. Let's start with the easier one, the second.

Take any option X in the repeated decision problem, and let O be one
of the outcomes it might produce. Let p be the probability of O (given
X) in a single trial. The number of times that O comes about in n
trials then has a Binomial distribution with mean np and variance
np(1-p). As n gets larger, the relative frequency of O among all
trials is therefore likely to be close to the probability p –
and not just in the infinite limit. For example, with p=0.5 and n=100,
the probability that the relative frequency lies between 0.4 and 0.6
(inclusive) is about 0.965. So, for any possible outcome of X, the relative frequency of
that outcome is likely to quickly approach its probability. And so the
average utility of X is likely to *quickly* approach its expected
utility.
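The binomial figure can be checked by summing the probability mass function exactly. In this sketch the helper names `binom_pmf` and `prob_count_in` are made up; the pmf is computed in log space with `math.lgamma` so that large n does not overflow a float. For p=0.5 and n=100 it gives roughly 0.965, and the probability climbs towards 1 as n grows.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(K = k) for K ~ Binomial(n, p), 0 < p < 1, computed in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log1p(-p))
    return exp(log_pmf)


def prob_count_in(n: int, p: float, lo: int, hi: int) -> float:
    """P(lo <= K <= hi): probability that the outcome occurs between lo and hi times in n trials."""
    return sum(binom_pmf(k, n, p) for k in range(lo, hi + 1))


# Probability that the relative frequency lies in [0.4, 0.6] for p = 0.5, at increasing n.
for n in (10, 100, 1_000, 10_000):
    print(n, prob_count_in(n, 0.5, 4 * n // 10, 6 * n // 10))
```

Using integer bounds on the count, rather than comparing floating-point relative frequencies, avoids rounding trouble at the endpoints 0.4 and 0.6.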

Now let's see if we can drop the assumption that the same decision problem is faced again and again. With the help of some probability theory, this turns out to be relatively easy, once the question is expressed in the right way.

Suppose an agent faces n decision problems in a row; the problems need
not be identical. Let a *strategy* be a function that selects one
option in each problem. Hold fixed some such strategy S. Let U_i be a
random variable that maps each state in the i-th decision problem to
the utility of following strategy S in that problem. Let T = U_1 + ... + U_n.
So T is a random variable for the total (actual) utility gained
by following strategy S across all n problems. We want to compare T
with the sum of the *expected* utilities of following S in all n
problems. Notice that the expected utility of following S in problem i
is simply the mean of U_i. So what we need to show is that

(**) As n gets large, the sum T of n random variables U_i is likely to (quickly) approach the sum of the means of these variables.
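The setup can be made concrete in code. This is a minimal sketch under loudly stated assumptions: the `DecisionProblem` encoding, the randomly generated three-state problems with utilities in [-1, 1], and the choice of expected-utility maximization as the strategy S are all hypothetical, and each problem's state is drawn independently. It compares one realised total T with the sum of the expected utilities of the chosen options. Note that the absolute gap typically grows like √n while the totals themselves grow like n, so it is the gap *per problem* that shrinks.

```python
import random
from dataclasses import dataclass


@dataclass
class DecisionProblem:
    """Hypothetical encoding: state probabilities, and each option's utility per state."""
    state_probs: list[float]
    utilities: dict[str, list[float]]


def expected_utility(problem: DecisionProblem, option: str) -> float:
    return sum(p * u for p, u in zip(problem.state_probs, problem.utilities[option]))


def eu_max_strategy(problem: DecisionProblem) -> str:
    """One possible strategy S: pick an option with maximal expected utility."""
    return max(problem.utilities, key=lambda o: expected_utility(problem, o))


def random_problem(rng: random.Random) -> DecisionProblem:
    """A made-up problem: 3 states, options 'a' and 'b', utilities in [-1, 1]."""
    weights = [rng.random() + 0.01 for _ in range(3)]
    total_weight = sum(weights)
    probs = [w / total_weight for w in weights]
    utils = {o: [rng.uniform(-1, 1) for _ in range(3)] for o in ("a", "b")}
    return DecisionProblem(probs, utils)


def total_vs_expected(problems, strategy, rng):
    """Draw each problem's state independently; return (T, sum of E[U_i]) for strategy S."""
    total, expected = 0.0, 0.0
    for problem in problems:
        option = strategy(problem)
        state = rng.choices(range(len(problem.state_probs)), weights=problem.state_probs)[0]
        total += problem.utilities[option][state]      # realised U_i
        expected += expected_utility(problem, option)  # mean of U_i
    return total, expected


rng = random.Random(0)
for n in (10, 1_000, 20_000):
    problems = [random_problem(rng) for _ in range(n)]
    T, ET = total_vs_expected(problems, eu_max_strategy, rng)
    print(f"n={n:>6}: T={T:9.2f}  sum of EU={ET:9.2f}  gap per problem={(T - ET) / n:+.4f}")
```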

We can't prove (**), because it is not generally true. But it *is* true in a wide range of cases. By linearity of expectation, the mean of T always equals the sum of the means of the U_i, assuming these means exist. Now suppose the states in the different decision problems are probabilistically independent. Then elementary probability theory also implies that the variance of T equals the sum of the variances of the U_i, assuming these variances exist. If the U_i distributions satisfy certain further assumptions (such as Lindeberg's condition), then a generalised form of the Central Limit Theorem reveals that, for large n, T is approximately Gaussian with that mean and variance: more precisely, T minus its mean, divided by its standard deviation, converges in distribution to a standard normal. And the Berry-Esseen Theorem reveals that under certain assumptions (such as finite third moments), the approximation happens quickly.
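For reference, here are the standard textbook forms of these results. The notation $\mu_i = \mathbb{E}[U_i]$, $\sigma_i^2 = \operatorname{Var}(U_i)$, $s_n^2 = \sum_i \sigma_i^2$ and $\rho_i = \mathbb{E}|U_i - \mu_i|^3$ is introduced here, and the U_i are assumed independent. Lindeberg's condition requires that for every $\varepsilon > 0$

$$\frac{1}{s_n^2}\sum_{i=1}^{n} \mathbb{E}\!\left[(U_i-\mu_i)^2\,\mathbf{1}\{|U_i-\mu_i| > \varepsilon s_n\}\right] \;\longrightarrow\; 0,$$

and under it the Lindeberg–Feller Central Limit Theorem gives

$$\frac{T - \sum_{i=1}^{n}\mu_i}{s_n} \;\xrightarrow{\;d\;}\; \mathcal{N}(0,1).$$

The Berry–Esseen bound for non-identically distributed summands states that, for some absolute constant $C$,

$$\sup_x \left| P\!\left(\frac{T - \sum_{i}\mu_i}{s_n} \le x\right) - \Phi(x) \right| \;\le\; C\,\frac{\sum_{i=1}^{n}\rho_i}{s_n^3}.$$

If the utilities are bounded and the variances $\sigma_i^2$ are bounded away from zero, the right-hand side is of order $1/\sqrt{n}$.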

So under the assumptions just mentioned, over time the total utility gained by following any given strategy S is indeed likely to (quickly) approach the sum of the expected utilities of the options selected by S in the individual problems. In other words, you're likely to maximize total actual utility by maximizing expected utility in each decision problem.
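To see the comparison between strategies play out, the sketch below pits expected utility maximization against one alternative rule, maximin (choose the option with the best worst case). Maximin is a swapped-in comparison rule, not something from the discussion above, and the randomly generated two-state, two-option problems are hypothetical. Both strategies face the same realised states in each run, so any difference in totals comes from the choices alone.

```python
import random


def make_problems(n, rng):
    """Hypothetical two-state, two-option problems as (q, utils), q = P(state 0)."""
    problems = []
    for _ in range(n):
        q = rng.uniform(0.05, 0.95)
        utils = [[rng.uniform(-1, 1) for _state in range(2)] for _option in range(2)]
        problems.append((q, utils))
    return problems


def eu_choice(q, utils):
    """Option with maximal expected utility."""
    return max(range(2), key=lambda o: q * utils[o][0] + (1 - q) * utils[o][1])


def maximin_choice(q, utils):
    """Comparison rule (not from the text): option whose worst-case utility is best."""
    return max(range(2), key=lambda o: min(utils[o]))


def run_once(problems, choices, rng):
    """Resolve every problem's state once; return total utility under each strategy."""
    t_eu = t_mm = 0.0
    for (q, utils), (c_eu, c_mm) in zip(problems, choices):
        state = 0 if rng.random() < q else 1  # both strategies face the same state
        t_eu += utils[c_eu][state]
        t_mm += utils[c_mm][state]
    return t_eu, t_mm


rng = random.Random(1)
problems = make_problems(5_000, rng)
choices = [(eu_choice(q, u), maximin_choice(q, u)) for q, u in problems]
results = [run_once(problems, choices, rng) for _ in range(100)]
share = sum(t_eu > t_mm for t_eu, t_mm in results) / len(results)
print(f"EU-maximizing strategy had the higher total in {share:.0%} of 100 runs")
```

Where the two rules pick the same option, their contributions are identical; the expected advantage of expected utility maximization accumulates linearly over the problems where they disagree, while the noise grows only like √n.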

We've still made some fairly strong assumptions. In particular:

(1) I've assumed that which n decision problems are faced does not depend on the choices made in earlier decision problems. This is not the case in the "gambler's ruin" scenario. It seems plausible to me that the assumption could be weakened, but I'm not sure how.

(2) I've assumed that the agent has a probability over the joint space comprising the states of all individual decision problems, and that the states in different problems are probabilistically independent. In real life, one might have thought that our probabilities usually change between decision problems, and that the states aren't always independent. Again, I think these assumptions can plausibly be justified and/or relaxed. For example, the relevant joint probability doesn't have to be the agent's initial probability before the first problem; we could take the probabilities over the states in the second problem to be given by the agent's probabilities over these states after the first problem has been resolved. This needs to be spelled out more carefully though. There are also variants of the Central Limit Theorem that don't assume full independence.

We also needed to assume that the individual decision problems satisfy certain further constraints such as Lindeberg's condition. Concretely, this means that we have to rule out (for example) that the utilities in one decision problem are vastly greater than the utilities in all others; otherwise the actual total utility will be almost entirely determined by the choice in this single problem. That doesn't seem too unrealistic.
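One consequence of Lindeberg's condition is that the largest single-problem share of the total variance, $\max_i \sigma_i^2 / s_n^2$, must tend to zero. The sketch below computes that share for a made-up collection of independent problems with unit variance, and then for the same collection plus one problem whose utilities are a million times larger (so its variance is 10^12 times larger). The helper name `max_variance_share` is hypothetical.

```python
def max_variance_share(variances):
    """Largest share of Var(T) contributed by a single problem (independent problems)."""
    return max(variances) / sum(variances)


ordinary = [1.0] * 1_000   # hypothetical: 1,000 problems with unit utility variance
print(max_variance_share(ordinary))   # 0.001

# Add one problem whose utilities are scaled up by 10**6, so its variance is 10**12.
dominated = ordinary + [1e12]
print(max_variance_share(dominated))
```

In the second case virtually all the variance of T comes from the single large problem, however many ordinary problems are added, so the total is effectively settled by that one choice.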

Anyway, I feel I'm reinventing the wheel. Surely all this has been noticed before. Strangely, I can't find any discussion of it anywhere. The only version of the long-run argument that I've seen in the literature is the silly one involving infinite repetitions of the very same decision problem.

Oops, I initially got both Ray Briggs's name and gender wrong. Sorry. Corrected.