## Long-run arguments for maximizing expected utility

Why maximize expected utility? One supporting consideration that is occasionally mentioned (although rarely spelled out or properly discussed) is that maximizing expected utility tends to produce desirable results in the long run. More specifically, the claim is something like this:

(*) If you always maximize expected utility, then over time you're likely to maximize actual utility.

Since "utility" is (by definition) something you'd rather have more of than less, (*) does look like a decent consideration in favour of maximizing expected utility. But is (*) true?

Not in full generality. A well-known counterexample is "gambler's ruin". Suppose your utility is measured in pounds sterling. Initially you have £1. Now a fair coin is tossed over and over. On each toss, you have the opportunity to bet your total wealth. If the coin lands heads, you get back three times what you bet. If the coin lands tails, you lose everything. Since a bet of x pounds has an expected return of 1.5x, as an expected utility maximizer you would accept the bet each time. But you still have any money after n tosses only if all n tosses land heads, which has probability (1/2)^n. You are therefore practically certain to end up with £0 over time. So maximizing expected utility does not make it likely that in the long run you'll have a lot of actual utility.
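A quick numerical check of the gambler's ruin case may help. The sketch below is plain Python, and its function names are my own hypothetical ones. It computes the expected wealth and the probability of not yet being ruined after n all-in bets, using exact fractions, and adds a small Monte Carlo check of the survival probability.

```python
from fractions import Fraction
import random

# Each toss: stake your whole wealth; heads (prob 1/2) returns 3x the stake, tails returns 0.
P_HEADS = Fraction(1, 2)
PAYOUT = 3

def expected_wealth(n: int, start: Fraction = Fraction(1)) -> Fraction:
    """Expected wealth after n all-in bets: each bet multiplies the expectation by 3 * 1/2 = 3/2."""
    return start * (PAYOUT * P_HEADS) ** n

def prob_not_ruined(n: int) -> Fraction:
    """You still have money after n bets only if all n tosses land heads."""
    return P_HEADS ** n

def simulate(n: int, runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the fraction of runs that still have money after n bets."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(runs):
        wealth = 1
        for _ in range(n):
            wealth = wealth * PAYOUT if rng.random() < 0.5 else 0
            if wealth == 0:
                break
        if wealth > 0:
            survivors += 1
    return survivors / runs

for n in (1, 5, 10, 20):
    print(n, float(expected_wealth(n)), float(prob_not_ruined(n)))
```

For 20 bets the expected wealth is about £3,325, while the chance of having any money at all is below one in a million.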

So the long-run argument must be a little more complicated. Perhaps (*) holds in a lot of normal cases. Then we could argue that in those cases, one should maximize expected utility. And perhaps we could cover the "non-normal" cases by arguing that the same principle should be used for all cases.

So under what conditions is (*) true?

The only answer I've come across in conversation and in the literature refers to repeated decision problems and the Laws of Large Numbers. (This is one of two arguments for the expected utility norm discussed by Ray Briggs in their Stanford Encyclopedia article on the norm.) The argument is simple.

Suppose you face the very same decision problem again and again, with the same options, same outcomes, same probabilities, and same utilities. Focus on a particular option, and assume it is chosen over and over. The Law of Large Numbers implies that, if the trials are independent, the relative frequency of every possible outcome is likely to converge to the probability of that outcome. Consequently, the average actual utility of the option is likely to converge to its expected utility. So the option with the greatest expected utility is likely, in the long run, to yield the greatest average actual utility. Which is just what (*) says.
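A small simulation can make the repeated-problem argument concrete. The option below, with its three outcomes, probabilities and utilities, is a made-up example, and all the names are hypothetical. The sketch draws independent outcomes and compares the average actual utility with the expected utility.

```python
import random

# A hypothetical option in a repeated decision problem: outcome -> (probability, utility).
OPTION = {"win": (0.2, 10.0), "draw": (0.5, 0.0), "loss": (0.3, -5.0)}

def expected_utility(option):
    """Probability-weighted sum of the outcome utilities."""
    return sum(p * u for p, u in option.values())

def average_actual_utility(option, n, seed=0):
    """Choose the option n times, with independent outcomes, and return the average utility received."""
    rng = random.Random(seed)
    outcomes = list(option)
    weights = [option[o][0] for o in outcomes]
    draws = rng.choices(outcomes, weights=weights, k=n)
    return sum(option[o][1] for o in draws) / n

eu = expected_utility(OPTION)  # 0.2*10 + 0.5*0 + 0.3*(-5) = 0.5
for n in (10, 1_000, 100_000):
    print(n, average_actual_utility(OPTION, n), eu)
```

With 10 repetitions the average can be far off. With 100,000 it is typically within a few hundredths of the expected utility of 0.5.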

As Ray points out, the argument is not very convincing, because the conditions for (*) are so unusual. In real life, we practically never face the very same decision problem again and again.

In addition, the Laws of Large Numbers only tell us what happens in the limit. So the argument does not actually favour expected utility maximization over, say, the alternative strategy of minimizing expected utility in the first 10^100 decisions and thereafter maximizing expected utility. In the infinite limit, this strategy converges to the same average utility as maximizing expected utility.
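The worry about the limit can be put in numbers. In this sketch, the per-decision expected utilities of the best and worst options (1 and -1) are hypothetical. The code computes, with exact fractions, the average expected utility per decision of the "minimize first, maximize later" strategy.

```python
from fractions import Fraction

# Hypothetical per-decision expected utilities in a repeated problem.
EU_MAX = Fraction(1)    # the best option
EU_MIN = Fraction(-1)   # the worst option
DELAY = 10**100         # initial decisions in which the deviant strategy minimizes

def avg_eu_delayed(n: int, delay: int = DELAY) -> Fraction:
    """Average expected utility per decision over the first n decisions for the
    strategy that minimizes for `delay` decisions and maximizes afterwards."""
    bad = min(n, delay)
    return (bad * EU_MIN + (n - bad) * EU_MAX) / n

for n in (10**6, 10**100, 10**102, 10**110):
    print(n, float(avg_eu_delayed(n)))
```

Up to n = 10^100, the delayed strategy averages -1 per decision. Only far beyond that point does its average creep towards the maximizer's 1: at n = 10^102 it is 0.98.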

But these problems can be fixed. Let's start with the easier one, the second.

Take any option X in the repeated decision problem, and let O be one of the outcomes it might produce. Let p be the probability of O (given X) in a single trial. The number of times that O occurs in n trials then has a Binomial distribution with mean np and variance np(1-p). So the relative frequency of O has mean p and standard deviation √(p(1-p)/n), which shrinks as n grows. As n gets larger, the relative frequency of O among all trials is therefore likely to be close to the probability p, and not just in the infinite limit. For example, with p=0.5 and n=100, the probability that the relative frequency lies between 0.4 and 0.6 is about 0.96. So, for any possible outcome of X, the relative frequency of that outcome is likely to quickly approach its probability. The average utility of X is just the sum of the utilities of its outcomes, each weighted by its relative frequency. So the average utility of X is likely to quickly approach its expected utility.
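The binomial figure can be checked directly. This sketch sums the Binomial(n, p) probabilities of all counts k with 0.4 ≤ k/n ≤ 0.6. The function name is my own, and the bounds are compared as exact fractions to avoid rounding at the endpoints.

```python
from fractions import Fraction
from math import comb

def prob_freq_in_range(n: int, p: float, lo: Fraction, hi: Fraction) -> float:
    """P(lo <= X/n <= hi) for X ~ Binomial(n, p), summed term by term over the pmf."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if lo <= Fraction(k, n) <= hi
    )

for n in (10, 100, 1000):
    print(n, prob_freq_in_range(n, 0.5, Fraction(2, 5), Fraction(3, 5)))
```

For n = 10 the probability is only about 0.66, for n = 100 it is about 0.965, and for n = 1000 it is extremely close to 1.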

Now let's see if we can drop the assumption that the same decision problem is faced again and again. With the help of some probability theory, this turns out to be relatively easy, once the question is expressed in the right way.

Suppose an agent faces n decision problems in a row; the problems need not be identical. Let a strategy be a function that selects one option in each problem. Hold fixed some such strategy S. Let U_i be a random variable that maps each state in the i-th decision problem to the utility of following strategy S in that problem. Let $T = \sum_i U_i$. So T is a random variable for the total (actual) utility gained by following strategy S across all n problems. We want to compare T with the sum of the expected utilities of following S in all n problems. Notice that the expected utility of following S in problem i is simply the mean of U_i. So what we need to show is that

(**) As n gets large, the sum T of n random variables U_i is likely to (quickly) approach the sum of the means of these variables.
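One way to make (**) precise is as a claim about the gap between T and the sum of the means, measured relative to n. This is my reading, not necessarily the intended one. Suppose the states in different problems are independent and every U_i has variance at most some σ². Then Chebyshev's inequality already delivers this reading, with an explicit rate:

$$
\begin{aligned}
\mathbb{E}[T] &= \sum_{i=1}^{n} \mathbb{E}[U_i], \qquad
\operatorname{Var}(T) = \sum_{i=1}^{n} \operatorname{Var}(U_i) \le n\sigma^2, \\
P\Big(\Big|T - \sum_{i=1}^{n} \mathbb{E}[U_i]\Big| \ge \varepsilon n\Big)
&\le \frac{\operatorname{Var}(T)}{\varepsilon^2 n^2} \le \frac{\sigma^2}{\varepsilon^2 n}
\quad \text{for every } \varepsilon > 0.
\end{aligned}
$$

For instance, with σ = 5 and ε = 1 the bound is 25/n, which is at most 1% once n ≥ 2500.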

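We can't prove (**), because it is not generally true. But it is true in a wide range of cases. In particular, suppose the states in the different decision problems are probabilistically independent. The mean of T always equals the sum of the means of the U_i, by linearity of expectation. With independence, the variance of T also equals the sum of the variances of the U_i, assuming these means and variances exist. If the individual variances are bounded, the standard deviation of T therefore grows at most like √n. The gap between T and the sum of the means is then typically small compared to n; in other words, the average utility per problem is likely to approach the average of the expected utilities. If the U_i distributions satisfy certain further assumptions (such as Lindeberg's condition), then a generalised form of the Central Limit Theorem (the Lindeberg-Feller theorem) reveals that T, suitably standardised, will in fact approach a Gaussian distribution, so that T is approximately Gaussian with that mean and variance. And the Berry-Esseen Theorem reveals that under certain assumptions (such as finite third absolute moments), the approximation happens quickly: when the individual problems are of comparable size, the error bound shrinks like 1/√n.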
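Here is a simulation of (**) with non-identical problems. The setup is an assumption for illustration. In each of 100 problems, the option picked by S yields utility `hi` if that problem's state falls below some threshold `p` and `lo` otherwise, with `p`, `hi` and `lo` drawn at random and the states independent across problems. The sketch compares the sum of the means and the sum of the variances with the simulated distribution of T. All names are hypothetical.

```python
import random
import statistics

def make_problems(n, seed=1):
    """Hypothetical sequence of n different problems. For each, record only the option a
    fixed strategy S picks: utility hi if the problem's state falls below p, else lo."""
    rng = random.Random(seed)
    return [(rng.uniform(0.1, 0.9), rng.uniform(0, 10), rng.uniform(-10, 0)) for _ in range(n)]

def mean_var(problem):
    """Mean and variance of the two-valued utility U_i for one problem."""
    p, hi, lo = problem
    return p * hi + (1 - p) * lo, p * (1 - p) * (hi - lo) ** 2

def total_utility(problems, rng):
    """One draw of T = sum of the U_i, with independent states across problems."""
    return sum(hi if rng.random() < p else lo for p, hi, lo in problems)

problems = make_problems(100)
mu_T = sum(mean_var(pr)[0] for pr in problems)   # sum of the means of the U_i
var_T = sum(mean_var(pr)[1] for pr in problems)  # sum of the variances (uses independence)
sd_T = var_T ** 0.5

rng = random.Random(2)
samples = [total_utility(problems, rng) for _ in range(10_000)]
within_2sd = sum(abs(t - mu_T) <= 2 * sd_T for t in samples) / len(samples)

print(f"sum of means {mu_T:.2f}, sample mean of T {statistics.fmean(samples):.2f}")
print(f"sum of variances {var_T:.1f}, sample variance of T {statistics.variance(samples):.1f}")
print(f"fraction within 2 sd: {within_2sd:.3f} (a Gaussian gives about 0.954)")
```

Each U_i here takes only two values. Even so, about 95% of the simulated totals fall within two standard deviations of the sum of the means, as the Gaussian approximation predicts.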

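So under the assumptions just mentioned, over time the total utility gained by following any given strategy S is indeed likely to (quickly) approach the sum of the expected utilities of the options selected by S in the individual problems. Since this holds for every strategy, the strategy that maximizes expected utility in each problem has the greatest sum of expected utilities. It is therefore likely to yield more total utility than any alternative whose sum of expected utilities is substantially lower. In other words, you're likely to maximize total actual utility by maximizing expected utility in each decision problem.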
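The comparative step can also be simulated. This sketch again uses a made-up setup: 200 problems with two options each, and a uniform state per problem that is shared by both options and independent across problems. It compares an expected-utility-maximizing strategy with an arbitrary rival that always takes the first listed option. The function names are hypothetical.

```python
import random

def make_problems(n, seed=3):
    """Hypothetical sequence of n different problems with two options each. An option
    (p, hi, lo) yields hi if the problem's state s is below p, else lo; s is uniform on [0, 1)."""
    rng = random.Random(seed)
    return [
        [(rng.uniform(0.1, 0.9), rng.uniform(0, 10), rng.uniform(-10, 0)) for _ in range(2)]
        for _ in range(n)
    ]

def eu(option):
    p, hi, lo = option
    return p * hi + (1 - p) * lo

def utility(option, state):
    p, hi, lo = option
    return hi if state < p else lo

def max_eu_strategy(problem):
    return max(problem, key=eu)

def first_option_strategy(problem):
    # An arbitrary rival strategy: always take the first listed option.
    return problem[0]

def share_of_wins(problems, strategy_a, strategy_b, runs=1_000, seed=4):
    """Fraction of runs in which following strategy_a throughout yields strictly
    more total utility than following strategy_b, on the same realized states."""
    picks_a = [strategy_a(pr) for pr in problems]
    picks_b = [strategy_b(pr) for pr in problems]
    rng = random.Random(seed)
    wins = 0
    for _ in range(runs):
        states = [rng.random() for _ in problems]
        total_a = sum(utility(o, s) for o, s in zip(picks_a, states))
        total_b = sum(utility(o, s) for o, s in zip(picks_b, states))
        if total_a > total_b:
            wins += 1
    return wins / runs

problems = make_problems(200)
print(share_of_wins(problems, max_eu_strategy, first_option_strategy))
```

In this setup the maximizing strategy comes out ahead in nearly every run. A rival that differed from it in only a handful of problems, with small differences in expected utility, would win far more often.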

We've still made some fairly strong assumptions. In particular:

(1) I've assumed that which n decision problems are faced does not depend on the choices made in earlier decision problems. This is not the case in the "gambler's ruin" scenario, where losing a bet leaves you with nothing to stake in the later problems. It seems plausible to me that the assumption could be weakened, but I'm not sure how.

(2) I've assumed that the agent has a probability over the joint space comprising the states of all individual decision problems, and that the states in different problems are probabilistically independent. In real life, one might have thought that our probabilities usually change between decision problems, and that the states aren't always independent. Again, I think these assumptions can plausibly be justified and/or relaxed. For example, the relevant joint probability doesn't have to be the agent's initial probability before the first problem; we could take the probabilities over the states in the second problem to be given by the agent's probabilities over these states after the first problem has been resolved. This needs to be spelled out more carefully though. There are also variants of the Central Limit Theorem that don't assume full independence.

We also needed to assume that the individual decision problems satisfy certain further constraints such as Lindeberg's condition. Roughly, this condition says that no single problem accounts for a non-negligible share of the total variance. Concretely, this means that we have to rule out (for example) that the utilities in one decision problem are vastly greater than the utilities in all others; otherwise the actual total utility will be almost entirely determined by what happens in this single problem. Ruling this out doesn't seem too unrealistic.
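To see what goes wrong when one problem dominates, here is a deliberately lopsided, hypothetical example. There are 1,000 problems in which S's option yields +1 or -1 with equal probability, and one problem in which it yields +10^6 or -10^6. The sketch measures the largest single problem's share of Var(T). It also checks by simulation that the sign of the total is always settled by the big problem.

```python
import random

# Hypothetical: 1,000 "ordinary" problems in which S's option yields +1 or -1 with
# probability 1/2 each, plus one problem in which it yields +10**6 or -10**6.
ORDINARY = [(0.5, 1.0, -1.0)] * 1000
HUGE = (0.5, 1e6, -1e6)
PROBLEMS = ORDINARY + [HUGE]

def variance(problem):
    p, hi, lo = problem
    return p * (1 - p) * (hi - lo) ** 2

def largest_variance_share(problems):
    """Share of Var(T) contributed by the single most variable problem (states independent).
    Lindeberg's condition implies that this share tends to 0 as n grows."""
    variances = [variance(pr) for pr in problems]
    return max(variances) / sum(variances)

def sign_always_set_by_huge(problems, runs=1_000, seed=0):
    """Simulate T and check that its sign always matches the sign of the last problem's utility."""
    rng = random.Random(seed)
    for _ in range(runs):
        draws = [hi if rng.random() < p else lo for p, hi, lo in problems]
        if (sum(draws) > 0) != (draws[-1] > 0):
            return False
    return True

print(largest_variance_share(PROBLEMS))
print(sign_always_set_by_huge(PROBLEMS))
```

Here the big problem contributes all but about one billionth of Var(T). Since the ordinary problems can add at most 1,000 in either direction, they never change the sign of the total.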

Anyway, I feel I'm reinventing the wheel. Surely all this has been noticed before. Strangely, I can't find any discussion of it anywhere. The only version of the long-run argument that I've seen in the literature is the silly one involving infinite repetitions of the very same decision problem.

Oops, I initially got both Ray Briggs's name and gender wrong. Sorry. Corrected.