Long-run arguments for maximizing expected utility

Why maximize expected utility? One supporting consideration that is occasionally mentioned (although rarely spelled out or properly discussed) is that maximizing expected utility tends to produce desirable results in the long run. More specifically, the claim is something like this:

(*) If you always maximize expected utility, then over time you're likely to maximize actual utility.

Since "utility" is (by definition) something you'd rather have more of than less, (*) does look like a decent consideration in favour of maximizing expected utility. But is (*) true?

Not in full generality. A well-known counterexample goes by the name of "gambler's ruin". Suppose your utility is measured in pounds sterling. Initially you have £1. Now a fair coin is tossed over and over. On each toss, you have the opportunity to bet your total wealth. If the coin lands heads, you get back three times what you bet. If the coin lands tails, you lose everything. As an expected utility maximizer, you would accept the bet each time, since its expected return (one and a half times the stake) exceeds the stake. But then you are practically certain to end up with £0: the probability of surviving n rounds is only (1/2)^n. Maximizing expected utility does not make it likely that in the long run you'll have a lot of actual utility.
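To make the point vivid, here is a quick simulation sketch in Python (just an illustration of the example above, nothing more): expected wealth after ten rounds is £1·1.5^10 ≈ £57.7, yet almost every run ends at £0.

    import random

    def all_in(rounds=10, wealth=1.0):
        # Bet everything each round: heads (prob 0.5) triples the stake, tails loses it.
        for _ in range(rounds):
            if wealth == 0:
                break
            wealth = 3 * wealth if random.random() < 0.5 else 0.0
        return wealth

    runs = 100_000
    results = [all_in() for _ in range(runs)]
    broke = sum(1 for w in results if w == 0) / runs
    average = sum(results) / runs

    print(f"fraction broke after 10 rounds: {broke:.4f}")    # theory: 1 - 0.5**10 ≈ 0.999
    print(f"average final wealth:           {average:.1f}")  # theory: 1.5**10 ≈ 57.7, driven by the rare survivors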

The long-run argument must be a little more complicated. Perhaps (*) holds in a lot of normal cases. Then we could argue that in those cases, one should maximize expected utility. And perhaps we could cover the "non-normal" cases by arguing that the same principle should be used for all cases.

So under what conditions is (*) true?

The only answer I've come across in conversation and in the literature refers to repeated decision problems and the Laws of Large Numbers. (This is one of two arguments for the expected utility norm discussed by Ray Briggs in their Stanford Encyclopedia article on the norm.) The argument is simple.

Suppose you face the very same decision problem again and again, with the same options, same outcomes, same probabilities, and same utilities. Focus on a particular option, and assume it is chosen over and over. The Law of Large Numbers implies that the relative frequency of every possible outcome is likely to converge to the probability of that outcome. Consequently, the average actual utility of the option is likely to converge to its expected utility. Which is just what (*) says.
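As an illustration (with made-up numbers), here is a short Python sketch of an option that yields utility 10 with probability 0.3 and utility -2 otherwise, so its expected utility is 1.6; the running average of the actual utility drifts towards that value:

    import random

    # A toy option: utility 10 with probability 0.3, utility -2 otherwise.
    p, u_high, u_low = 0.3, 10.0, -2.0
    expected = p * u_high + (1 - p) * u_low   # = 1.6

    random.seed(1)
    total, trials = 0.0, 0
    for checkpoint in (10, 100, 1_000, 10_000, 100_000):
        while trials < checkpoint:
            total += u_high if random.random() < p else u_low
            trials += 1
        print(f"after {trials:>6} trials: average actual utility = {total/trials:6.3f}"
              f"  (expected utility = {expected:.1f})")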

As Ray points out, the argument is not very convincing, because the conditions for (*) are so unusual. In real life, we practically never face the very same decision problem again and again.

In addition, the Laws of Large Numbers only tell us what happens in the limit. So the argument does not actually favour expected utility maximization over, say, the alternative strategy of minimizing expected utility in the first 10^100 decisions and thereafter maximizing expected utility. In the infinite limit, this strategy converges to the same average utility as maximizing expected utility.

But these problems can be fixed. Let's start with the easier one, the second.

Take any option X in the repeated decision problem, and let O be one of the outcomes it might produce. Let p be the probability of O (given X) in a single trial. The number of times that O comes about in n trials then has a binomial distribution with mean np and variance np(1-p). As n gets larger, the relative frequency of O among all trials is therefore likely to be close to the probability p – and not just in the infinite limit. For example, with p=0.5 and n=100, the probability that the relative frequency lies between 0.4 and 0.6 is about 0.96. So, for any possible outcome of X, the relative frequency of that outcome is likely to quickly approach its probability. And so the average utility of X is likely to quickly approach its expected utility.
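That 0.96 figure can be checked directly from the binomial distribution; here is a one-liner sketch, assuming scipy is available:

    from scipy.stats import binom

    # P(40 <= heads <= 60) when tossing a fair coin 100 times,
    # i.e. the relative frequency falls between 0.4 and 0.6.
    print(binom.cdf(60, 100, 0.5) - binom.cdf(39, 100, 0.5))   # ≈ 0.9648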

Now let's see if we can drop the assumption that the same decision problem is faced again and again. With the help of some probability theory, this turns out to be relatively easy, once the question is expressed in the right way.

Suppose an agent faces n decision problems in a row; the problems need not be identical. Let a strategy be a function that selects one option in each problem. Hold fixed some such strategy S. Let Ui be a random variable that maps each state in the i-th decision problem to the utility of following strategy S in that problem. Let T = ∑i Ui. So T is a random variable for the total (actual) utility gained by following strategy S across all n problems. We want to compare T with the sum of the expected utilities of following S in all n problems. Notice that the expected utility of following S in problem i is simply the mean of Ui. So what we need to show is that

(**) As n gets large, the sum T of the n random variables Ui is likely to (quickly) approach the sum of the means of these variables.

We can't prove (**), because it is not generally true. But it is true in a wide range of cases. In particular, suppose the states in the different decision problems are probabilistically independent. Then elementary probability theory already implies that the mean of T equals the sum of the means of the Ui, and the variance of T equals the sum of the variances of the Ui, assuming these means and variances exist. If the Ui distributions satisfy certain further assumptions (such as Lindeberg's condition), then a generalised form of the Central Limit Theorem reveals that T will in fact approach a Gaussian distribution with that mean and variance. And the Berry-Esseen Theorem reveals that under certain assumptions, the approximation happens quickly.
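For reference, the relevant statement (the Lindeberg–Feller version of the Central Limit Theorem, in standard textbook notation) runs roughly as follows, writing μi and σi² for the mean and variance of Ui:

    \text{Let } s_n^2 = \sum_{i=1}^n \sigma_i^2. \text{ If for every } \varepsilon > 0,
    \qquad
    \lim_{n\to\infty} \frac{1}{s_n^2} \sum_{i=1}^n
      \mathbb{E}\!\left[ (U_i - \mu_i)^2 \, \mathbf{1}\{ |U_i - \mu_i| > \varepsilon s_n \} \right] = 0,
    \qquad
    \text{then} \quad \frac{T - \sum_{i=1}^n \mu_i}{s_n} \;\xrightarrow{\;d\;}\; \mathcal{N}(0,1).

Informally: provided no single problem accounts for a non-negligible share of the total variance, T behaves like a Gaussian with mean ∑i μi and variance s_n².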

So under the assumptions just mentioned, over time the total utility gained by following any given strategy S is indeed likely to (quickly) approach the sum of the expected utilities of the options selected by S in the individual problems. In other words, you're likely to maximize total actual utility by maximizing expected utility in each decision problem.
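As a sanity check, here is a small Python simulation sketch (with entirely made-up decision problems): an agent faces 200 independent binary-state problems and always picks the option with the higher expected utility; across many runs, the realized total utility clusters around the sum of the expected utilities, roughly within the Gaussian spread predicted above.

    import random

    random.seed(7)

    # 200 made-up binary-state decision problems. In each problem the state is "good"
    # with probability p; each of two options assigns a utility to each state.
    # The strategy: always pick the option with the higher expected utility.
    problems = []
    for _ in range(200):
        p = random.uniform(0.1, 0.9)
        options = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(2)]
        best = max(options, key=lambda o: p * o[0] + (1 - p) * o[1])
        problems.append((p, best))

    sum_expected = sum(p * g + (1 - p) * b for p, (g, b) in problems)
    sum_variance = sum(p * (1 - p) * (g - b) ** 2 for p, (g, b) in problems)
    sd = sum_variance ** 0.5

    def one_run():
        # Actual total utility from one pass through all 200 problems.
        return sum(g if random.random() < p else b for p, (g, b) in problems)

    totals = [one_run() for _ in range(10_000)]
    within = sum(1 for t in totals if abs(t - sum_expected) <= 2 * sd) / len(totals)

    print(f"sum of expected utilities:    {sum_expected:7.1f}")
    print(f"predicted sd of the total:    {sd:7.1f}")
    print(f"runs within 2 sd of that sum: {within:7.3f}")   # should be close to 0.95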

We've still made some fairly strong assumptions. In particular:

(1) I've assumed that which n decision problems are faced does not depend on the choices made in earlier decision problems. This is not the case in the "gambler's ruin" scenario. It seems plausible to me that the assumption could be weakened, but I'm not sure how.

(2) I've assumed that the agent has a probability function over the joint space comprising the states of all the individual decision problems, and that the states in different problems are probabilistically independent. In real life, one might object, our probabilities usually change between decision problems, and the states aren't always independent. Again, I think these assumptions can plausibly be justified and/or relaxed. For example, the relevant joint probability doesn't have to be the agent's initial probability before the first problem; we could take the probabilities over the states in the second problem to be given by the agent's probabilities over these states after the first problem has been resolved. This would need to be spelled out more carefully, though. There are also variants of the Central Limit Theorem that don't assume full independence.

(3) We also needed to assume that the individual decision problems satisfy certain further constraints, such as Lindeberg's condition. Concretely, this means that we have to rule out (for example) that the utilities in one decision problem are vastly greater than the utilities in all the others; otherwise the actual total utility would be almost entirely determined by the choice in this single problem. Ruling this out doesn't seem too unrealistic.
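To see what goes wrong otherwise, here is a quick back-of-the-envelope calculation (my own, in the notation of the Lindeberg condition above): suppose the n-th problem's stakes dwarf everything before it, say Un − μn = ±M with M² ≥ s_n²/2. Then for any ε < M/s_n the n-th term alone keeps the Lindeberg sum away from zero:

    \frac{1}{s_n^2} \sum_{i=1}^n \mathbb{E}\!\left[ (U_i - \mu_i)^2 \, \mathbf{1}\{ |U_i - \mu_i| > \varepsilon s_n \} \right]
    \;\ge\; \frac{\mathbb{E}\!\left[ (U_n - \mu_n)^2 \, \mathbf{1}\{ M > \varepsilon s_n \} \right]}{s_n^2}
    \;=\; \frac{M^2}{s_n^2} \;\ge\; \frac{1}{2}.

So if such a dominant problem keeps turning up as the sequence continues, the condition fails, and the total utility is in effect a single large gamble rather than a sum of many modest ones.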

Anyway, I feel I'm reinventing the wheel. Surely all this has been noticed before. Strangely, I can't find any discussion of it anywhere. The only version of the long-run argument that I've seen in the literature is the silly one involving infinite repetitions of the very same decision problem.

Comments

# on 02 December 2022, 18:30

There is an interesting technical issue in this argument. Claim (**) as it stands is false. The sum of n random variables cannot be expected to approach the sum of the means, no matter what kind of reasonable boundedness conditions we put on the variables. For instance suppose the independent random variables all have probability 1/2 of being +1 and probability 1/2 of being -1. The mean of each random variable is zero, but the sum of n random variables is always at least one unit away from zero when n is odd (and hence never has zero as its limit) and is usually at least one unit away from zero when n is even (because the sum is approximately Gaussian with standard deviation of the order of the square root of n, and that's too flat around zero for the sum to be often close to zero).

Instead, (**) should read: "As n gets large, the *mean* T/n of the n random variables Ui is likely to (quickly) approach the *mean* of the means of these variables."

The problem, however, is that it is the sum, not the mean, that the agent has reason to care about.

In fact, we can construct cases where the Law of Large Numbers applies to a sequence of independent gambles, and the expected value of each gamble is positive, but the agent will almost surely lose an infinite amount in the limit as n goes to infinity. Their *average* loss goes to zero, but their *total* loss is unbounded. https://alexanderpruss.blogspot.com/2022/10/the-law-of-large-numbers-and-infinite.html

# on 03 December 2022, 10:08

Interesting, thanks! I'm not sure how damaging your counterexample is to my line of argument though. You're right that the sum T doesn't technically approach the sum of the means of the individual variables, but T is still likely to be *close* to that sum. Since we lead finite lives, perfect convergence in the limit isn't all that relevant (in my view) to whether we should maximize expected utility.

You're also right, of course, that there are pathological cases in which maximizing expected utility is almost certain to lead to disaster, unless the utilities are bounded. But these cases don't satisfy the assumptions under which I thought (**) is correct. As you say in your post, your case doesn't satisfy the assumptions of the CLT.

I share your thought that the CLT, rather than the LLN, may provide a better foundation for long-run arguments in support of expected utility maximization.

# on 05 December 2022, 22:27

Yeah.

Though I think the pathological cases show the following: Either
(a) sometimes you shouldn't maximize expected utility, or
(b) the following principle is false: (***) If almost surely some behavioral policy eventually leads to loss in some infinite scenario as compared to some other policy, the first policy is irrational in that scenario.

For if you should always maximize expected utility, then you should maximize expected utility in the pathological cases, and that contradicts (***) (since the policy of ignoring small probabilities does better in my and McGee's pathological cases).

But now it seems to me that the argument you are offering for maximizing expected utility presupposes (***). For without (***), how can we infer from the fact that eventually your payoffs are positive in the non-pathological case that you should maximize expected utility in that case?

I may be missing something.

# on 07 December 2022, 10:32

Fair enough. The main point of this post was to improve upon existing long-run arguments for maximizing expected utility. Ideally, the upshot would be that a good decision rule should agree with the MEU rule in cases where the preconditions of the argument are satisfied.

I have no clear view about the pathological cases. Perhaps the right lesson is that one shouldn't always maximize expected utility.
