Binding and pre-emptive binding in Newcomb's Problem
When I recently taught Newcomb's Problem in an undergraduate class, opinions were – of course – divided. Some students were one-boxers, some were two-boxers. But few of the one-boxers were EDTers. I hadn't realised this in earlier years. Many of them agreed, after some back and forth, that their reasoning also supports one-boxing in a variation of Newcomb's Problem in which both boxes are transparent. In this version of the case, EDT says that you should two-box.
The argument students gave in support of one-boxing is that committing to one-boxing would make it likely that a million dollars is put into the opaque box.
This line of thought is most convincing if we assume that you know in advance that you will face Newcomb's Problem, before the prediction is made. It is uncontroversial that if you can commit yourself to one-boxing at this point, then you should do it.
By "committing", I mean what Arntzenius, Elga, and Hawthorne (2004) call "binding". By committing yourself to one-box, you would effectively turn your future self into an automaton that is sure to one-box. Your future self would no longer make a decision, based on their information and goals at the time. They would simply execute your plan.
In the present version of Newcomb's Problem, it would be great if you had the capacity of binding. Such a capacity would be useful in many other situations as well. Arntzenius, Elga, and Hawthorne (2004) show that it would help in several tricky puzzles involving infinity. McClennen (1990), Buchak (2013), and others argue (in effect) that it would help avoid a sure loss for certain kinds of agents in certain sequential decision problems. Game theory is full of examples where rational agents end up in a bad place because they can't make credible threats or assurances. Again, a binding capacity might help.
Arntzenius, Elga, and Hawthorne (2004) conclude that ideally rational agents need a capacity for binding. I'm somewhat sympathetic to this conclusion. As Meacham (2010) points out, it is analogous to the suggestion that ideally rational agents need a capacity for randomizing. Meacham thinks the latter suggestion is clearly false, but I have endorsed it in Schwarz (2015).
Let's accept, at least for the sake of the argument, that ideally rational agents would bind themselves to one-box in a version of Newcomb's Problem where they know about the setup before the prediction. It follows that such agents would one-box when the time to take the boxes has come. But they wouldn't decide to one-box. They wouldn't decide to do anything at this point. They would blindly execute the plan to which they have bound themselves. An agent can't rationally decide to one-box, I would say, but under some conditions a rational agent might end up one-boxing nonetheless.
Now let's look at the standard version of Newcomb's Problem, in which you don't know about the prediction before it is made. Would a rational agent one-box or two-box?
One might suggest that we should all pre-emptively bind ourselves to one-boxing in Newcomb's Problem, even if we don't know that we will ever face this situation. That is, we should turn ourselves into an agent who is disposed to blindly one-box in any Newcomb Problem. Spohn (2011) endorses this suggestion.
The idea can be generalised. Perhaps we should turn ourselves into an agent who always blindly makes the kinds of choices an agent would make who does well in the relevant decision situation. And then we might well say that the agent doesn't need to make the choices blindly. Rather, they should deliberately choose acts on the basis of whether agents who do well are disposed to make these acts. This leads to a family of non-standard decision theories, including the "disposition-based decision theory" of Fisher (n.d.), the "functional decision theory" of Yudkowsky and Soares (2017), and the "cohesive expected utility theory" considered in Meacham (2010).
So. Should we pre-emptively bind ourselves to one-box, assuming we have the capacity to do this? Would you want to be a one-boxer?
One might argue that one-boxers do better, in expectation, than agents who are disposed to two-box. The difference, so the argument goes, only shows up in Newcomb Problems. And here the one-boxers are indeed highly likely to do better.
But this argument is wrong. Why would the difference only show up in Newcomb Problems? Consider this scenario:
Newcomb's Revenge. In front of you are two boxes, one transparent, the other opaque. You can take either the opaque box or both boxes. The transparent box contains $1000. A demon has tried to detect what you would do in Newcomb's Problem. If she figured out that you would one-box, she has left the opaque box empty. If she figured out that you would two-box (in Newcomb's Problem), she has put a million into the box.
In this situation, agents who are disposed to two-box do better than agents who are disposed to one-box.
I don't think Newcomb's Revenge is any more or less far-fetched than Newcomb's original Problem. Who knows which you are more likely to face? Without more information about the world, we can't say whether you would do better if you were disposed to one-box or to two-box.
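The arithmetic behind this point can be made explicit. The following is a minimal Python sketch, not anything from the literature: it assumes a perfectly reliable predictor and demon, the $1000 and $1,000,000 payoffs of the two cases, and (as a labelled assumption) that agents of either disposition take both boxes in Newcomb's Revenge, where the box contents are already fixed. The parameters `p_newcomb` and `p_revenge` (the probabilities of facing each case) and all function names are hypothetical, chosen for illustration.

```python
from fractions import Fraction

SMALL = 1_000      # contents of the transparent box
BIG = 1_000_000    # contents of the opaque box when it is filled


def newcomb_payoff(disposition: str) -> int:
    # Original problem: a perfect predictor fills the opaque box
    # exactly when it foresees one-boxing.
    return BIG if disposition == "one-box" else SMALL


def revenge_payoff(disposition: str) -> int:
    # Newcomb's Revenge: the demon fills the opaque box exactly when she
    # foresees two-boxing in the ORIGINAL problem. Assumption: every agent
    # takes both boxes here, since the contents are already fixed.
    return BIG + SMALL if disposition == "two-box" else SMALL


def expected_payoff(disposition: str, p_newcomb: Fraction, p_revenge: Fraction) -> Fraction:
    # Any other situation is assumed to pay both dispositions the same,
    # so it is left out of the comparison.
    return p_newcomb * newcomb_payoff(disposition) + p_revenge * revenge_payoff(disposition)


def break_even_ratio() -> Fraction:
    # One-boxing is ahead iff p_newcomb * (BIG - SMALL) > p_revenge * BIG,
    # i.e. iff p_newcomb / p_revenge exceeds this ratio.
    return Fraction(BIG, BIG - SMALL)


if __name__ == "__main__":
    for p_n, p_r in [(Fraction(1, 100), Fraction(1, 100)),
                     (Fraction(2, 100), Fraction(1, 100)),
                     (Fraction(1, 100), Fraction(2, 100))]:
        one = expected_payoff("one-box", p_n, p_r)
        two = expected_payoff("two-box", p_n, p_r)
        print(f"P(Newcomb)={p_n}, P(Revenge)={p_r}: one-box {float(one):.0f}, two-box {float(two):.0f}")
    print("break-even P(Newcomb)/P(Revenge):", break_even_ratio())
```

Under these assumptions, if the two cases are equally likely the two-boxing disposition comes out slightly ahead (10,020 vs 10,010 in the first scenario). One-boxing wins only if Newcomb's Problem is more than 1000/999 times as likely as Newcomb's Revenge, so everything turns on facts about the world that the decision problems themselves don't settle.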
Friends of "functional decision theory" claim that agents who follow FDT generally do better than agents who follow CDT or EDT. But FDT agents are one-boxers, and so they do worse in Newcomb's Revenge. Friends of FDT might complain that the scenario is "unfair", but I don't see why it should be considered more unfair than Newcomb's original Problem. In the original problem, the demon favours agents with a one-boxing disposition, by effectively giving a million to anyone she believes to have this disposition. In Newcomb's Revenge, the demon does exactly the same for agents with a two-boxing disposition. If anything, the cases look completely analogous.
Consider Soares (2014), who claims that "Newcomblike problems are everywhere". You can treat the original Newcomb's Problem as analogous to the prisoner's dilemma and other problems of cooperation. If that's right, agents who consistently one-box will be rewarded more in everyday life than agents who two-box. I claim that we actually face Newcomblike problems all the time, but run into Revenge-style games much less often.
It's entirely fair to describe anyone who sets up Newcomb's Revenge as a demon! By contrast, someone deciding whether to offer you vulnerability or power based on their assessment of your character: that happens all the time, and it can be isomorphic to Newcomb's Problem. There, the clearly moral and utility-maximizing option is to actually be trustworthy. Take the big reward of trust, leave some money on the table, and decline to abuse it. This reliably wins.
Alternatively, just look at Parfit's Hitchhiker problem, which is laid out in the FDT paper. The honorable and moral choice, the choice that survives / wins / gets the pile of utility, the choice you would want to precommit to, and the FDT agent's choice all line up. This is not a coincidence: FDT is built partly to formalize the winning and moral choice of cooperation in the Twin Prisoner's Dilemma, Newcomb's Problem, Parfit's Hitchhiker, and all other problems where other people can model your thinking.
tl;dr: it matters because we actually play cooperative Newcomb-like games all the time. By contrast, revengeful demons are (thankfully) rare.