One might consider an old-fashioned doomsday-type argument: nuclear technology plus a pessimistic view of human nature, with a constant small probability of nuclear war for each generation of politicians.

Let's look at Nick Bostrom's version of the argument, as presented for example in Bostrom (2008).

We compare two possibilities about the prospects of humanity:

*Early Doom*: The total number of humans who will have ever lived is 100 billion.

*Late Doom*: The total number of humans who will have ever lived is 100 trillion.

The argument goes as follows.

Early Doom and Late Doom have roughly equal prior probability. Every Early Doom world is inhabited by 100 billion people; a priori, each of these positions is equally likely to be ours. Similarly for the 100 trillion positions in Late Doom worlds. If we now take into account the fact that there have only been around 50 billion humans so far (i.e., that our "birth rank" is around 50 billion), it follows by Bayes' theorem that Early Doom is vastly more probable than Late Doom.

More precisely, using 'E' for Early Doom, 'L' for Late Doom, and 'R' for the information that our birth rank is around 50 billion, Bayes' theorem gives us:

\[\begin{align*} P(E \mid R) &= \frac{P(R \mid E)\, P(E)}{P(R \mid E)\, P(E) + P(R \mid L)\, P(L)}\\ &= \frac{1/10^{11} \cdot 1/2}{1/10^{11} \cdot 1/2 + 1/10^{14} \cdot 1/2} \approx 0.999. \end{align*}\]

Can we conclude that it is 99.9% likely that we will soon go extinct?!
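Since this is just arithmetic, we can confirm it in a few lines of Python (my own sketch, using the numbers from the text):

```python
# Posterior probability of Early Doom given our birth rank: equal priors,
# and a rank of ~50 billion is one of 10^11 equally likely positions under
# Early Doom, one of 10^14 under Late Doom.
p_R_given_E = 1e-11
p_R_given_L = 1e-14
p_E = p_L = 0.5
p_E_given_R = (p_R_given_E * p_E) / (p_R_given_E * p_E + p_R_given_L * p_L)
print(round(p_E_given_R, 4))  # 0.999
```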

The most obvious problem with this argument is that E and L are not the only (a priori) possibilities. What do we get if we drop this assumption?

Let's use 'N' for the total number of humans who will have ever lived. Suppose we start with a uniform prior over N=1 to N=10^{100} (say), generalizing Bostrom's uniform prior over E and L. Within each N=k world, the prior is evenly divided over all humans. Each position in each N=k possibility then has probability 1/(k*10^{100}). This is also the unnormalized posterior probability of N=k after conditioning on our position (birth rank) r, for k>=r. The probability of N=k is therefore inversely proportional to k:

\[
P(N\!=\!k) = \frac{c}{k},
\]

where c is a constant.
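A toy version of the derivation (with a small cutoff of my own choosing in place of 10^{100}) confirms that the posterior is proportional to 1/k:

```python
# Uniform prior over N = 1..M; within each N = k world, birth ranks are
# equally likely. Conditioning on rank r leaves a posterior over N that is
# proportional to 1/k for k >= r.
M, r = 1000, 50
unnorm = {k: (1 / M) * (1 / k) for k in range(r, M + 1)}  # P(N=k) * P(rank=r | N=k)
Z = sum(unnorm.values())
post = {k: p / Z for k, p in unnorm.items()}
# posterior probabilities stand in the inverse ratio of the k values:
print(round(post[100] / post[200], 6))  # 2.0
```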

This does imply that every small-world hypothesis N=k is much more probable than a corresponding large-world hypothesis N=1000k. On the other hand, there are many more large-world possibilities than small-world possibilities. For example, the probability of N=10^{11} is about equal to the probability that N is between 10^{14} and 10^{14}+1000. So we can be as confident that there will be 100 billion people as that there will be 100 trillion people plus or minus 500. It's not obvious that this should disturb us.
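The equal-weight comparison is easy to check numerically (I take the thousand values starting at 10^{14}; the exact endpoints don't matter):

```python
# Under P(N=k) proportional to 1/k, the weight of the single value
# N = 10^11 matches the combined weight of ~1000 values around 10^14.
w_small = 1 / 10**11
w_large = sum(1 / k for k in range(10**14, 10**14 + 1000))
print(round(w_large / w_small, 3))  # 1.0
```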

In fact, the calculation implies that we are a lot more likely to be among the first half of all humans than among the second half. On the face of it, this may seem unduly optimistic, given that (by definition) half of all humans in any world are among the second half.
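We can quantify this with the same kind of toy posterior (my own cutoff M and rank r; being among the first half of all humans means N >= 2r):

```python
# Posterior P(N=k) proportional to 1/k for k >= r, with upper cutoff M.
# How likely is it that our rank r falls in the first half, i.e. N >= 2r?
r, M = 50, 10**6
weights = {k: 1 / k for k in range(r, M + 1)}
Z = sum(weights.values())
p_first_half = sum(w for k, w in weights.items() if k >= 2 * r) / Z
print(round(p_first_half, 2))  # ~0.93
```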

One might respond that even if we're among the first half of all humans, we may still be close to extinction, given that the human population is so much larger today than it was in the distant past.

This points at the other obvious flaw in the argument: We have a lot of further information besides our birth rank.

Suppose all you know about a population of bacteria is that it has doubled every hour for the last few days and currently stands at 100 million. What's your probability distribution over how many bacteria will ever have existed in that population?

Hard to say, but the distribution should not be flat. We expect tendencies to project into the future. It's more likely that the population will double again in the next hour than that it will quadruple or halve.

Similarly, the fact that humanity has been growing favours futures with a lot more humans than futures with fewer humans.

But don't we know that the exponential growth of the human population will come to an end soon? Well, yes. We have *a lot* of further information. It's really hard to assess how it all adds up.

Once we see the obvious flaws in the argument, it's not clear why we might want to change the crucial assumption about priors that Bostrom and others have focussed on: that Early Doom and Late Doom have roughly equal prior probability.

In the future, I might use the following variation of the doomsday argument (inspired by some of the cases in Bostrom (2001)):

*Doom II*: We have created a device that will either destroy all humans or ensure our interplanetary survival for millions of years. Which of these will happen depends on whether the Nth digit of a certain physical constant is even (doom) or odd (no doom). We have not been able to measure this digit. How confident should we be that it is even?

Here we can, for simplicity, assume that there are really just two possibilities, much like Early Doom and Late Doom. If we start with a uniform prior over whether the digit is even or odd – as seems reasonable – and take into account our early birth rank, as above, we get the seemingly unreasonable conclusion that the digit is almost certainly even.

Bostrom, Nick. 2001. “The Doomsday Argument Adam & Eve, UN++, and Quantum Joe.” *Synthese* 127 (3): 359–87. https://doi.org/10.1023/A:1010350925053.

Bostrom, Nick. 2008. “The Doomsday Argument.” *Think* 6 (17–18): 23–28. https://doi.org/10.1017/S1477175600002943.

I'll write 'A>C' for the conditional 'if A then C'. For the purposes of this post, we assume that 'A>C' is true at a world w iff all the closest A worlds to w are C worlds, by some contextually fixed measure of closeness.

It has often been observed that the simplification effect resembles the "Free Choice" effect, i.e., the apparent entailment of '◇A' and '◇B' by '◇(A∨B)', where the diamond is a possibility modal (permission, in the standard example). But there are also important differences.

According to standard modal semantics, '◇(A∨B)' is equivalent to '◇A ∨ ◇B'. But '(A∨B)>C' is not equivalent to '(A>C) ∨ (B>C)'. For example, suppose C is true at the closest A worlds but not at the closest B worlds, and the closest A∨B worlds are B worlds. Then 'A>C' is true, but '(A∨B)>C' is false.
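To make the counterexample concrete, here is a minimal two-world model in Python (my own toy encoding of the closest-worlds semantics, not the simulation framework used below):

```python
# Each world has a distance from the actual world and a set of true atoms.
worlds = [
    {"dist": 1, "true": {"B"}},        # closest B world: not a C world
    {"dist": 2, "true": {"A", "C"}},   # closest A world: a C world
]

def closest(antecedent):
    """All worlds satisfying `antecedent` at minimal distance."""
    sat = [w for w in worlds if antecedent(w)]
    d = min(w["dist"] for w in sat)
    return [w for w in sat if w["dist"] == d]

def conditional(antecedent, consequent):
    """'antecedent > consequent' is true iff all closest antecedent
    worlds are consequent worlds."""
    return all(consequent(w) for w in closest(antecedent))

A = lambda w: "A" in w["true"]
B = lambda w: "B" in w["true"]
C = lambda w: "C" in w["true"]
AvB = lambda w: A(w) or B(w)

print(conditional(A, C))    # True:  A>C
print(conditional(B, C))    # False: B>C
print(conditional(AvB, C))  # False: the closest AvB world is a B world
```

So '(A>C) ∨ (B>C)' holds here while '(A∨B)>C' fails.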

In general, the truth-value of '(A∨B)>C' depends on three factors:

- whether the closest A worlds are C worlds,
- whether the closest B worlds are C worlds, and
- the relative closeness of A and B (i.e., whether the closest A worlds are closer than the closest B worlds or vice versa).

Nothing like the third factor is relevant for '◇(A∨B)'.

I'm not going to go over Franke's model of Free Choice again. What's important is that it involves the following three states:

- t_{A}, where A is permitted but B is not,
- t_{B}, where B is permitted but A is not, and
- t_{AB}, where both A and B are permitted.

We have the following association between these states and the truth-value of relevant messages:

|  | '◇A' | '◇B' | '◇(A∨B)' |
|---|---|---|---|
| t_{A} | 1 | 0 | 1 |
| t_{B} | 0 | 1 | 1 |
| t_{AB} | 1 | 1 | 1 |

For conditionals, he says, the same kind of association holds, "provided we reinterpret the state names":

|  | 'A>C' | 'B>C' | '(A∨B)>C' |
|---|---|---|---|
| t_{A} | 1 | 0 | 1 |
| t_{B} | 0 | 1 | 1 |
| t_{AB} | 1 | 1 | 1 |

This is table 86 on p.44. But how are we supposed to interpret these state names?

There is no interpretation that would make the table correct. The table makes it look as if the truth-value of '(A∨B)>C' is determined by the truth-values of 'A>C' and 'B>C'. But it is not. For example, what about a state in which 'A>C' is true, 'B>C' is false, and '(A∨B)>C' is false, because B is closer than A? This possibility is nowhere to be found in the table.

So Franke's IBR model of Free Choice does not, in fact, carry over to SDA.

(I would assume that this problem has been noticed before, but it isn't mentioned in Bar-Lev and Fox (2020) or Fox and Katzir (2021), where Franke's model is discussed. Am I missing something?)

Anyway, let's move on.

As I said above, the truth-value of '(A∨B)>C' depends on

- whether the closest A worlds are C worlds,
- whether the closest B worlds are C worlds, and
- the relative closeness of A and B (i.e., whether the closest A worlds are closer than the closest B worlds or vice versa).

There are 12 possible combinations of these three factors. '(A∨B)>C' is true in five of them:

(S1) A is closer, A>C, ¬(B>C)

(S2) A is closer, A>C, B>C

(S3) B is closer, ¬(A>C), B>C

(S4) B is closer, A>C, B>C

(S5) A and B are equally close, A>C, B>C

Here, 'A is closer' means that the closest A worlds are closer than the closest B worlds, and 'A>C' means that the closest A worlds are C worlds.
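These counts can be verified by brute force (a Python sketch, separate from the simulations below):

```python
from itertools import product

# The twelve states: relative closeness, whether the closest A worlds are
# C worlds ('A>C'), and whether the closest B worlds are ('B>C').
states = list(product(["A closer", "B closer", "tie"], [True, False], [True, False]))

def avb_c(closeness, ac, bc):
    # '(A∨B)>C' is true iff all closest A∨B worlds are C worlds.
    if closeness == "A closer":
        return ac
    if closeness == "B closer":
        return bc
    return ac and bc  # tie: the closest A∨B worlds include both

true_states = [s for s in states if avb_c(*s)]
print(len(states), len(true_states))  # 12 5
```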

Note that three of these five cases have both A>C and B>C. Imagine a speaker who thinks that their addressee has uniform priors over all twelve cases. Imagine the speaker knows A>C and B>C. Then '(A∨B)>C' is already a better choice than, say, 'A>C' or 'B>C'. '(A>C) ∧ (B>C)' is better still, but if a higher-level hearer only compares the uttered message to its alternatives, we might expect to get an SDA effect, without any higher-order implicature.

This isn't quite right, though.

With uniform hearer priors, '(A∨B)>C' is a good option to convey A>C ∧ B>C, but it is also a good option to convey other states. In particular, it is the best option (at level 1) among its alternatives for conveying S1 and S3. That's because 'A>C' and 'B>C' (and their negations) are each true in six states and thus confer a lower probability on S1 and S3 than '(A∨B)>C' does.
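These likelihood claims can be checked outside the simulation with a literal-hearer calculation (uniform prior over the twelve states; a Python sketch):

```python
from itertools import product

# States: who is closer, whether A>C, whether B>C.
states = list(product(["A", "B", "tie"], [True, False], [True, False]))

def true_in(u, s):
    close, ac, bc = s
    if u == "A>C": return ac
    if u == "B>C": return bc
    # 'AvB>C': the closest AvB worlds are C worlds
    return ac if close == "A" else bc if close == "B" else (ac and bc)

def posterior(u, target):
    # probability of `target` after conditioning a uniform prior on u
    compat = [s for s in states if true_in(u, s)]
    return sum(target(s) for s in compat) / len(compat)

S1 = lambda s: s == ("A", True, False)
print(round(posterior("AvB>C", S1), 3), round(posterior("A>C", S1), 3))  # 0.2 0.167
both = lambda s: s[1] and s[2]
print(round(posterior("AvB>C", both), 3), round(posterior("A>C", both), 3))  # 0.6 0.5
```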

(Incidentally, this is why Franke's model doesn't work for SDA: '(A∨B)>C' is not a "surprise message" at level 2.)

Here's a simulation that confirms these claims:

```
var states = Cross({
  closest: ['A', 'B', 'A,B'],
  Cness: ['A', 'B', 'A,B', '-'] // Cness 'A' means that the closest A worlds are C worlds
});
var meanings = {
  'A>C': function(s) { return s['Cness'].includes('A') },
  'B>C': function(s) { return s['Cness'].includes('B') },
  'AvB>C': function(s) {
    return s['closest'] == 'A' && s['Cness'].includes('A') ||
      s['closest'] == 'B' && s['Cness'].includes('B') ||
      s['closest'] == 'A,B' && s['Cness'] == 'A,B'
  },
  'A>C and B>C': function(s) {
    return s['Cness'].includes('A') && s['Cness'].includes('B');
  },
  '-': function(s) { return true }
};
var alternatives = {
  'A>C': ['A>C', 'B>C', '-'],
  'B>C': ['A>C', 'B>C', '-'],
  'AvB>C': ['A>C', 'B>C', 'AvB>C', '-'],
  'A>C and B>C': keys(meanings),
  '-': ['-']
};
var state_prior = Indifferent(states);
var hearer0 = Agent({
  credence: state_prior,
  kinematics: function(utterance) {
    return function(state) {
      return evaluate(meanings[utterance], state);
    }
  }
});
var speaker1 = function(observation, options) {
  return Agent({
    options: options || keys(meanings),
    credence: update(state_prior, observation),
    utility: function(u, s) { return learn(hearer0, u).score(s); }
  });
};
display('hearer0 -- A>C is compatible with six states, AvB>C with five:');
showKinematics(hearer0, ['A>C', 'AvB>C']);
var s1 = { closest: 'A', Cness: 'A' };
var s2 = { closest: 'A', Cness: 'A,B' };
display('speaker1 -- prefers AvB>C if she knows the state is S1 or S2');
showChoices(speaker1, [s1, s2], [alternatives['AvB>C']]);
```

To derive the Simplification effect, we need to ensure that speakers don't use '(A∨B)>C' to convey S1 or S3.

There are different ways to achieve this. I'm going to invoke a QUD.

Recall, once more, that the truth-value of '(A∨B)>C' is determined by the truth-value of 'A>C' and 'B>C' and the relative closeness of A and B. Normally, however, we don't expect that speakers who utter '(A∨B)>C' are trying to convey anything about the relative closeness of A and B. What's normally under discussion is whether A>C and whether B>C, not which of A and B is closer.

So let's add a QUD to the model, as in this post. Normally, the QUD is whether A>C and whether B>C. '(A∨B)>C' is then no longer a good option for a speaker who knows that the state is S1 or S3: 'A>C' is better in S1, 'B>C' is better in S3.

With this QUD, '(A∨B)>C' is the best option among its alternatives only in three of the 12 possible states: in S2, S4, and S5. In each of these, we have A>C and B>C. If the level-2 hearer assumes that the speaker chose the best option from among the alternatives of the chosen utterance, he will infer from an utterance of '(AvB)>C' that 'A>C' and 'B>C' are both true:

```
// continues #1
var quds = {
  'state?': function(state) { return state },
  'A>C?B>C?': function(state) { return state['Cness'] }
};
var makeHearer = function(speaker, state_prior, qud) {
  return Agent({
    credence: state_prior,
    kinematics: function(utterance) {
      return speaker ?
        function(s) {
          var speaker = speaker(s, alternatives[utterance], qud);
          return sample(choice(speaker)) == utterance;
        } :
        function(s) {
          return evaluate(meanings[utterance], s);
        }
    }
  });
};
var makeSpeaker = function(hearer, state_prior, qud, cost) {
  return function(observation, options) {
    return Agent({
      options: options || keys(meanings),
      credence: update(state_prior, observation),
      utility: function(u, s) {
        var qu = quds[qud];
        return marginalize(learn(hearer, u), qu).score(qu(s)) - cost(u);
      }
    });
  };
};
var cost = function(utterance) {
  return utterance == '-' ? 2 : utterance.length/20;
};
var qud = 'A>C?B>C?';
var hearer0 = makeHearer(null, state_prior, qud);
var speaker1 = makeSpeaker(hearer0, state_prior, qud, cost);
var hearer2 = makeHearer(speaker1, state_prior, qud);
showKinematics(hearer2, ['AvB>C']);
```

('Cness: "A,B"' means that the closest A worlds and the closest B worlds are both C worlds.)
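The marginalization step behind this prediction can also be checked outside the simulation (a Python sketch of the literal-hearer QUD scores, not the full iterated model):

```python
from itertools import product

states = list(product(["A", "B", "tie"], [True, False], [True, False]))

def true_in(u, s):
    close, ac, bc = s
    if u == "A>C": return ac
    if u == "B>C": return bc
    return ac if close == "A" else bc if close == "B" else (ac and bc)  # 'AvB>C'

def qud_cell(s):
    return (s[1], s[2])  # whether A>C and whether B>C

def qud_score(u, s):
    # probability a literal hearer assigns to the speaker's QUD cell,
    # after conditioning a uniform prior on the truth of u
    compat = [t for t in states if true_in(u, t)]
    return sum(qud_cell(t) == qud_cell(s) for t in compat) / len(compat)

s1 = ("A", True, False)   # S1: 'A>C' is now the better choice
print(qud_score("A>C", s1), qud_score("AvB>C", s1))   # 0.5 0.2
s2 = ("A", True, True)    # S2: 'AvB>C' beats 'A>C'
print(qud_score("AvB>C", s2), qud_score("A>C", s2))   # 0.6 0.5
```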

I've defined this simulation with factory functions so that one can easily create more agents and check different parameters. For example, if you change `qud` to `'state?'`, the level-2 hearer doesn't become convinced of A>C and B>C.

I've assumed that the SDA effect arises because the relative closeness of A and B is normally not under discussion when we evaluate '(A∨B)>C'.

This might shed light on a puzzle about the distribution of SDA.

(1) If Spain had fought with the Axis or the Allies, it would have fought with the Axis.

A speaker who utters (1) would not be interpreted as believing that Spain would have fought with the Axis if it had fought with the Allies (even though it is theoretically possible to fight on both sides).

Similarly for (2), from Lassiter (2018):

(2) If Spain had fought with the Axis or the Allies, it would probably have fought with the Axis.

Why don't we get SDA here?

We might, of course, say that the inference is cancelled due to the implausibility of the conclusion. But perhaps we can say more.

Clearly, when somebody utters (1) or (2), the relative closeness of the two possibilities is under discussion. The point of (1) is precisely to state that Spain joining the Allies is a more remote possibility than Spain joining the Axis.

In the context of (1) and (2), then, the QUD is not `'A>C?B>C?'`. Perhaps it is `'state?'`, or perhaps it is which of A and B is closer. The above model predicts that this breaks the derivation of SDA.

The hypothesis that SDA depends on the QUD is supported by the following observation, due to Nute (1980).

Consider (3):

(3) If Spain had fought with the Axis or the Allies, Hitler would have been happy.

In a normal context, (3) conveys that Hitler would have been happy no matter which side Spain had fought on, which is false. So here the SDA effect is in place. Now Nute observes that (3) can become acceptable if it is uttered right after (1).

A similar point could be made with (4):

(4) If Spain had fought with the Axis or the Allies, Hitler would probably have been happy, for surely Spain would have chosen the Axis.

The 'for surely' explanation in (4) clarifies that relative remoteness is under discussion, so that SDA isn't licensed. Likewise, if (3) is uttered right after (1), the relative remoteness question raised by (1) is still in place, so the SDA effect is again suspended.

(Can we explain why (1), (2), and (4) make the relative remoteness of A and B salient? Presumably the explanation is that these sentences would be infelicitous if the relative remoteness were irrelevant, so it becomes relevant by accommodation. Might be useful to write a simulation for this.)

I don't like the above model.

I'm not sure why. I think it's because the inference is driven by quantitative likelihood comparisons – for example, that '(AvB)>C' is true in 5 of the 12 states while 'A>C' is true in 6. Is our language faculty really sensitive to these quantitative differences?

The likelihood dependence also means that the inference only works for certain kinds of state priors.

I've assumed that the state prior is uniform. But the most striking examples of Simplification are cases like (3), where one disjunct is clearly more remote than the other.

(3) If Spain had fought with the Axis or the Allies, Hitler would have been happy.

The above model runs into trouble here.

If it is common knowledge that A is closer than B, then '(AvB)>C' is semantically equivalent to 'A>C'. A hearer should be puzzled why the speaker would use the needlessly complex '(AvB)>C'.

The problem doesn't just arise if it is certain that A is closer than B. Here is a prior according to which it is *almost certain* that A is closer than B:

```
// continues #2
var state_prior = update(Indifferent(states), { closest: 'A' }, { new_p: 0.99 });
viz.table(state_prior);
```

(The call to `update` Jeffrey-conditionalizes the uniform prior on the information that A is closer than B, with a posterior probability of 0.99.)
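Jeffrey conditionalization itself is simple to sketch (my own minimal Python version: rescale the target proposition to its new probability and everything else proportionally):

```python
from itertools import product

# Uniform prior over the twelve states, then Jeffrey-conditionalize on
# "A is closer" with new probability 0.99.
states = list(product(["A", "B", "tie"], [True, False], [True, False]))
prior = {s: 1 / len(states) for s in states}
p_A = sum(p for s, p in prior.items() if s[0] == "A")  # 1/3 under the uniform prior
post = {s: p * (0.99 / p_A if s[0] == "A" else 0.01 / (1 - p_A))
        for s, p in prior.items()}
print(round(sum(p for s, p in post.items() if s[0] == "A"), 2))  # 0.99
```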

With this prior, a fully informed speaker would never utter '(AvB)>C' if the QUD is 'A>C?B>C?':

```
// continues #3
var hearer0 = makeHearer(null, state_prior, 'A>C?B>C?');
var speaker1 = makeSpeaker(hearer0, state_prior, 'A>C?B>C?', cost);
var hearer2 = makeHearer(speaker1, state_prior, 'A>C?B>C?');
showKinematics(hearer2, ['AvB>C']);
```

This isn't a decisive objection. One might argue that the computation of SDA is insulated from the worldly knowledge that A is closer than B. One could also argue that a hearer might be unsure about whether the speaker intrinsically prefers uttering 'A>C' over the slightly more complex '(AvB)>C'. We can still predict SDA if there's no preference for simpler utterances:

```
// continues #4
var no_cost = function(utterance) { return 0 };
var speaker1 = makeSpeaker(hearer0, state_prior, 'A>C?B>C?', no_cost);
var hearer2 = makeHearer(speaker1, state_prior, 'A>C?B>C?');
showKinematics(hearer2, ['AvB>C']);
```

But let's try a different approach.

In section 1, I emphasized some differences between SDA and Free Choice.

In particular, '(AvB)>C' is not (literally) equivalent to 'A>C ∨ B>C', whereas '◇(A∨B)' is equivalent to '◇A ∨ ◇B'.

Still, '(AvB)>C' *entails* 'A>C ∨ B>C'. A literal-minded speaker would therefore only utter '(AvB)>C' if she knows that at least one of A>C and B>C obtains.
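This entailment can be confirmed by brute force over the twelve possible states (a Python sketch):

```python
from itertools import product

states = list(product(["A", "B", "tie"], [True, False], [True, False]))

def avb_c(s):
    close, ac, bc = s
    return ac if close == "A" else bc if close == "B" else (ac and bc)

# every state verifying '(AvB)>C' verifies 'A>C or B>C' ...
assert all(s[1] or s[2] for s in states if avb_c(s))
# ... but not conversely, e.g. when B>C holds but A is closer without A>C
assert any((s[1] or s[2]) and not avb_c(s) for s in states)
print("(AvB)>C properly entails A>C v B>C")
```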

Let's assume that the speaker has a preference for simpler utterances, that A worlds are likely to be closer than B worlds, as in the prior from source block #3, and that the relative closeness of A and B is not under discussion. As we saw in simulation #4, a literal-minded speaker who is fully informed about the state would then always prefer 'A>C' or 'B>C' over '(AvB)>C'.

What if the speaker isn't fully informed? Suppose all she knows is that at least one of A>C and B>C obtains. In that case, 'A>C' and 'B>C' would be bad. '(AvB)>C' would be better. The speaker doesn't know that it is true, but *with respect to the QUD* it wouldn't communicate anything false.

Or suppose the speaker knows that either A is closer than B and A>C holds, or B is closer than A and B>C holds. In this case, she knows that '(AvB)>C' is true, without knowing that 'A>C' is true or that 'B>C' is true.

In sum, a literal-minded speaker would prefer '(AvB)>C' among its alternatives iff (i) she knows that at least one of A>C and B>C obtains, and (ii) she lacks a certain kind of further information.

Imagine a hearer who believes himself to be addressed by such a speaker. Hearing '(AvB)>C', he could infer (i) and (ii).

This is analogous to what the hearer would infer from an utterance of '◇(A∨B)' in the case of Free Choice. It doesn't involve any frequency comparisons.

What would the hearer infer from 'A>C'? Intuitively, he should be able to infer that 'B>C' is false. Recall that the QUD is whether A>C and whether B>C. There seems to be a general mechanism by which, if the QUD is whether X and whether Y, and a speaker says X, one can infer that ¬Y.

So let's assume that 'A>C' would prompt an inference to ¬(B>C). This is analogous to the inference from '◇A' to ¬◇B.

Now imagine a higher-level speaker who thinks that he is addressing such a hearer. Imagine she knows that A>C and B>C both obtain. Uttering 'A>C' would be bad, as it would convey ¬(B>C). Uttering 'B>C' would be equally bad. '(AvB)>C' would be better. It would be the best option among its alternatives.

As a result, a hearer on the next level who presumes that the speaker is well-informed would regard '(AvB)>C' as indicating A>C and B>C.

On this model, the derivation of SDA really is a lot like "the derivation of Free Choice in the previous post.

Let's write a simulation to check that it works.

We need to allow for imperfectly informed speakers. But we have 12 possible states now. This means that there are 2^{12}-1 = 4095 ways to be informed or uninformed. If we consider all possibilities, the simulation becomes painfully slow.

To speed things up, I'll only consider six kinds of speaker information:

- The speaker is fully informed.
- The speaker is fully informed about which of A and B is closer, but lacks any information about A>C and B>C.
- The speaker is fully informed about A>C and B>C, but lacks information about which of A and B is closer.
- The speaker knows whether at least one of 'A>C' and 'B>C' is true.
- The speaker knows whether '(A∨B)>C' is true.
- The speaker knows nothing.

```
// continues #5
var access = { // maps states to observations
  'full': function(s) { return s },
  'closest': function(s) { return { closest: s.closest } },
  'Cness': function(s) { return { Cness: s.Cness } },
  'A>CvB>C': function(s) {
    return s.Cness == '-' ?
      { Cness: '-' } :
      function(t) { return t.Cness != '-' }
  },
  'AvB>C': function(s) {
    var tv = evaluate(meanings['AvB>C'], s);
    return function(t) { return evaluate(meanings['AvB>C'], t) == tv };
  },
  'none': function(s) { return states }
};
var access_prior = {
  'full': 0.4,
  'closest': 0.03,
  'Cness': 0.03,
  'A>CvB>C': 0.02,
  'AvB>C': 0.02,
  'none': 0.5
};
```

As in earlier posts, I assume a default presumption that the speaker is fully informed.

```
// continues #6
var makeHearer = function(speaker, state_prior, qud) {
  return Agent({
    credence: join({ 'state': state_prior, 'access': access_prior }),
    kinematics: function(utterance) {
      return speaker ?
        function(s) {
          var obs = evaluate(access[s.access], s.state);
          var speaker = speaker(obs, alternatives[utterance], qud);
          return sample(choice(speaker)) == utterance;
        } :
        function(s) {
          return evaluate(meanings[utterance], s.state);
        }
    }
  });
};
var makeSpeaker = function(hearer, state_prior, qud, cost) {
  return function(observation, options) {
    return Agent({
      options: options || keys(meanings),
      credence: update(state_prior, observation),
      utility: function(u, s) {
        var qu = quds[qud];
        var hearer_state_credence = marginalize(learn(hearer, u), 'state');
        return marginalize(hearer_state_credence, qu).score(qu(s)) - cost(u);
      }
    });
  };
};
var state_prior = update(Indifferent(states), { closest: 'A' }, { new_p: 0.99 });
// var state_prior = Indifferent(states);
var qud = 'A>C?B>C?';
// var qud = 'state?';
var hearer0 = makeHearer(null, state_prior, qud);
var speaker1 = makeSpeaker(hearer0, state_prior, qud, cost);
var hearer2 = makeHearer(speaker1, state_prior, qud);
var speaker3 = makeSpeaker(hearer2, state_prior, qud, cost);
var hearer4 = makeHearer(speaker3, state_prior, qud);
```

(You can see how the effect depends on the state prior and the QUD by uncommenting `var state_prior = Indifferent(states);` or `var qud = 'state?';`. With a uniform state prior, we would get SDA by the same mechanism as in section 2.)

```
// continues #7
display('hearer2:');
showKinematics(hearer2, ['A>C', 'AvB>C']);
```

As predicted, upon hearing 'AvB>C', the level-2 hearer infers that (i) at least one of A>C and B>C obtains, and that (ii) the speaker lacks information.

Upon hearing 'A>C', the level-2 hearer only has a slight tendency to think that 'B>C' is false.

To get the desired effect, 'AvB>C' should be a better choice for communicating A>C ∧ B>C than 'A>C' and 'B>C', at the next level up. In the case of Free Choice, a slight tendency to infer that '◇B' is false based on an utterance of '◇A' was not enough, because ◇A ∧ ◇B was even more unlikely conditional on '◇(A∨B)'. In the present case, 'AvB>C' turns out to yield a comparatively high credence of around 40% in A>C ∧ B>C. This is enough to derive SDA:

```
// continues #8
display('hearer4:');
showKinematics(hearer4, ['AvB>C']);
```

Like the first model, this model relies on subtle likelihood comparisons, and therefore on specific assumptions about the priors. For example, the derivation doesn't work in a painfully slow model that treats all ways of being uninformed as equally likely:

```
// continues #8
var access_prior = { 'full': 0.45, 'partial': 0.05, 'none': 0.5 };
var get_observation = {
  'full': function(state) { return state },
  'partial': function(state) {
    // return uniform distribution over all partial observations compatible with state
    var observations = filter(function(obs) {
      return obs.includes(state) && obs.length > 1 && obs.length < states.length
    }, powerset(states));
    return uniformDraw(observations);
  },
  'none': function(state) { return states }
};
var makeHearer = function(speaker, state_prior, qud) {
  return Agent({
    credence: join({ 'state': state_prior, 'access': access_prior }),
    kinematics: function(utterance) {
      return function(s) {
        var obs = evaluate(get_observation[s.access], s.state);
        var speaker = speaker(obs, alternatives[utterance], qud);
        return sample(choice(speaker)) == utterance;
      }
    }
  });
};
var hearer2 = makeHearer(speaker1, state_prior, qud);
showKinematics(hearer2, ['A>C', 'AvB>C']);
```

Here, A>C ∧ B>C has a slightly greater credence under 'A>C' than under 'AvB>C', so the level-3 speaker would prefer 'A>C' to communicate A>C ∧ B>C, and we won't get an SDA effect.

A better model would make sure that 'A>C' strongly conveys ¬(B>C). The non-arbitrariness requirement from my previous post crudely serves this purpose:

```
// continues #8
var makeHearer = function(speaker, state_prior, qud) {
  return Agent({
    credence: join({ 'state': state_prior, 'access': access_prior }),
    kinematics: function(utterance) {
      return function(s) {
        var obs = evaluate(access[s.access], s.state);
        var speaker = speaker(obs, alternatives[utterance], qud);
        return bestOption(speaker) == utterance;
      }
    }
  });
};
var hearer2 = makeHearer(speaker1, state_prior, qud);
display('hearer2:');
showKinematics(hearer2, ['A>C', 'AvB>C']);
var speaker3 = makeSpeaker(hearer2, state_prior, qud, cost);
var hearer4 = makeHearer(speaker3, state_prior, qud);
display('hearer4:');
showKinematics(hearer4, ['AvB>C']);
```

Bar-Lev, Moshe E., and Danny Fox. 2020. “Free Choice, Simplification, and Innocent Inclusion.” *Natural Language Semantics* 28 (3): 175–223. https://doi.org/10.1007/s11050-020-09162-y.

Fox, Danny, and Roni Katzir. 2021. “Notes on Iterated Rationality Models of Scalar Implicatures.” *Journal of Semantics* 38 (4): 571–600. https://doi.org/10.1093/jos/ffab015.

Franke, Michael. 2011. “Quantity Implicatures, Exhaustive Interpretation, and Rational Conversation.” *Semantics and Pragmatics* 4 (1): 1–82. https://doi.org/10.3765/sp.4.1.

Lassiter, Daniel. 2018. “Complex Sentential Operators Refute Unrestricted Simplification of Disjunctive Antecedents.” *Semantics and Pragmatics* 11 (9). https://doi.org/10.3765/sp.11.9.

McKay, Thomas, and Peter van Inwagen. 1977. “Counterfactuals with Disjunctive Antecedents.” *Philosophical Studies* 31 (5): 353–56. https://doi.org/10.1007/BF01873862.

Nute, Donald. 1980. “Conversational Scorekeeping and Conditionals.” *Journal of Philosophical Logic* 9 (2): 153–66. https://doi.org/10.1007/BF00247746.

Like many other contemporary accounts of free choice, mine is inspired by Kratzer and Shimoyama (2002), who pointed out that the inference might be a higher-order implicature. A hearer might reason as follows.

The speaker said '◇(A or B)'. Saying '◇A' would have implicated that ¬◇B. So this would have been a good choice if the speaker knew that ◇A and ¬◇B. Similarly, '◇B' would have been a good choice if the speaker knew that ◇B and ¬◇A. Since the speaker didn't choose '◇A' or '◇B', she doesn't know that ◇A and ¬◇B, and she doesn't know that ◇B and ¬◇A. Given that she is well informed, it follows that either ◇A and ◇B are both true, or they are both false. The latter is incompatible with what the speaker said. So ◇A and ◇B are both true.
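The elimination steps in this reasoning can be mimicked in a few lines (a sketch in which I simply stipulate, as premises, what '◇A' and '◇B' would have implicated):

```python
# States compatible with '◇(A or B)': at least one of A, B is permitted.
states = [{"A"}, {"B"}, {"A", "B"}]

# Premise of the reasoning: '◇A' would convey that only A is permitted,
# '◇B' that only B is. A well-informed speaker in such a state would
# have used the stronger message, so those states are eliminated.
conveys = {"◇A": {"A"}, "◇B": {"B"}}

remaining = [s for s in states if s not in conveys.values()]
print([sorted(s) for s in remaining])  # [['A', 'B']]
```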

It sounds straightforward. But if you try to implement it, some problems emerge.

First, we need to ensure that '◇A' implicates ¬◇B and '◇B' implicates ¬◇A. Second, we then need to explain why '◇(A or B)' is a reasonable choice to convey that ◇A and ◇B.

To see the second problem, note that the above reasoning seems to go through just as well for plain disjunctions 'A or B', where it would show (falsely) that 'A or B' implicates 'A and B':

The speaker said 'A or B'. 'A' would have implicated ¬B, 'B' would have implicated ¬A. So the speaker doesn't know A and ¬B, and she doesn't know B and ¬A. Given that she is well-informed, either A and B are both true or they are both false. The latter is incompatible with what the speaker said. So A and B are both true.

Any account of free choice must explain why '◇(A or B)' is interpreted as '◇A and ◇B' but 'A or B' is not interpreted as 'A and B'. What makes the difference?

Danny Fox has argued that free-choice effects arise iff the relevant disjunctive statement lacks a conjunctive alternative. (See Fox (2007), Singh et al. (2016), Fox and Katzir (2021).) According to popular ways of defining alternatives (e.g., Fox and Katzir (2011)), a plain disjunction 'A or B' has 'A and B' as an alternative, but a modalized disjunction '◇(A or B)' does not have '◇A and ◇B' as an alternative. This could explain the difference.

Let's assume that alternatives enter the picture in the way I described in the final section of this post: When a hearer encounters an utterance U, he wonders why the speaker chose U *rather than some alternative to U*. Concretely, a hearer who receives the message 'A or B' wonders why the speaker didn't utter 'A and B', but a hearer who receives '◇(A or B)' only wonders why the speaker didn't utter '◇A' or '◇B' or '◇(A and B)'; she doesn't even consider '◇A and ◇B'.

Unfortunately, the fixed-alternatives approach makes the first problem worse. Why would '◇A' implicate ¬◇B? In this post, I showed that 'We have apple juice' can convey 'We don't have orange juice', due to its competition with 'We have apple *and orange juice*'. We could similarly predict that '◇A' can convey ¬◇B due to its competition with '◇A and ◇B'. But '◇A and ◇B' is not an alternative to '◇A'! If hearers only consider genuine alternatives to the chosen utterance, we need a different story of why '◇A' implicates ¬◇B.

We also still face a version of the second problem. If all goes well, we can show that a sufficiently high-level speaker would use '◇(A or B)' to convey ◇A and ◇B. But what about a lower-level speaker? According to standard modal semantics, which I here take for granted, '◇(A or B)' is equivalent to '◇A or ◇B'. At levels before '◇A' is pragmatically strengthened to mean '◇A and ¬◇B', a well-informed speaker would always prefer '◇A' or '◇B' to '◇(A or B)', just as a well-informed speaker would always prefer 'A' or 'B' to 'A or B'. A low-level hearer who knows that the speaker is well-informed can therefore be sure that '◇(A or B)' won't be uttered. But then a higher-level speaker can't figure out how the hearer would respond to '◇(A or B)', so it's unclear how '◇(A or B)' would ever become a sensible choice.

At this point, Franke stipulates that "surprise utterances" like '◇(A or B)' convey nothing at all: the hearer responds by retaining his prior credence. (See the previous post.) A more natural response in the RSA framework would assume that speakers sometimes fail to choose the optimal act. If we set a low-level speaker's soft-max parameter (alpha) to a finite value, they sometimes say '◇(A or B)', even though '◇A' or '◇B' would have greater expected utility. It turns out, however, that you then can't predict the free-choice effect.

The supposed level-0 strengthening from '◇A' to '◇A and ¬◇B' evidently can't arise as a pragmatic inference. Instead, Champollion et al. (2019) assume that '◇A' has two literal meanings: its standard unstrengthened meaning, and an "exhaustified" meaning on which it is equivalent to '◇A and ¬◇B'.

In the model below, I will assume that '◇A' has only its standard, unstrengthened meaning. So I need a new answer to the two problems: (1) how does '◇A' convey ¬◇B? (2) how does a low-level hearer make sense of '◇(A or B)'?

Start with the first problem. I need to explain why '◇A' conveys ¬◇B, even though the hearer only considers the alternative '◇B'. My tentative answer is that when a hearer encounters an utterance U, he infers not only that U is among the best of its alternatives, but that U is *uniquely and robustly best*.

To motivate this, suppose tea and coffee are both allowed, and equally relevant. (Perhaps you just asked whether you can have tea or coffee.) In this context, there is something wrong with uttering 'you may have tea'. Why single out the tea? You could just as well have said 'you may have coffee'. The arbitrariness is objectionable. If I say 'you may have tea', you would assume that there's a positive reason to choose this utterance from among its alternatives.

The second problem is to explain how a low-level hearer would interpret '◇(A or B)', which is semantically equivalent to '◇A or ◇B'. I'm going to adopt the obvious response of allowing for imperfectly informed speakers. Before the pragmatic strengthening of '◇A' and '◇B', a *fully informed* speaker would never utter '◇(A or B)'. But a partially informed speaker might. At lower levels, '◇(A or B)' will convey uncertainty about whether only ◇A or only ◇B or both. That's because '◇(A or B)' competes with '◇A' and '◇B', whose literal meanings are stronger. Observing an utterance of '◇(A or B)', a hearer can infer that the speaker is not in a position to utter '◇A' or '◇B': she knows neither ◇A nor ◇B. But she does know ◇A or ◇B, by the literal meaning of what she said. So her information is compatible with (i) only ◇A and (ii) only ◇B and (iii) both ◇A and ◇B.

Let's begin by defining the relevant states, the available utterances, and their meanings. We also define the alternatives for each utterance.

For simplicity, I only distinguish four states, depending on which of A and B are allowed. (I'm not considering the further question whether their conjunction is allowed.)

```
var states = Cross('MA', 'MB');

var meanings = {
  'may A': function(state) { return state['MA'] },
  'may B': function(state) { return state['MB'] },
  'may A or B': function(state) { return state['MB'] || state['MA'] },
  'may A and may B': function(state) { return state['MB'] && state['MA'] },
  '-': function(state) { return true }
}

var alternatives = function(u) {
  // s is an alternative to u iff s doesn't have more words
  return filter(function(s) {
    return numWords(s) <= numWords(u)
  }, keys(meanings));
}
```

Speakers work as usual. At any level, the speaker compares all available options by their length and by the expected hearer accuracy they would bring about.

```
// continues #1
var state_prior = Indifferent(states);

var makeSpeaker = function(hearer) {
  return function(observation, alternatives) {
    return Agent({
      options: alternatives || keys(meanings),
      credence: update(state_prior, observation),
      utility: function(u,s){
        return marginalize(learn(hearer, u), 'state').score(s) - cost(u);
      }
    });
  }
};

var cost = function(u) {
  return u == '-' ? 10 : u.length/20;
};
```

Hearers beyond level 1 conditionalize on the assumption that the observed utterance was uniquely and clearly the best option. I've implemented this by defining a `bestOption` function in the webppl-rsa package. If `a` is an agent, then `bestOption(a)` checks if there is a uniquely and clearly best option for `a`, and returns it. (Internally, the agent is construed as soft-maxing with low alpha, and an option is "uniquely and clearly best" if it emerges as substantially more likely than the next best option.)
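To make the idea concrete, here is a toy Python re-implementation of such a check. This is my own sketch: the `alpha` and `margin` values are illustrative guesses, not the webppl-rsa package's actual settings.

```python
import math

def best_option(utilities, alpha=2.0, margin=2.0):
    """Return the option that is uniquely and clearly best, else None.

    The agent is modelled as soft-maxing its utilities with a low alpha;
    an option counts as "clearly best" if its choice probability exceeds
    the runner-up's by the factor `margin`. (Both parameters are
    illustrative guesses, not the package's actual values.)
    """
    z = sum(math.exp(alpha * u) for u in utilities.values())
    probs = {o: math.exp(alpha * u) / z for o, u in utilities.items()}
    ranked = sorted(probs, key=probs.get, reverse=True)
    if len(ranked) == 1 or probs[ranked[0]] >= margin * probs[ranked[1]]:
        return ranked[0]
    return None

print(best_option({'may A': 0.0, 'may B': -1.0}))    # 'may A': clearly ahead
print(best_option({'may A': 0.0, 'may B': -0.01}))   # None: too close to call
```

With two near-tied options, no option is robustly best and the function returns `None`; this is the behaviour the hearer conditions on.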

Hearers are unsure both about the state (what is permitted) and about the speaker's information. Initially, they think that the speaker is most likely to be either fully informed or entirely uninformed.

```
// continues #2
var access_prior = { 'full': 0.45, 'partial': 0.05, 'none': 0.5 };

var makeHearer = function(speaker) {
  return Agent({
    credence: join({ 'state': state_prior, 'access': access_prior }),
    kinematics: function(utterance) {
      return speaker ?
        function(s) {
          var obs = evaluate(get_observation[s.access], s.state)
          var sp = speaker(obs, alternatives(utterance))
          return bestOption(sp) == utterance;
        } :
        function(s) {
          return evaluate(meanings[utterance], s.state);
        };
    }
  });
};

var get_observation = {
  'full': function(state) { return state },
  'partial': function(state) {
    // return uniform distribution over all partial observations compatible with state
    var observations = filter(function(obs) {
      return obs.includes(state) && obs.length > 1 && obs.length < states.length
    }, powerset(states));
    return uniformDraw(observations);
  },
  'none': function(state) { return states }
};
```

That's all.

Let's initialize the level-0 hearer and the level-1 speaker, and see how they behave.

```
// continues #3
var hearer0 = makeHearer();
var speaker1 = makeSpeaker(hearer0);

var info_ma_and_mb = [{ MA: true, MB: true }];
var info_ma_or_mb = [{ MA: true, MB: false }, { MA: false, MB: true }, { MA: true, MB: true }];
var info_ma = [{ MA: true, MB: false }, { MA: true, MB: true }];

showBestOption(speaker1, [info_ma_and_mb, info_ma_or_mb, info_ma]);
```

Here we consider three information states. If the level-1 speaker knows that A and B are both allowed, then 'may A and may B' is the uniquely best option. If she only knows that at least one of A and B is allowed, 'may A or B' is optimal. If she knows that A is allowed and lacks information about B, 'may A' is optimal.

Here is the level-2 hearer:

```
// continues #4
var hearer2 = makeHearer(speaker1);
showKinematics(hearer2, ['may A or B', 'may A and may B', 'may A']);
```

The hearer interprets 'may A or B' as signalling incomplete information. He interprets 'may A' as (weakly) indicating that B is not allowed, because his prior disfavours partially informed speakers.

On to level 3:

```
// continues #5
var speaker3 = makeSpeaker(hearer2);
showBestOption(speaker3, [info_ma_and_mb, info_ma_or_mb, info_ma]);
```

The level-3 speaker still chooses 'may A and may B' if she knows that A and B are both allowed. She still chooses 'may A or B' if she only knows that at least one of A and B is allowed. If she knows that A is allowed and lacks information about B, she now prefers 'may A or B', because 'may A' would implicate that B is not allowed, and she doesn't know if this is true.

At level 4, we get the free choice effect:

```
// continues #6
var hearer4 = makeHearer(speaker3);
display("hearer4 hears 'may A or B'");
viz.table(learn(hearer4, 'may A or B'));
```

Upon hearing 'may A or B', the level-4 hearer considers in what information state 'may A or B' would have been the robustly best option among its alternatives, for the level-3 speaker. The alternatives are 'may A', 'may B', and 'may A or B'. Have a look at the decision matrix for `speaker3` in a situation where she knows that A and B are both allowed:

```
// continues #6
showDecisionMatrix(speaker3(info_ma_and_mb));
```

'May A or B' is clearly best *among its alternatives*. So the level-4 hearer can see two possible explanations for why the speaker chose 'may A or B': either the speaker is incompletely informed (as before), or the speaker knows that A and B are both allowed. (This second possibility did not exist at level 2. A level-1 speaker who knows that A and B are both allowed would prefer 'may A' and 'may B' over 'may A or B', and none of the alternatives to 'may A or B' would be robustly optimal.)

We should also check that a level-5 speaker would utter 'may A or B', if she knows that A and B are both allowed. At this stage, 'may A and may B' still leads to greater hearer accuracy (as it has no "speaker is ignorant" interpretation). The speaker prefers 'may A or B' due to its comparative simplicity:

```
// continues #7
var speaker5 = makeSpeaker(hearer4);
showChoices(speaker5, [info_ma_and_mb]);
```

Franke's model does not generalize to cases with three disjuncts. The above model does:

```
var states = Cross('MA', 'MB', 'MC');

var meanings = {
  'may A': function(state) { return state['MA'] },
  'may B': function(state) { return state['MB'] },
  'may C': function(state) { return state['MC'] },
  'may A or B': function(state) { return state['MB'] || state['MA'] },
  'may A or C': function(state) { return state['MA'] || state['MC'] },
  'may B or C': function(state) { return state['MB'] || state['MC'] },
  'may A and may B': function(state) { return state['MB'] && state['MA'] },
  'may A and may C': function(state) { return state['MA'] && state['MC'] },
  'may B and may C': function(state) { return state['MB'] && state['MC'] },
  'may A or B or C': function(state) { return state['MB'] || state['MA'] || state['MC'] },
  'may none': function(state) { return !state['MB'] && !state['MA'] && !state['MC'] },
  'may A and may B and may C': function(state) { return state['MB'] && state['MA'] && state['MC'] },
  '-': function(state) { return true }
}

var alternatives = function(u) {
  // s is an alternative to u iff s doesn't have more words
  return filter(function(s) {
    return numWords(s) <= numWords(u)
  }, keys(meanings));
}

var state_prior = Indifferent(states);

var makeSpeaker = function(hearer) {
  return cache(function(observation, alternatives) {
    return Agent({
      options: alternatives || keys(meanings),
      credence: update(state_prior, observation),
      utility: function(u,s){
        return marginalize(learn(hearer, u), 'state').score(s) - cost(u);
      }
    });
  });
};

var cost = function(u) {
  return u == '-' ? 10 : u.length/50;
};

var access_prior = {
  'full': 0.42, 'full_MA': 0.02, 'full_MB': 0.02, 'full_MC': 0.02,
  'partial': 0.02, 'none': 0.5
};

var observationDist = function(match) {
  return function(state) {
    var observations = filter(function(obs) {
      return obs.includes(state) && obs.length > 1 && obs.length < states.length
    }, powerset(match ? filter(function(s){ return s[match] == state[match] }, states) : states));
    return uniformDraw(observations);
  };
};

var get_observation = {
  'full': function(state) { return state },
  'full_MA': observationDist('MA'),
  'full_MB': observationDist('MB'),
  'full_MC': observationDist('MC'),
  'partial': observationDist(),
  'none': function(state) { return states }
};

var makeHearer = function(speaker) {
  return Agent({
    credence: join({ 'state': state_prior, 'access': access_prior }),
    kinematics: function(utterance) {
      return speaker ?
        function(s) {
          var obs = evaluate(get_observation[s.access], s.state)
          var sp = speaker(obs, alternatives(utterance))
          return bestOption(sp) == utterance;
        } :
        function(s) {
          return evaluate(meanings[utterance], s.state);
        };
    }
  });
};

var hearer0 = makeHearer();
var speaker1 = makeSpeaker(hearer0);
var hearer2 = makeHearer(speaker1);
var speaker3 = makeSpeaker(hearer2);
var hearer4 = makeHearer(speaker3);

display("hearer4 hears 'may A or B or C':");
viz.table(learn(hearer4, 'may A or B or C'));
```

I'm now distinguishing six possibilities about the speaker's access to the state:

- The speaker has full access to what is permitted.
- The speaker knows whether A is permitted and has incomplete (or no) information about B and C.
- The speaker knows whether B is permitted and has incomplete (or no) information about A and C.
- The speaker knows whether C is permitted and has incomplete (or no) information about A and B.
- The speaker has some other incomplete information.
- The speaker has no information.

For the effect to arise, hearers must consider possibility 1 to be more likely than 2-5, and they must not consider 5 to be much more likely than 2-4. In other words, the hearer must assume that if the speaker has any information at all, then they probably have full information, or at least full information about one of the disjuncts.

This assumption is needed to ensure that 'May A or B' conveys 'Not May C'. For suppose a speaker knows that at least one of A and B is allowed, and has no information about C. Then she is in a position to assert 'May A or B', but not 'May A or C' or 'May B or C'. So 'May A or B' is the uniquely best option. If a hearer gives high probability to encountering such a speaker, he won't see 'May A or B' as indicating 'Not May C'.

The model is not as fragile as Franke's, but it is still not as robust as one might like. I had to hard-code quite specific assumptions about the speaker's access. Ideally, we would be able to derive these by some pragmatic mechanism. The effect also depends on a relatively specific cost function. Worse, the free-choice effect does not become stronger at levels beyond 4, as one might hope. I'm also not entirely happy about the derivation of the inference from '◇A' to ¬◇B, based on the "non-arbitrariness" requirement.

I suspect that most of these problems could be avoided if we relaxed the fixed-alternatives approach, perhaps in favour of a model with uncertainty about the costs, as I explained here.

Bergen, Leon, Roger Levy, and Noah Goodman. 2016. “Pragmatic Reasoning Through Semantic Inference.” *Semantics and Pragmatics* 9. doi.org/10.3765/sp.9.20.

Champollion, Lucas, Anna Alsop, and Ioana Grosu. 2019. “Free Choice Disjunction as a Rational Speech Act.” *Semantics and Linguistic Theory* 29: 238–57. doi.org/10.3765/salt.v29i0.4608.

Fox, Danny. 2007. “Free Choice and the Theory of Scalar Implicatures.” In *Presupposition and Implicature in Compositional Semantics*, edited by U. Sauerland and P. Stateva, 71–120. Basingstoke: Palgrave Macmillan.

Fox, Danny, and Roni Katzir. 2011. “On the Characterization of Alternatives.” *Natural Language Semantics* 19: 87–107.

Fox, Danny, and Roni Katzir. 2021. “Notes on Iterated Rationality Models of Scalar Implicatures.” *Journal of Semantics* 38 (4): 571–600. doi.org/10.1093/jos/ffab015.

Kratzer, Angelika, and Junko Shimoyama. 2002. “Indeterminate Pronouns: The View from Japanese.” In *Proceedings of the 3rd Tokyo Conference on Psycholinguistics*, 1–25. Tokyo: Hituzi Syobo.

Singh, Raj, Ken Wexler, Andrea Astle-Rahim, Deepthi Kamawar, and Danny Fox. 2016. “Children Interpret Disjunction as Conjunction: Consequences for Theories of Implicature and Child Development.” *Natural Language Semantics* 24 (4): 305–52. doi.org/10.1007/s11050-016-9126-3.

Let's back up a little.

Lewis (1969) argued that linguistic conventions solve a game-theoretic coordination problem.

The coordination problem is easy to see in simple *signalling games*, where each state of the world calls for a particular action on the part of the hearer, but only the speaker can directly observe that state. The speaker can produce a number of signals, depending on what she observes. The hearer chooses a response based on the signal he receives. Speaker and hearer would like to coordinate their strategies so that the hearer ends up performing the appropriate action for each state.

Human languages are unlike simple signalling games in that there's usually no particular act a hearer is expected (or desired) to perform, in response to a given utterance. Lewis (1969) therefore suggests that our linguistic conventions solve a different kind of coordination problem that arises only *among speakers*. Lewis was not happy with this conclusion, though. In Lewis (1975), he brings hearers back into the picture, suggesting that their role is to "trust" the speakers.

It's not clear to me what kind of act "choosing an interpretation" is meant to be. Perhaps it's a doxastic "act": the hearer comes to believe that it is raining. Or perhaps it's a more indirect act: the hearer decides to act in whatever way would be appropriate if it were raining. Or perhaps it's an act of *accepting* that it is raining, so that this proposition becomes part of the common ground.

Let's set this issue aside for now.

Signalling games usually have many equilibria. Conventions are supposed to help. But it's not obvious how. What association between signals and states I should use as a speaker depends entirely on what association I think you will use as a hearer, which in turn depends entirely on what association you expect me to use, and so on. Where in this endless loop could a linguistic convention enter the picture?

Franke's answer is that when speakers and hearers replicate each other's reasoning, their replications become increasingly unsophisticated. The iterations terminate in a "level-0" player who simply acts in accordance with the conventional, literal meaning.

This is very similar to (what I take to be) the core idea behind the Rational Speech Act framework. The main difference is that Franke's IBR framework gives an active role to hearers. In the RSA framework, hearers simply update on what they hear. In the IBR framework, hearers choose an interpretation.

Here's an IBR model for the implicature from 'some' to 'not all'.

As usual, the question under discussion is whether some or all students passed. For simplicity, let's assume it is already known that at least one student passed. The speaker knows that not all students passed, and tries to get this across to the hearer. There are two relevant states, ∀ and ∃¬∀, and two relevant utterances, 'some' and 'all'. The literal meaning associates 'some' with both states and 'all' with the ∀ state.

A level-0 speaker would follow the basic convention to utter a (relevant) sentence iff it is true. Knowing that the true state is ∃¬∀, she would utter 'some'. Knowing that the state is ∀, she would randomly choose 'all' or 'some':

Level-0 speaker:

| state | message |
|-------|---------|
| ∃¬∀ | 'some' |
| ∀ | 'some' or 'all' |

Now suppose you're a level-1 hearer who models his conversational partner as a (well-informed and cooperative) level-0 speaker. If you hear 'all', you can infer that the state is ∀. If you hear 'some', the state might be ∀ or ∃¬∀. Suppose your prior credence in the two possibilities is 1/2 each. By Bayes' Theorem, your posterior credence in ∀, after hearing 'some', is then 1/3, while your credence in ∃¬∀ is 2/3. But you have to choose an interpretation. Should you choose to interpret 'some' as ∃¬∀ or as ∀? Presumably, you choose the more likely interpretation:

Level-1 hearer:

| message | interpretation |
|---------|----------------|
| 'all' | ∀ |
| 'some' | ∃¬∀ |
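The Bayes computation from the previous paragraph can be checked in a few lines of Python. This is my own illustration, not part of the IBR model:

```python
# Level-0 speaker: utters a randomly chosen true sentence.
# In state '∀' both 'some' and 'all' are true; in '∃¬∀' only 'some' is.
speaker0 = {'∃¬∀': {'some': 1.0}, '∀': {'some': 0.5, 'all': 0.5}}
prior = {'∃¬∀': 0.5, '∀': 0.5}

def hearer_posterior(utterance):
    """P(state | utterance) by Bayes' theorem, under the level-0 speaker model."""
    joint = {s: prior[s] * speaker0[s].get(utterance, 0.0) for s in prior}
    z = sum(joint.values())
    return {s: p / z for s, p in joint.items()}

print(hearer_posterior('some'))  # ∃¬∀: 2/3, ∀: 1/3
print(hearer_posterior('all'))   # ∀: 1
```

The level-1 hearer's table simply picks the most probable state from each of these posteriors.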

Next, consider a "level-2" speaker who models her conversational partner as a level-1 hearer. Evidently, she will use 'all' to convey ∀ and 'some' to convey ∃¬∀:

Level-2 speaker:

| state | message |
|-------|---------|
| ∃¬∀ | 'some' |
| ∀ | 'all' |

We've reached equilibrium. A level-3 hearer would choose the same interpretation as the level-1 hearer, and so a level-4 speaker would make the same choice as the level-2 speaker, and so on.
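The whole best-response chain can be sketched in a few lines of Python. This is my own illustration of the dynamics, not Franke's implementation:

```python
# Best-response dynamics for the some/all game.
# States: '∃¬∀' (some but not all passed), '∀' (all passed).
MEANING = {'some': {'∃¬∀', '∀'}, 'all': {'∀'}}
PRIOR = {'∃¬∀': 0.5, '∀': 0.5}

def best_hearer(speaker):
    """Map each message to its most probable state under the speaker model."""
    interp = {}
    for m in MEANING:
        scores = {s: PRIOR[s] * speaker[s].get(m, 0.0) for s in PRIOR}
        if max(scores.values()) > 0:
            interp[m] = max(scores, key=scores.get)
    return interp

def best_speaker(interp):
    """In each state, use only true messages the hearer maps to that state
    (falling back to any true message if there is none)."""
    sp = {}
    for s in PRIOR:
        good = [m for m in MEANING if interp.get(m) == s and s in MEANING[m]]
        good = good or [m for m in MEANING if s in MEANING[m]]
        sp[s] = {m: 1.0 / len(good) for m in good}
    return sp

speaker0 = {'∃¬∀': {'some': 1.0}, '∀': {'some': 0.5, 'all': 0.5}}
hearer1 = best_hearer(speaker0)    # {'some': '∃¬∀', 'all': '∀'}
speaker2 = best_speaker(hearer1)   # ∃¬∀ → 'some', ∀ → 'all'
assert best_hearer(speaker2) == hearer1  # level 3 repeats level 1: equilibrium
```

The final assertion is the equilibrium claim: iterating `best_hearer` and `best_speaker` beyond level 2 changes nothing.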

We have derived the fact that people use 'some' to convey 'some and not all', even though the literal meaning of 'some' is compatible with 'all'.

But what's up with that level-1 hearer?

Suppose again that you're a (level-1) hearer who models his conversational partner as a level-0 speaker. You hear them say 'some'. You might become 2/3 confident that the state is ∃¬∀. Why would you then choose to interpret 'some' as ∃¬∀? This makes no sense, on any of the above ideas about what this choice could mean. If you had to bet on the state, you would *not* bet on ∃¬∀ at any odds. You would not behave as if ∃¬∀ were true. Nor would you be inclined to add ∃¬∀ to the common ground. You would simply be unsure about what the speaker meant, and therefore about the state of the world.

The RSA approach gets this right. In an RSA model of the above scenario, the level-1 hearer would arrive at certain credences about the state, and that's all he has to do. (You might think of this as a "mixed act", if you want.)

Level-1 hearer:

| message | credence |
|---------|----------|
| 'all' | ∀: 1 |
| 'some' | ∃¬∀: 2/3, ∀: 1/3 |

A level-2 speaker who wants to maximize the hearer's accuracy in the true state would still choose 'all' to convey ∀ and 'some' to convey ∃¬∀:

Level-2 speaker:

| state | message |
|-------|---------|
| ∃¬∀ | 'some' |
| ∀ | 'all' |

A subsequent level-3 hearer could now be certain that 'all' means ∀ and 'some' means ∃¬∀. We reach the same equilibrium, but a little later in the recursion.

I suspect that this generalizes: one can replicate every IBR model by an RSA model in which the "de-probabilification" that occurs on the hearer side in the IBR model occurs at a subsequent speaker stage in the RSA model.

Let's now look at Franke's derivation of Free Choice effects – the main topic of Franke (2011).

Suppose it is common knowledge that at least one of A and B is permitted. There are three possible states: A alone is permitted, B alone is permitted, and A and B are both permitted. Let's abbreviate these states as MA, MB, and MAB.

The available messages are '◇A', '◇B', and '◇(A v B)'. The literal meaning of '◇A' is that A is permitted. The literal meaning of '◇B' is that B is permitted. The literal meaning of '◇(A v B)' is that at least one of A and B is permitted. We'd like to predict that '◇(A v B)' can be used to convey that *both* are permitted.

As before, a level-0 speaker chooses an arbitrary utterance provided that it is true.

Level-0 speaker:

| state | message |
|-------|---------|
| MA | '◇A' or '◇(A v B)' |
| MB | '◇B' or '◇(A v B)' |
| MAB | '◇A' or '◇B' or '◇(A v B)' |

Now consider a level-1 hearer who models his conversational partner as a level-0 speaker. If the speaker says '◇A', the hearer can infer that the state is either MA or MAB. Given flat priors over the three states, the posterior probability of MA (given '◇A') will be 3/5, that of MAB 2/5. Hearing '◇B' should similarly make the hearer 3/5 confident in MB and 2/5 in MAB. '◇(A v B)' doesn't rule out any of the states, but it favours MA and MB over MAB: the former have posterior probability 3/8 each, the latter probability 2/8.
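These posteriors are easy to verify; here's a quick Python recomputation (my own illustration, not part of Franke's model):

```python
# Level-0 speaker: uniform over literally true messages, as in Franke's model.
speaker0 = {
    'MA':  {'◇A': 1/2, '◇(A v B)': 1/2},
    'MB':  {'◇B': 1/2, '◇(A v B)': 1/2},
    'MAB': {'◇A': 1/3, '◇B': 1/3, '◇(A v B)': 1/3},
}
prior = {'MA': 1/3, 'MB': 1/3, 'MAB': 1/3}  # flat priors

def posterior(utterance):
    """P(state | utterance) for the level-1 hearer, by Bayes' theorem."""
    joint = {s: prior[s] * speaker0[s].get(utterance, 0.0) for s in prior}
    z = sum(joint.values())
    return {s: p / z for s, p in joint.items()}

print(posterior('◇A'))        # MA: 3/5, MAB: 2/5
print(posterior('◇(A v B)'))  # MA: 3/8, MB: 3/8, MAB: 2/8
```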

We're assuming the IBR framework, so the hearer has to choose an interpretation. He will presumably choose to interpret '◇A' as MA, '◇B' as MB, and '◇(A v B)' as either MA or MB:

Level-1 hearer:

| message | credence | interpretation |
|---------|----------|----------------|
| '◇A' | MA: 3/5, MAB: 2/5 | MA |
| '◇B' | MB: 3/5, MAB: 2/5 | MB |
| '◇(A v B)' | MA: 3/8, MB: 3/8, MAB: 2/8 | MA or MB |

We may note in passing that the choice of interpretation depends on the priors. If the hearer's prior credence in MAB is, say, 0.5 while his credence in MA is 0.3, he will interpret '◇A' as MAB. But let's assume the hearer's priors are flat.
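To see why the interpretation flips with the skewed prior just mentioned, we can redo the arithmetic (my own quick check, not part of the model):

```python
# Skewed priors from the text: MA gets 0.3, MB 0.2, MAB 0.5.
prior = {'MA': 0.3, 'MB': 0.2, 'MAB': 0.5}
# Level-0 speaker's probability of uttering '◇A' in each state:
likelihood = {'MA': 1/2, 'MB': 0.0, 'MAB': 1/3}
joint = {s: prior[s] * likelihood[s] for s in prior}
z = sum(joint.values())
post = {s: p / z for s, p in joint.items()}
print(post)  # MAB (≈ 0.53) now beats MA (≈ 0.47), so '◇A' is read as MAB
```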

Next we have a level-2 speaker who models her conversational partner as a level-1 hearer. If she knows that the state is MA, and she wants to get this across to the hearer, her best choice is to utter '◇A'. If she knows that the state is MB, her best choice is '◇B'. If she knows that the state is MAB, she has a problem. There's nothing she could say that would make the hearer believe that the state is MAB. Franke assumes that she will randomly choose a message.

Level-2 speaker:

| state | message |
|-------|---------|
| MA | '◇A' |
| MB | '◇B' |
| MAB | '◇A' or '◇B' or '◇(A v B)' |

Now imagine you're a (level-3) hearer who models his conversational partner as a level-2 speaker. If you hear '◇A', you can infer that the state is either MA or MAB. With flat priors, the former is more likely. If you hear '◇B', the most likely state is MB. If you hear '◇(A v B)', the state must be MAB. You'll choose the following interpretation:

Level-3 hearer:

| message | credence | interpretation |
|---------|----------|----------------|
| '◇A' | MA: 3/4, MAB: 1/4 | MA |
| '◇B' | MB: 3/4, MAB: 1/4 | MB |
| '◇(A v B)' | MAB: 1 | MAB |

This is what we wanted to predict: '◇A' is interpreted as MA, '◇B' as MB, and '◇(A v B)' as MAB. A level-4 speaker will use the same association to communicate her information. We've reached equilibrium.

As above, I would complain that the supposed hearer choices are implausible and unmotivated. Imagine you're the level-1 hearer (with flat priors). You model the speaker as a level-0 speaker. You hear '◇A'. You become 60% confident that the state is MA. In what sense would you choose to interpret '◇A' as MA? Why would a subsequent level-2 speaker care about this choice?

As above, however, we can replicate the IBR model with a slower RSA model in which the hearer does not have to choose an interpretation.

```
var states = ['MA', 'MB', 'MAB'];

var meanings = {
  '◇A': ['MA', 'MAB'],
  '◇B': ['MB', 'MAB'],
  '◇(A v B)': ['MA', 'MB', 'MAB']
};

var speaker0 = function(observation) {
  return Agent({
    options: keys(meanings),
    credence: update(Indifferent(states), observation),
    utility: function(u,s){
      return meanings[u].includes(s) ? 1 : -1;
    }
  });
};

var hearer_prior = Indifferent(states);
// var hearer_prior = Credence({ MA: 0.3, MB: 0.2, MAB: 0.5 });

var hearer1 = Agent({
  credence: hearer_prior,
  kinematics: function(utterance) {
    return function(state) {
      var speaker = speaker0(state);
      return sample(choice(speaker)) == utterance;
    }
  }
});

showKinematics(hearer1, keys(meanings));
```

The level-0 speaker behaves as in the IBR model. The level-1 hearer arrives at the same credence as in the IBR model. He does not choose an interpretation. Here's the output of our simulation in table format:

Level-1 hearer:

| message | credence |
|---------|----------|
| '◇A' | MA: 3/5, MAB: 2/5 |
| '◇B' | MB: 3/5, MAB: 2/5 |
| '◇(A v B)' | MA: 3/8, MB: 3/8, MAB: 2/8 |

You can see how the output depends on the priors by uncommenting the line `// var hearer_prior = Credence({ MA: 0.3, MB: 0.2, MAB: 0.5 });` in source block #1. (But add the comment slashes back before you run the code blocks below.)

Next, we introduce a level-2 speaker who models the hearer as a level-1 hearer and wants him to have a high degree of belief in the true state.

```
// continues #1
var speaker2 = function(observation) {
  return Agent({
    credence: Indifferent([observation]),
    options: keys(meanings),
    utility: function(u,s) {
      return learn(hearer1, u).score(s);
    }
  });
};

showChoices(speaker2, states);
```

At this stage, we have the same association between states and messages that we got at level 1 in the IBR model:

Level-2 speaker:

| state | message |
|-------|---------|
| MA | '◇A' |
| MB | '◇B' |
| MAB | '◇A' or '◇B' |

Now imagine you're a level-3 hearer (with flat priors) who thinks he faces a level-2 speaker. If you hear '◇A', you should become 2/3 confident that the state is MA, and 1/3 that it is MAB. If you hear '◇B', you should become 2/3 confident that the state is MB, and 1/3 that it is MAB. What if you hear '◇(A v B)'? You'll be surprised. A level-2 speaker never utters '◇(A v B)'!

```
// continues #2
var hearer3 = Agent({
  credence: hearer_prior,
  kinematics: function(utterance) {
    return function(state) {
      var speaker = speaker2(state);
      return sample(choice(speaker)) == utterance;
    }
  }
});

showKinematics(hearer3, keys(meanings));
```

We need to settle how the level-3 hearer updates on '◇(A v B)', so that the speaker at level 4 can decide whether to utter it. Let's assume that if he hears the surprise message '◇(A v B)', the level-3 hearer simply retains his prior credence over states. The following code achieves this.

```
// continues #2
var hearer3 = Agent({
  credence: hearer_prior,
  kinematics: function(utterance) {
    return function(state) {
      var speaker = speaker2(state);
      return utterance.includes('v') || sample(choice(speaker)) == utterance;
    }
  }
});

showKinematics(hearer3, keys(meanings));
```

Here is the output in table form:

Level-3 hearer:

| message | credence |
|---------|----------|
| '◇A' | MA: 2/3, MAB: 1/3 |
| '◇B' | MB: 2/3, MAB: 1/3 |
| '◇(A v B)' | MA: 1/3, MB: 1/3, MAB: 1/3 |
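These numbers, including the surprise-message clause, can be recomputed in a few lines of Python (my own illustration, not the webppl model):

```python
# Level-2 speaker: '◇(A v B)' is never uttered.
speaker2 = {
    'MA':  {'◇A': 1.0},
    'MB':  {'◇B': 1.0},
    'MAB': {'◇A': 1/2, '◇B': 1/2},
}
prior = {'MA': 1/3, 'MB': 1/3, 'MAB': 1/3}

def hearer3(utterance):
    """Bayesian update where possible; retain the prior on a surprise message."""
    joint = {s: prior[s] * speaker2[s].get(utterance, 0.0) for s in prior}
    z = sum(joint.values())
    return prior if z == 0 else {s: p / z for s, p in joint.items()}

print(hearer3('◇A'))        # MA: 2/3, MAB: 1/3
print(hearer3('◇(A v B)'))  # surprise message: flat prior retained
```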

A level-4 speaker should obviously utter '◇A' if the state is MA, and '◇B' if it is MB. If the state is MAB, all options are equally good.

```
// continues #4
var speaker4 = function(observation) {
  return Agent({
    credence: Indifferent([observation]),
    options: keys(meanings),
    utility: function(u,s) {
      return learn(hearer3, u).score(s);
    }
  });
};

showChoices(speaker4, states);
```

We now have the same association between states and messages that we got at level 2 in the IBR model:

Level-4 speaker:

| state | message |
|-------|---------|
| MA | '◇A' |
| MB | '◇B' |
| MAB | '◇A' or '◇B' or '◇(A v B)' |

From here on, things go smoothly. A level-5 hearer will take '◇(A v B)' to be a sure sign of MAB. With flat priors, he will be inclined towards MA if he hears '◇A' and towards MB if he hears '◇B'. A level-6 speaker will therefore choose '◇A' in MA, '◇B' in MB, and '◇(A v B)' in MAB.

```
// continues #5
var hearer5 = Agent({
  credence: hearer_prior,
  kinematics: function(utterance) {
    return function(state) {
      var speaker = speaker4(state);
      return sample(choice(speaker)) == utterance;
    }
  }
});

var speaker6 = function(observation) {
  return Agent({
    credence: Indifferent([observation]),
    options: keys(meanings),
    utility: function(u,s) {
      return learn(hearer5, u).score(s);
    }
  });
};

showChoices(speaker6, states);
```

We have replicated Franke's IBR model of Free Choice as an RSA model.

Unfortunately, this derivation of the Free Choice effect depends on many dubious assumptions.

For a start, the derivation only goes through on specific assumptions about the hearer's priors. If you play around with `hearer_prior` in source block #1 (and then re-run source block #6), you can see that the derivation breaks down whenever the hearer's prior for MA is not exactly equal to that of MB.

The derivation also relies on a very specific assumption about how the level-3 hearer updates on the surprise message '◇(A v B)': that he updates by sticking to his priors. We could alternatively have assumed that the level-3 hearer models the speaker as a soft-maximizer (with possibly very high alpha), so that utterances of '◇(A v B)' are not absolutely impossible. We would then no longer predict the Free Choice effect.

The derivation further relies on the assumption that the level-4 speaker has no preference for simplicity: she is indifferent between '◇A' and '◇B' and '◇(A v B)' in state MAB, even though the last option is needlessly verbose. If she prefers the simpler options, the derivation breaks down.

Analogous problems affect Franke's IBR model. It, too, requires prior indifference between MA and MB. And it requires very specific and peculiar assumptions about the speaker at level 2. Remember that if this speaker is in state MAB, she knows that nothing she could say would get across the true state to the (level-1) hearer. Franke assumes that she randomly chooses between '◇A' and '◇B' and '◇(A v B)'. Any preference for simplicity would break the derivation. So would the availability of a fourth option – say, remaining silent. Why isn't this an option? If the speaker knows that each of the three available messages would cause a false belief, then why would she say anything at all?

On top of these problems, Franke's derivation inherits the general implausibility of IBR models, with their mysterious hearer choices.

As Franke notes, his derivation also only works for cases with exactly two disjuncts. You can confirm this (for the RSA version) by changing the `states` and `meanings` in source block #1 to the following and re-running source block #6:

```
var states = ['MA', 'MB', 'MC', 'MABC'];

var meanings = {
  '◇A': ['MA', 'MABC'],
  '◇B': ['MB', 'MABC'],
  '◇C': ['MC', 'MABC'],
  '◇(A v B v C)': ['MA', 'MB', 'MC', 'MABC']
};
```

Overall, this doesn't look promising. In the next post, I'll try to do better.

Franke, Michael. 2011. “Quantity Implicatures, Exhaustive Interpretation, and Rational Conversation.” *Semantics and Pragmatics* 4 (1): 1–82. doi.org/10.3765/sp.4.1.

Lewis, David. 1969. *Convention: A Philosophical Study*. Cambridge (Mass.): Harvard University Press.

Lewis, David. 1975. “Languages and Language.” In *Language, Mind, and Knowledge*, edited by Keith Gunderson, VII:3–35. Minnesota Studies in the Philosophy of Science. Minneapolis: University of Minnesota Press.

Goodman and Stuhlmüller (2013) consider a scenario in which a speaker wants to communicate how many of three apples are red. The hearer isn't sure whether the speaker has seen all the apples. Chapter 2 of problang.org gives two models of this scenario. The first makes very implausible predictions. The second is very complicated. Here's a simple model that gives the desired results.

```
var states = ['RRR','RRG','RGR','GRR','RGG','GRG','GGR','GGG'];
var meanings = {
    'all': function(state) { return !state.includes('G') },
    'some': function(state) { return state.includes('R') },
    'none': function(state) { return !state.includes('R') },
    '-': function(state) { return true }
}
var observation = function(state, access) {
    return filter(function(s) {
        return s.slice(0,access) == state.slice(0,access);
    }, states);
}
var hearer0 = Agent({
    credence: Indifferent(states),
    kinematics: function(utterance) {
        return function(state) {
            return evaluate(meanings[utterance], state);
        }
    }
});
var speaker1 = function(obs) {
    return Agent({
        options: keys(meanings),
        credence: update(Indifferent(states), obs),
        utility: function(u,s){ return learn(hearer0, u).score(s); }
    });
};
showChoices(speaker1, [observation('RRR', 2), observation('GGG', 2)]);
```

I'll briefly pause here for an explanation. I assume that the speaker either has access to all three apples, or only to the first two, or only to the first. The `observation` function takes a state and an access level (1, 2, or 3) and returns the information a speaker would have about the state at the given access level. For example, for state 'RRG' and access level 2, the function returns the set { 'RRG', 'RRR' }. The level-1 speaker `speaker1` is parameterized by some such information. If you run the code, you see what a level-1 speaker would say (a) if her information state is { 'RRG', 'RRR' }, and (b) if her information state is { 'GGG', 'GGR' }.

The level-2 hearer is unsure about the speaker's access level and performs a joint inference about the access level and the state:

```
// continues #1
var hearer2 = Agent({
    credence: Indifferent(Cross({'state': states, 'access': [1,2,3]})),
    kinematics: function(utterance) {
        return function(s) {
            var obs = observation(s.state, s.access);
            return sample(choice(speaker1(obs))) == utterance;
        }
    }
});
showKinematics(hearer2, keys(meanings))
```

The results make sense. For example, if the speaker says 'all', the hearer infers that she (the speaker) has full access to 'RRR'. If the speaker says 'some', she has seen at least one red apple and has not seen 'RRR'. And so on. The scalar inference from 'some' to 'not all' hasn't completely disappeared. It has only become weaker.

The second model on problang.org makes similar predictions. The first uses a naive "planning as inference" algorithm to compute the speaker's choice, like so:

```
// continues #1
var alpha = 10
var speaker1choice = function(obs) {
    return Infer(function() {
        var u = uniformDraw(keys(meanings));
        var s = uniformDraw(states);
        condition(obs.includes(s));
        var utility = learn(hearer0, u).score(s);
        factor(alpha * utility);
        return u;
    });
};
viz.table(speaker1choice(observation('RRG',2)))
```

In essence, this makes a speaker choose an utterance in proportion to how good the utterance would be in a possible state of full information. A speaker who only sees that the first two apples are red will strongly prefer 'all'.
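To see how lopsided the naive algorithm's preference is, here is a back-of-the-envelope Python reconstruction (my own, not the problang.org code). It computes the utterance distribution exactly, using the identity exp(α·log p) = p^α:

```python
states = ['RRR','RRG','RGR','GRR','RGG','GRG','GGR','GGG']
meanings = {
    'all':  lambda s: 'G' not in s,
    'some': lambda s: 'R' in s,
    'none': lambda s: 'R' not in s,
    '-':    lambda s: True,
}
alpha = 10

def hearer0_prob(utterance, state):
    # Literal hearer's posterior probability of `state` after `utterance`.
    compatible = [s for s in states if meanings[utterance](s)]
    return 1 / len(compatible) if state in compatible else 0.0

def naive_speaker_choice(obs):
    # P(u) is proportional to the sum over s in obs of hearer0(s|u)^alpha:
    # utterances are scored in possible states of *full* information.
    weights = {u: sum(hearer0_prob(u, s) ** alpha for s in obs) for u in meanings}
    total = sum(weights.values())
    return {u: w / total for u, w in weights.items()}

obs = ['RRR', 'RRG']   # the speaker has seen that the first two apples are red
dist = naive_speaker_choice(obs)
print({u: round(p, 9) for u, p in dist.items()})
# 'all' receives virtually all the mass: it is perfectly informative in the
# RRR possibility, while 'some' is never better than 1/7-accurate.
```

The false-in-RRG possibility doesn't penalize 'all' at all; it merely contributes weight zero, while the RRR possibility contributes weight 1.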

Let's turn to a possibly more interesting example.

Recall the apples and orange juice scenario from my post on scalar implicatures. I'll assume that it is conversationally relevant which juices are on offer. Let's also add one more option to the available utterances. Besides 'have apple', 'have orange', 'have apple and orange', and their negations, the speaker can choose 'have apple *or* orange':

```
var states = Cross('apple', 'orange'); // = [{apple: true, orange: true}, ...]
var meanings = {
    'have apple': function(state) { return state['apple'] },
    'not have apple': function(state) { return !state['apple'] },
    'have orange': function(state) { return state['orange'] },
    'not have orange': function(state) { return !state['orange'] },
    'have apple and orange': function(state) { return state['apple'] && state['orange'] },
    'have apple or orange': function(state) { return state['orange'] || state['apple'] },
    'have no juice': function(state) { return !state['apple'] && !state['orange'] }
};
var hearer0 = Agent({
    credence: Indifferent(states),
    kinematics: function(utterance) {
        return function(state) {
            return evaluate(meanings[utterance], state);
        }
    }
});
var speaker1 = function(state) {
    return Agent({
        options: keys(meanings),
        credence: Indifferent([state]),
        utility: function(u,s){ return learn(hearer0, u).score(s); }
    });
};
var hearer2 = Agent({
    credence: Indifferent(states),
    kinematics: function(utterance) {
        return function(state) {
            return sample(choice(speaker1(state))) == utterance;
        }
    }
});
showKinematics(hearer2, keys(meanings));
```

Predictably, the level-1 speaker never utters 'have apple or orange', no matter what state she has observed. Consequently, the level-2 hearer doesn't know what to think if he hears 'have apple or orange'. He can't conditionalize on an event with probability 0.

A real hearer, of course, would infer that the speaker is not fully informed (or not fully cooperative).

We've built informedness and cooperativity into the model: the hearer simulates the speaker as being certain of the true state and as aiming to make the hearer accurate about that state.

Let's add the possibility that the speaker may have limited information. Concretely, I'll assume that the hearer is unsure about which question the speaker knows the answer to.

```
// continues #4
var access = {
    'apple? orange?': function(s) { return s },
    'apple?': function(s) { return s.apple },
    'orange?': function(s) { return s.orange },
    'apple or orange?': function(s) { return s.apple || s.orange }
};
var speaker1 = function(observation) {
    return Agent({
        options: keys(meanings),
        credence: update(Indifferent(states), observation),
        utility: function(u,s){ return learn(hearer0, u).score(s); }
    });
};
var hearer2 = Agent({
    credence: join({
        'state': Indifferent(states),
        'access': Categorical({ vs: keys(access), ps: [0.7, 0.1, 0.1, 0.1] })
    }),
    kinematics: function(utterance) {
        return function(s) {
            var observation = cell(access[s.access], s.state, states);
            return sample(choice(speaker1(observation))) == utterance;
        }
    }
});
showKinematics(hearer2, ['have apple', 'have apple and orange', 'have apple or orange']);
```

Here I've defined four questions to which the speaker might know the answer. A speaker who knows the answer to `'apple? orange?'` is fully informed about the state. A speaker who only knows the answer to `'apple?'` is fully informed about the availability of apple juice. And so on. The level-2 hearer is unsure about the state and the speaker's knowledge. `Categorical({ vs: keys(access), ps: [0.7, 0.1, 0.1, 0.1] })` encodes the latter uncertainty. It defines a distribution over the four questions, giving probability 0.7 to the first (`'apple? orange?'`) and 0.1 to each of the others.

This time, the level-1 speaker can utter 'have apple or orange'. She does so whenever (a) she only knows the answer to 'apple or orange?' and (b) that answer is positive. As a result, the level-2 hearer infers (a) and (b) from an utterance of 'have apple or orange'. We get an ignorance implicature.

We also see that 'have apple' no longer renders 'not have orange' certain. In the example, the level-2 hearer rather becomes 82% confident that there's no orange juice and 18% confident that there is but the speaker doesn't know.

It is often assumed that disjunctions have an exclusivity implicature – that 'A or B' implicates 'not both'. My model doesn't predict this.

In fact, it's hard to see how the exclusivity implicature could be derived from assumptions about rationality and cooperativity.

To be sure, a speaker who knew that A and B are both true should say 'A and B' rather than 'A or B'. If a fully informed speaker does not utter 'A and B', we can therefore infer that A and B are not both true. The problem is that a speaker who utters 'A or B' thereby reveals that she is *not* fully informed.

We could predict the implicature if we made strange assumptions about the priors – for example, if we assumed that the speaker is more likely to be informed about A and B if both are true than if both are false. But what could motivate such an assumption?

This is a potential problem for RSA models, and for neo-Gricean models more generally. (By contrast, exclusivity is easily predicted by grammatical theories of implicature, along the lines of Chierchia, Fox, and Spector (2012).) Maria Aloni mentions the problem in her SEP entry on disjunction. So I assume it is well-known. I don't know how people respond.

I'm not sure how serious the problem is, in part because I'm not sure about the robustness of exclusivity inferences, and in part because there might be non-obvious ways of deriving the implicature after all.

The apple and orange juice case illustrates an attractive feature of RSA models: we don't need a special "consistency check" to prevent the derivation of inconsistent implicatures.

A naive Gricean algorithm for computing scalar implicatures might say that if a speaker utters a sentence S instead of a stronger alternative S', the hearer may infer that the alternative is false. Assuming that a disjunction 'A or B' has each disjunct as an alternative, this algorithm would predict that one can infer the falsity of both A and B from an utterance of 'A or B', even though this is inconsistent with the literal meaning of the utterance.
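The inconsistency is easy to confirm by brute force. Here is a minimal Python check (my own illustration), treating A and B as propositional atoms:

```python
from itertools import product

def naive_strengthening(a, b):
    # Literal meaning of 'A or B', conjoined with the naive negation
    # of both stronger alternatives 'A' and 'B'.
    return (a or b) and not a and not b

# Truth table over the atoms A, B.
valuations = list(product([False, True], repeat=2))

# No assignment of truth values to A and B satisfies the result:
print(any(naive_strengthening(a, b) for a, b in valuations))  # False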

Sauerland (2004) therefore uses a more careful two-step algorithm, which strengthens an ignorance inference \(\neg{}K\phi\) to \(K\neg\phi\) only if the strengthening is consistent with what has already been inferred. Spector (2006), Fox (2007), and Schwarz (2016) point out that Sauerland's algorithm can still lead to inconsistent inferences.

Schwarz (2016) considers sentence (1):

(1) Al hired at least two cooks.

Like plain disjunctions, (1) triggers an ignorance implicature: the speaker doesn't know how many cooks Al hired.

Now suppose the alternatives to 'at least two' include 'exactly two', 'at least three', 'at least four', 'exactly three', 'exactly four', etc. By Sauerland's algorithm, we first infer from an utterance of 'at least two' that the speaker doesn't know any of the stronger alternatives 'exactly two', 'at least three', etc. to be true. That is, the first step yields

(2) \(K[\geq 2], \neg{}K[=\!2], \neg{}K[=\!3], \neg{}K[=\!4], \neg{}K[\geq 3], \neg{}K[\geq 4]\), etc.

In the second step we strengthen all these \(\neg{}K\phi\) facts to \(K\neg\phi\) provided the strengthening is consistent with (2). 'At least four' passes this test: if the open possibilities are {2,3} then all items in (2) are true, and so is \(K\neg[\geq4]\). 'Exactly three' also passes the test: if the open possibilities are {2,4} then all items in (2) are true, and so is \(K\neg[=\!3]\). But if we add both \(K\neg[\geq4]\) and \(K\neg[=\!3]\) to (2), we get a contradiction.
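These claims can be verified by brute force. The following Python sketch (my own encoding) represents a knowledge state as a nonempty set of open possibilities, with \(K\phi\) holding iff \(\phi\) is true in every open possibility:

```python
from itertools import combinations

numbers = [1, 2, 3, 4]   # how many cooks Al might have hired

# A knowledge state is a nonempty set of open possibilities.
knowledge_states = [set(c) for r in range(1, len(numbers) + 1)
                    for c in combinations(numbers, r)]

def K(state, prop):
    # The speaker knows prop iff it holds in every open possibility.
    return all(prop(n) for n in state)

def satisfies_2(s):
    # The inferences in (2): K[>=2] plus ignorance about the alternatives.
    return (K(s, lambda n: n >= 2)
            and not K(s, lambda n: n == 2) and not K(s, lambda n: n == 3)
            and not K(s, lambda n: n == 4) and not K(s, lambda n: n >= 3)
            and not K(s, lambda n: n >= 4))

with_not_geq4 = [s for s in knowledge_states
                 if satisfies_2(s) and K(s, lambda n: not n >= 4)]
with_not_eq3 = [s for s in knowledge_states
                if satisfies_2(s) and K(s, lambda n: not n == 3)]
with_both = [s for s in knowledge_states
             if satisfies_2(s) and K(s, lambda n: not n >= 4)
             and K(s, lambda n: not n == 3)]

print(with_not_geq4)   # [{2, 3}]: strengthening to K-not-[>=4] is consistent with (2)
print(with_not_eq3)    # [{2, 4}]: so is strengthening to K-not-[=3]
print(with_both)       # []: but jointly the two strengthenings contradict (2)
```

Each strengthening has a witnessing knowledge state on its own ({2,3} and {2,4} respectively), but no knowledge state supports both at once.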

Schwarz concludes that neo-Gricean models require something stronger than Sauerland's consistency check: we need to check whether the hypotheses \(\phi\) for which we infer \(K\neg\phi\) are *innocently excludable*, in the sense of Fox (2007).

Here is a simple RSA model of the 'at least two' scenario, showing that no explicit check of innocent exclusion is needed.

```
var states = [1,2,3,4];
var meanings = {
    'one': function(state) { return state >= 1 },
    'at least one': function(state) { return state >= 1 },
    'exactly one': function(state) { return state == 1 },
    'two': function(state) { return state >= 2 },
    'at least two': function(state) { return state >= 2 },
    'exactly two': function(state) { return state == 2 },
    'three': function(state) { return state >= 3 },
    'at least three': function(state) { return state >= 3 },
    'exactly three': function(state) { return state == 3 },
    'four': function(state) { return state >= 4 },
    'at least four': function(state) { return state >= 4 },
    'exactly four': function(state) { return state == 4 }
};
var alternatives = {
    'one': ['one', 'two', 'three', 'four'],
    'two': ['one', 'two', 'three', 'four'],
    'three': ['one', 'two', 'three', 'four'],
    'four': ['one', 'two', 'three', 'four'],
    'at least one': keys(meanings),
    'exactly one': keys(meanings),
    'at least two': keys(meanings),
    'exactly two': keys(meanings),
    'at least three': keys(meanings),
    'exactly three': keys(meanings),
    'at least four': keys(meanings),
    'exactly four': keys(meanings)
};
var hearer0 = Agent({
    credence: Indifferent(states),
    kinematics: function(utterance) {
        return function(state) {
            return evaluate(meanings[utterance], state);
        }
    }
});
var speaker1 = function(observation, options) {
    return Agent({
        options: options,
        credence: update(Indifferent(states), observation),
        utility: function(u,s){ return learn(hearer0, u).score(s); }
    });
};
var hearer2 = Agent({
    credence: join({
        'state': Indifferent(states),
        'access': { 'full': 0.9, 'partial': 0.1 }
    }),
    kinematics: function(utterance) {
        return function(s) {
            var obs = s.access == 'full' ? s.state : [s.state, s.state+uniformDraw([-1,-2,1,2])];
            var speaker = speaker1(obs, alternatives[utterance]);
            return sample(choice(speaker)) == utterance;
        }
    }
});
showKinematics(hearer2, ['two', 'at least two'])
```

As you can see, the level-2 hearer infers that the speaker has partial access to the state when he hears 'at least two', and makes sensible inferences about that state.

What happens is this.

I assume that simple number words ('one', 'two') only have simple number words as alternatives, while all other available options have all options as alternatives. As in my model of plural NPs in the post on scalar implicatures, the alternatives constrain the hearer's reconstruction of the speaker's reasoning: the hearer wonders why the speaker chose the observed utterance *from among its alternatives*.

Have a look at `speaker1`:

```
// continues #6
showChoices(speaker1, [[2], [2,3]], [['one', 'two', 'three', 'four']])
showChoices(speaker1, [[2], [2,3]], [keys(meanings)])
```

Among the one-word options, `speaker1` chooses 'two' if she knows that the state is 2, and also if she merely knows that the state is 2 or 3. Among all options, she chooses 'exactly two' if she knows that the state is 2, and either 'at least two' or 'two' if she merely knows that the state is 2 or 3.

`speaker1` never utters 'two' or 'at least two' if she is fully informed. A level-2 hearer who assumes full informedness can still make sense of 'two', because 'two' is the best *among its alternatives* in knowledge state {2}.

The level-2 hearer is 90% confident that the speaker is fully informed. The remaining 10% of his credence goes to different states of partial information. For example, if the true state is 3, then the speaker's information state might be {3} (most likely) or {1,3}, {2,3}, {3,4}, or {3,5}. Realistically, there are many more ways of having partial information, but I don't think including them would affect the result.

(Incidentally, this is another example where two expressions, here 'two' and 'at least two', are predicted to have different effects, despite having the same literal meaning.)

The above models all make an implausible prediction. Suppose you have no strong prior views about my state of knowledge with respect to the three apples. Then I utter 'some of the apples are red'. I think you'd come to believe that I've probably seen all the apples and that some but not all the apples are red.

Or suppose you have no strong views about my state of knowledge with respect to how many cooks Al has hired. Then I utter 'Al has hired two cooks'. You would infer that I'm probably well-informed and that Al did not hire three cooks.

The above models don't predict this.

An utterance of 'two' seems to convey that the speaker is well-informed. How could this come about?

A natural idea is that it is a higher-order implicature. We've seen that 'at least two' implicates ignorance, while 'two' does not. Uninformed speakers should therefore prefer 'at least two', and informed speakers should prefer 'two', so as to avoid triggering a false ignorance implicature.

(I assume this kind of implicature has been studied, but I don't think I've come across it in the literature.)

The explanation I just gave seems to assume that the speaker cares not only about the hearer's accuracy concerning the state of the world, but also about their accuracy concerning the speaker's state of information with respect to the questions under discussion. This is a reasonable assumption.

Interestingly, we can predict the informedness implicature arising from 'two' even without the assumption. The above model with fixed alternatives does not predict it. If we switch to a model with uncertainty about the speaker's cost function, as in the previous blog post, the effect appears.

```
var states = [1,2,3,4];
var meanings = {
    'one': function(state) { return state >= 1 },
    'at least one': function(state) { return state >= 1 },
    'exactly one': function(state) { return state == 1 },
    'two': function(state) { return state >= 2 },
    'at least two': function(state) { return state >= 2 },
    'exactly two': function(state) { return state == 2 },
    'three': function(state) { return state >= 3 },
    'at least three': function(state) { return state >= 3 },
    'exactly three': function(state) { return state == 3 },
    'four': function(state) { return state >= 4 },
    'at least four': function(state) { return state >= 4 },
    'exactly four': function(state) { return state == 4 }
};
var complexity = function(utterance) {
    return utterance.includes(' ') ? 3 : 1;
}
var makeHearer = function(speaker) {
    return Agent({
        credence: join({
            'state': Indifferent(states),
            'access': { 'full': 0.9, 'partial': 0.1 },
            'chattiness': Indifferent([0,1,2])
        }),
        kinematics: speaker ? makeKinematics(speaker) : level0kinematics
    });
};
var makeKinematics = function(speaker) {
    return function(utterance) {
        return function(s) {
            var obs = s.access == 'full' ? s.state : [s.state, s.state+uniformDraw([-1,1])];
            var sp = speaker(obs, s.chattiness);
            return sample(choice(sp)) == utterance;
        }
    }
};
var level0kinematics = function(utterance) {
    return function(s) {
        return evaluate(meanings[utterance], s.state);
    }
};
var makeSpeaker = function(hearer) {
    return function(observation, chattiness) {
        return Agent({
            options: keys(meanings),
            credence: update(Indifferent(states), observation),
            utility: function(u,s) {
                var q = marginalize(learn(hearer, u), 'state').score(s);
                var c = (chattiness-2)*complexity(u)/3;
                return q + c;
            }
        });
    }
};
var hearer0 = makeHearer();
var speaker1 = makeSpeaker(hearer0);
var hearer2 = makeHearer(speaker1);
var speaker3 = makeSpeaker(hearer2);
var hearer4 = makeHearer(speaker3);
var speaker5 = makeSpeaker(hearer4);
var hearer6 = makeHearer(speaker5);
showKinematics(hearer6, ['two', 'at least two'])
```

I need a lot of speakers and hearers here, so I've defined a few helper functions to create them.

In outline, the effect arises as follows.

When a speaker says 'at least two', a relatively naive hearer can infer that the speaker does not have a strong preference for simplicity; otherwise she would have chosen the semantically equivalent 'two'. This, in turn, means that she would have said 'exactly two' if that had led to significantly greater hearer accuracy. It would have done so if the speaker's information state was { 2 }. So the speaker's information state is probably { 2,… }.

When the speaker says 'two', however, the same hearer can't rule out that the speaker has a strong preference for simplicity. If she does, she might not have said 'exactly two' even if her information state was { 2 }. So the speaker's information state might be { 2 } and it might be { 2,… }.

In sum, this hearer finds 2 more probable if he hears 'two' than if he hears 'at least two'. At the next level, a speaker with a slight preference for simplicity might therefore prefer 'two' over 'exactly two' if her information state is { 2 }, but prefer 'at least two' over 'two' if her information state is { 2,3 }.

The effect we've just seen for 'two' and 'at least two' might help shed light on the exclusivity implicature arising from 'A or B'.

Even though 'two' and 'at least two' are semantically equivalent, a speaker who merely knows that Al hired two or more cooks would use 'at least two', while a speaker who knows that Al hired exactly two cooks would use 'two'.

Now compare 'A or B' and 'A or B or both'. These are semantically equivalent. But the latter signals that the speaker's information state is compatible with \(A \wedge B\) (just as 'at least two' signals that the speaker's information state is compatible with [>2]).

A relatively naive hearer who encounters 'A or B or both' can reason (a) that the speaker does not have a strong preference for simplicity, and hence (b) that she would probably have said 'A or B but not both' if she could rule out \(A \wedge B\); since she didn't say 'A or B but not both', it follows (c) that her information state is compatible with \(A \wedge B\). No such conclusion can be drawn from an utterance of 'A or B'. This initial asymmetry might get amplified at higher levels.

I've briefly tried to confirm this idea with a simulation, but I haven't been able to get it to work. I suspect that it should be relatively easy to predict the exclusivity implicature, however, if we assume that the speaker cares about the hearer's accuracy with respect to the speaker's information state.

Chierchia, Gennaro, Danny Fox, and Benjamin Spector. 2012. “Scalar Implicature as a Grammatical Phenomenon.” In *Semantics: An International Handbook of Natural Language Meaning*. de Gruyter.

Fox, Danny. 2007. “Free Choice and the Theory of Scalar Implicatures.” In *Presupposition and Implicature in Compositional Semantics*, edited by U. Sauerland and P. Stateva, 71–120. Basingstoke: Palgrave Macmillan.

Goodman, Noah D., and Andreas Stuhlmüller. 2013. “Knowledge and Implicature: Modeling Language Understanding as Social Cognition.” *Topics in Cognitive Science* 5 (1): 173–84. https://doi.org/10.1111/tops.12007.

Sauerland, Uli. 2004. “Scalar Implicatures in Complex Sentences.” *Linguistics and Philosophy* 27 (3): 367–91.

Schwarz, Bernhard. 2016. “Consistency Preservation in Quantity Implicature: The Case of at Least.” *Semantics and Pragmatics* 9: 1–47. https://doi.org/10.3765/sp.9.1.

Spector, Benjamin. 2006. “Aspects de La Pragmatique Des Opérateurs Logiques.” PhD thesis, Paris 7.

I've been thinking a little about this way of arguing for additivity (thanks for the pointer!).

Here's one thing I've come up against. Suppose you've got your separable component propositions, and they're "relational", in the sense in which your treatment of Kant is. That is: we've got a "slot" for desert-and-happiness, with cells specifying a degree of happiness/unhappiness, and also whether the person is deserving or not.

One of the key technical assumptions you need is "restricted solvability". And roughly, that tells you that if you've got two complete propositions differing only in which component proposition they have in this slot, with different values, and then you've got some other complete proposition x whose value lies between them, then you can find a substitution for the component in the first pair that matches the value of x.

It's a sort-of continuity assumption.

Now, in something like the Kant case, continuity is going to come, intuitively, from varying degree of happiness. So suppose the pair we start with are both deserving people with different degrees of happiness. Then you'd hope to be able to match any intermediate value by finding an intermediate level of happiness for a deserving person. Seems okay.

But suppose you have a deserving person and an undeserving person, each with different levels of happiness/unhappiness. Then it is a more substantive ethical assumption that you can find some substitution for that component of overall value that will match any other realized intermediate value (picture a situation where the happiness/unhappiness of the deserving is always more important than the happiness/unhappiness of the undeserving, so that the realizable values form two disconnected "islands").
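The "islands" worry can be made concrete with a toy value function (entirely made-up numbers, purely to illustrate how restricted solvability can fail):

```python
# Hypothetical value function: the happiness of the deserving always
# outweighs the happiness of the undeserving (made-up numbers).
def value(deserving, happiness):
    return happiness + 1000 if deserving else happiness

happiness_levels = [h / 10 for h in range(-100, 101)]   # -10.0 ... 10.0
deserving_values = [value(True, h) for h in happiness_levels]
undeserving_values = [value(False, h) for h in happiness_levels]

# The realizable overall values form two disconnected "islands":
print(min(undeserving_values), max(undeserving_values))   # -10.0 10.0
print(min(deserving_values), max(deserving_values))       # 990.0 1010.0

# A target value strictly between the islands is realized by no
# substitution in the desert-and-happiness slot, so the matching
# component that restricted solvability demands need not exist:
target = 500
print(any(abs(value(d, h) - target) < 1e-9
          for d in (True, False) for h in happiness_levels))   # False
```

On this toy picture, no intermediate value between 10 and 990 is realizable, so the solvability axiom rules out such value structures by fiat.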

So I think this assumption bears thinking about if you're going to make the very general argument here. (I was thinking of this in connection with arguments for the additivity of overall accuracy of a credence function, which is also relational in this way: the accuracy of a given credence turns on whether the proposition which is its content is true or false. So I think that hidden in the solvability axiom is going to be a particular assumption about the way that accuracy-given-truth and accuracy-given-falsity relate.)

Now, I wonder whether one can overcome this just by embedding the whole structure in a bigger one (filling in the gaps, as it were), deriving an additive representation for that, and then cutting back to the original. After all, solvability is a richness assumption, and embeddability often helps out for that in other cases.

The other technical assumptions look more innocent to me, fwiw: it seems like separability (/independence) and this restricted solvability are the two where the real action is.


Assume Alt(exh(AvB > C)) = { exh(A > C), exh(B > C) } = { (A > C) & ~(B > C), (B > C) & ~(A > C) }.

Then exh(exh(AvB > C)) = (AvB > C) & ~[(A > C) & ~(B > C)] & ~[(B > C) & ~(A > C)].

~[(A > C) & ~(B > C)] & ~[(B > C) & ~(A > C)] is equivalent to (A > C) <-> (B > C).

Assuming that AvB > C entails (A > C) v (B > C), it follows that exh(exh(AvB > C)) entails (but is not equivalent to) A > C and B > C.
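This entailment claim checks out mechanically if we treat 'A > C', 'B > C', and 'AvB > C' as atoms constrained only by the assumed entailment (a small Python check of my own):

```python
from itertools import product

def assumption(p, q, r):
    # The assumed entailment: 'AvB > C' entails '(A > C) v (B > C)'.
    return (not r) or p or q

def double_exh(p, q, r):
    # exh(exh(AvB > C)) = (AvB > C) & ((A > C) <-> (B > C))
    return r and (p == q)

# All truth-value assignments compatible with the assumption,
# where p = 'A > C', q = 'B > C', r = 'AvB > C'.
admissible = [(p, q, r) for p, q, r in product([False, True], repeat=3)
              if assumption(p, q, r)]

# exh(exh(AvB > C)) entails (A > C) & (B > C)...
print(all(p and q for p, q, r in admissible if double_exh(p, q, r)))        # True
# ...but is not equivalent to it:
print(any(p and q and not double_exh(p, q, r) for p, q, r in admissible))   # True
```

The non-equivalence witness is the assignment on which both conditionals are true but 'AvB > C' is false.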

In your examples, one or both of A and B in the antecedent is complex, which might introduce some more alternatives, but I'd expect the above entailment to remain.

As far as I can tell, this also gets (9)/(10) right.


Still assuming your calculation is correct, I have a query. Take scenario 1 in the paper (both children are on the right and the seesaw is balanced) and evaluate the following:

(3a') If Blue or both of them were on the left, then the seesaw would be unbalanced.

In that scenario I feel like (3a') is false, because if Blue was on the left, the seesaw would be balanced! But if exh(exh(Av(A&B)>C)) is equivalent to A&B>C, then we should expect (3a') to be true, since if both children were on the left, it would indeed be unbalanced.

Perhaps this is also relevant. In the paper, exhaustifying Av(A&B) to obtain (A&~B)v(A&B) forced a dilemma on friends of exhaustification: they got the (3a)-(4a)-(4b) trio right, but then they could not get the (9a)-(10a)-(10b) trio right. I cannot tell whether double exhaustification is helping with that or not. Is it?

Also, thank you so much for the discussion!

Could you say more about how you thought double exhaustification would help?