## An RSA model of SDA

In this post, I'll develop an RSA model that explains why 'if A or B then C' is usually taken to imply 'if A then C' and 'if B then C', even if the conditional has a Lewis/Stalnaker ("similarity") semantics, where the inference is invalid.

I'll write 'A>C' for the conditional 'if A then C'. For the purposes of this post, we assume that 'A>C' is true at a world w iff all the closest A worlds to w are C worlds, by some contextually fixed measure of closeness.

### 1. Simplification as Free Choice?

It has often been observed that the simplification effect resembles the "Free Choice" effect, i.e., the apparent entailment of '◇A' and '◇B' by '◇(A∨B)', where the diamond is a possibility modal (permission, in the standard example). But there are also important differences.

According to standard modal semantics, '◇(A∨B)' is equivalent to '◇A ∨ ◇B'. But '(A∨B)>C' is not equivalent to '(A>C) ∨ (B>C)'. For example, suppose C is true at the closest A worlds but not at the closest B worlds, and the closest A∨B worlds are B worlds. Then 'A>C' is true, but '(A∨B)>C' is false.
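The counterexample can be made concrete with a toy evaluator for the similarity semantics. This is only a sketch in plain JavaScript: the two-world model, the numeric distances, and the helper names `closest` and `would` are my own illustrative assumptions, not part of any library.

```javascript
// Hypothetical toy model: each world has a distance from the actual world
// and a list of atomic sentences that are true at it.
const worlds = [
  { name: 'w1', dist: 1, facts: ['B'] },       // closest B world; C is false here
  { name: 'w2', dist: 2, facts: ['A', 'C'] },  // closest A world; C is true here
];

// The closest worlds (minimal distance) at which the antecedent holds.
function closest(antecedent) {
  const candidates = worlds.filter(antecedent);
  const min = Math.min(...candidates.map(w => w.dist));
  return candidates.filter(w => w.dist === min);
}

// 'X>Y' is true iff all closest X worlds are Y worlds.
function would(antecedent, consequent) {
  return closest(antecedent).every(consequent);
}

const A = w => w.facts.includes('A');
const B = w => w.facts.includes('B');
const C = w => w.facts.includes('C');
const AorB = w => A(w) || B(w);

console.log(would(A, C));     // true
console.log(would(B, C));     // false
console.log(would(AorB, C));  // false
```

In this model '(A>C) ∨ (B>C)' is true, but '(A∨B)>C' is false, because the closest A∨B world is the B world.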

In general, the truth-value of '(A∨B)>C' depends on three factors:

• whether the closest A worlds are C worlds,
• whether the closest B worlds are C worlds, and
• the relative closeness of A and B (i.e., whether the closest A worlds are closer than the closest B worlds or vice versa).

Nothing like the third factor is relevant for '◇(A∨B)'.

Franke (2011) seems to miss this point, when he says, on p.44, that his IBR model of Free Choice also explains Simplification of Disjunctive Antecedents (SDA).

I'm not going to go over Franke's model of Free Choice again. What's important is that it involves the following three states:

• tA, where A is permitted but B is not,
• tB, where B is permitted but A is not, and
• tAB, where both A and B are permitted.

We have the following association between these states and the truth-value of relevant messages:

| | '◇A' | '◇B' | '◇(A∨B)' |
|---|---|---|---|
| tA | 1 | 0 | 1 |
| tB | 0 | 1 | 1 |
| tAB | 1 | 1 | 1 |

For conditionals, he says, the same kind of association holds, "provided we reinterpret the state names":

| | 'A>C' | 'B>C' | '(A∨B)>C' |
|---|---|---|---|
| tA | 1 | 0 | 1 |
| tB | 0 | 1 | 1 |
| tAB | 1 | 1 | 1 |

This is table 86 on p.44. But how are we supposed to interpret these state names?

There is no interpretation that would make the table correct. The table makes it look as if the truth-value of '(A∨B)>C' is determined by the truth-values of 'A>C' and 'B>C'. But it is not. For example, what about a state in which 'A>C' is true, 'B>C' is false, and '(A∨B)>C' is false, because B is closer than A? This possibility is nowhere to be found in the table.
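To see that no relabelling can rescue the table, one can enumerate the combinations directly. The following is a rough sketch in plain JavaScript; the state encoding and the name `disjunctive` are my own assumptions. It collects, for each pair of truth-values for 'A>C' and 'B>C', the truth-values that '(A∨B)>C' can take as relative closeness varies.

```javascript
const closenessValues = ['A', 'B', 'tie'];  // which antecedent is closer
const truthValues = [true, false];

// Truth-value of '(AvB)>C' given closeness and the values of 'A>C' (ac) and 'B>C' (bc).
const disjunctive = (closer, ac, bc) =>
  closer === 'A' ? ac : closer === 'B' ? bc : ac && bc;

// For each pair of values for 'A>C' and 'B>C', collect the possible values of '(AvB)>C'.
const outcomes = {};
for (const closer of closenessValues)
  for (const ac of truthValues)
    for (const bc of truthValues) {
      const key = `A>C=${ac}, B>C=${bc}`;
      (outcomes[key] = outcomes[key] || new Set()).add(disjunctive(closer, ac, bc));
    }

for (const [key, values] of Object.entries(outcomes))
  console.log(key, '->', [...values]);
```

The two mixed rows come out with both `true` and `false`, so the third column of the table cannot be a function of the first two.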

So Franke's IBR model of Free Choice does not, in fact, carry over to SDA.

(I would assume that this problem has been noticed before, but it isn't mentioned in Bar-Lev and Fox (2020) or Fox and Katzir (2021), where Franke's model is discussed. Am I missing something?)

Anyway, let's move on.

### 2. A simple model

As I said above, the truth-value of '(A∨B)>C' depends on

• whether the closest A worlds are C worlds,
• whether the closest B worlds are C worlds, and
• the relative closeness of A and B (i.e., whether the closest A worlds are closer than the closest B worlds or vice versa).

There are 12 possible combinations of these three factors. '(A∨B)>C' is true in five of them:

(S1) A is closer, A>C, ¬(B>C)
(S2) A is closer, A>C, B>C
(S3) B is closer, ¬(A>C), B>C
(S4) B is closer, A>C, B>C
(S5) A and B are equally close, A>C, B>C

Here, 'A is closer' means that the closest A worlds are closer than the closest B worlds, and 'A>C' means that the closest A worlds are C worlds.

Note that three of these five cases have both A>C and B>C. Imagine a speaker who thinks that their addressee has uniform priors over all twelve cases. Imagine the speaker knows A>C and B>C. Then '(A∨B)>C' is already a better choice than, say, 'A>C' or 'B>C'. '(A>C) ∧ (B>C)' is better still, but if a higher-up hearer only compares the uttered message to its alternatives, we might expect to get an SDA effect, without any higher-order implicature.

This isn't quite right, though.

With uniform hearer priors, '(A∨B)>C' is a good option to convey A>C ∧ B>C, but it is also a good option to convey other states. In particular, it is the best option (at level 1) among its alternatives for conveying S1 and S3. That's because 'A>C' and 'B>C' (and their negations) are each true in six states and thus confer lower probability to S1 and S3 than '(A∨B)>C' would.
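The counts behind this comparison can be checked by hand. Here is a minimal sketch in plain JavaScript (not the WebPPL library used for the simulations in this post); the state encoding and the function name `literalPosterior` are my own assumptions.

```javascript
// The 12 states: closeness ('A', 'B', or 'tie') crossed with the values of 'A>C' and 'B>C'.
const states = [];
for (const closer of ['A', 'B', 'tie'])
  for (const ac of [true, false])
    for (const bc of [true, false])
      states.push({ closer, ac, bc });

const meanings = {
  'A>C': s => s.ac,
  'B>C': s => s.bc,
  'AvB>C': s => (s.closer === 'A' ? s.ac : s.closer === 'B' ? s.bc : s.ac && s.bc),
};

// Literal hearer with a uniform prior: condition on the message being true.
function literalPosterior(message, state) {
  const compatible = states.filter(meanings[message]);
  return compatible.includes(state) ? 1 / compatible.length : 0;
}

// S1: A is closer, A>C holds, B>C does not.
const S1 = states.find(s => s.closer === 'A' && s.ac && !s.bc);
for (const message of Object.keys(meanings))
  console.log(message, literalPosterior(message, S1));
```

For S1, 'A>C' yields a posterior of 1/6 and 'AvB>C' a posterior of 1/5, so a speaker who simply wants to maximize the hearer's credence in the true state prefers the disjunctive conditional.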

(Incidentally, this is why Franke's model doesn't work for SDA: '(A∨B)>C' is not a "surprise message" at level 2.)

Here's a simulation that confirms these claims:

    var states = Cross({ closest: ['A', 'B', 'A,B'], Cness: ['A','B','A,B','-'] })
    // C-ness 'A' means that the closest A worlds are C worlds
    var meanings = {
        'A>C': function(s) { return s['Cness'].includes('A') },
        'B>C': function(s) { return s['Cness'].includes('B') },
        'AvB>C': function(s) {
            return s['closest'] == 'A' && s['Cness'].includes('A') ||
                   s['closest'] == 'B' && s['Cness'].includes('B') ||
                   s['closest'] == 'A,B' && s['Cness'] == 'A,B'
        },
        'A>C and B>C': function(s) {
            return s['Cness'].includes('A') && s['Cness'].includes('B');
        },
        '-': function(s) { return true }
    };
    var alternatives = {
        'A>C': ['A>C', 'B>C', '-'],
        'B>C': ['A>C', 'B>C', '-'],
        'AvB>C': ['A>C', 'B>C', 'AvB>C', '-'],
        'A>C and B>C': keys(meanings),
        '-': ['-']
    }
    var state_prior = Indifferent(states);
    var hearer0 = Agent({
        credence: state_prior,
        kinematics: function(utterance) {
            return function(state) {
                return evaluate(meanings[utterance], state);
            }
        }
    });
    var speaker1 = function(observation, options) {
        return Agent({
            options: options || keys(meanings),
            credence: update(state_prior, observation),
            utility: function(u,s){
                return learn(hearer0, u).score(s);
            }
        });
    };
    display('hearer0 -- A>C is compatible with six states, AvB>C with five:');
    showKinematics(hearer0, ['A>C', 'AvB>C']);
    var s1 = { closest: 'A', Cness: 'A' };
    var s2 = { closest: 'A', Cness: 'A,B' };
    display('speaker1 -- prefers AvB>C if she knows the state is S1 or S2');
    showChoices(speaker1, [s1, s2], [alternatives['AvB>C']]);


To derive the Simplification effect, we need to ensure that speakers don't use '(A∨B)>C' to convey S1 or S3.

There are different ways to achieve this. I'm going to invoke a QUD.

Recall, once more, that the truth-value of '(A∨B)>C' is determined by the truth-value of 'A>C' and 'B>C' and the relative closeness of A and B. Normally, however, we don't expect that speakers who utter '(A∨B)>C' are trying to convey anything about the relative closeness of A and B. What's normally under discussion is whether A>C and whether B>C, not which of A and B is closer.

So let's add a QUD to the model, as in this post. Normally, the QUD is whether A>C and whether B>C. '(A∨B)>C' is then no longer a good option for a speaker who knows that the state is S1 or S3: 'A>C' is better in S1, 'B>C' is better in S3.
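The effect of the QUD on a fully informed level-1 speaker can be worked out without the simulation library. The following plain JavaScript sketch makes several assumptions of its own: utility is the log probability that the literal hearer assigns to the true QUD cell, minus a cost; the cost function mirrors the one used in this post's simulation code; and the speaker deterministically picks the best of the four alternatives of 'AvB>C'.

```javascript
const states = [];
for (const closer of ['A', 'B', 'tie'])
  for (const ac of [true, false])
    for (const bc of [true, false])
      states.push({ closer, ac, bc });

const meanings = {
  'A>C': s => s.ac,
  'B>C': s => s.bc,
  'AvB>C': s => (s.closer === 'A' ? s.ac : s.closer === 'B' ? s.bc : s.ac && s.bc),
  '-': s => true,
};

// The QUD partitions the states by the truth-values of 'A>C' and 'B>C' only.
const qud = s => `${s.ac},${s.bc}`;

// Literal hearer (uniform prior): probability of the true QUD cell after the message.
function cellProbability(message, state) {
  const compatible = states.filter(meanings[message]);
  const inCell = compatible.filter(t => qud(t) === qud(state));
  return inCell.length / compatible.length;
}

// Assumed utility: log probability of the true cell minus a length-based cost.
const cost = message => (message === '-' ? 2 : message.length / 20);
const utility = (message, state) =>
  Math.log(cellProbability(message, state)) - cost(message);

// Best option among the alternatives of 'AvB>C' (ties go to the earlier option).
function best(state) {
  return Object.keys(meanings).reduce((a, b) =>
    utility(b, state) > utility(a, state) ? b : a);
}

const preferred = states.filter(s => best(s) === 'AvB>C');
console.log(preferred);
```

Under these assumptions, `preferred` contains exactly the three states in which both A>C and B>C hold, one for each closeness value.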

With this QUD, '(A∨B)>C' is the best option among its alternatives only in three of the 12 possible states: in S2, S4, and S5. In each of these, we have A>C and B>C. If the level-2 hearer assumes that the speaker chose the best option from among the alternatives of the chosen utterance, he will infer from an utterance of '(A∨B)>C' that 'A>C' and 'B>C' are both true:

    // continues #1
    var quds = {
        'state?': function(state) { return state },
        'A>C?B>C?': function(state) { return state['Cness'] }
    };
    var makeHearer = function(speaker, state_prior, qud) {
        return Agent({
            credence: state_prior,
            kinematics: function(utterance) {
                return speaker ? function(s) {
                    var speaker = speaker(s, alternatives[utterance], qud);
                    return sample(choice(speaker)) == utterance;
                } : function(s) {
                    return evaluate(meanings[utterance], s);
                }
            }
        });
    };
    var makeSpeaker = function(hearer, state_prior, qud, cost) {
        return function(observation, options) {
            return Agent({
                options: options || keys(meanings),
                credence: update(state_prior, observation),
                utility: function(u,s){
                    var qu = quds[qud];
                    return marginalize(learn(hearer, u), qu).score(qu(s)) - cost(u);
                }
            });
        };
    };
    var cost = function(utterance) {
        return utterance == '-' ? 2 : utterance.length/20;
    };
    var qud = 'A>C?B>C?';
    var hearer0 = makeHearer(null, state_prior, qud);
    var speaker1 = makeSpeaker(hearer0, state_prior, qud, cost);
    var hearer2 = makeHearer(speaker1, state_prior, qud);
    showKinematics(hearer2, ['AvB>C']);


('Cness: "A,B"' means that the closest A worlds and the closest B worlds are both C worlds.)

I've defined this simulation with factory functions so that one can easily create more agents and check different parameters. For example, if you change qud to 'state?', the level-2 hearer doesn't become convinced of A>C and B>C.

### 3. The Spain puzzle

I've assumed that the SDA effect arises because the relative closeness of A and B is normally not under discussion when we evaluate '(A∨B)>C'.

This might shed light on a puzzle about the distribution of SDA.

As McKay and Van Inwagen (1977) pointed out, there appear to be cases in which '(A∨B)>C' does not convey A>C and B>C. Their classic example is (1):

(1) If Spain had fought with the Axis or the Allies, it would have fought with the Axis.

A speaker who utters (1) would not be interpreted as believing that Spain would have fought with the Axis if it had fought with the Allies (even though it is theoretically possible to fight on both sides).

Similarly for (2), from Lassiter (2018):

(2) If Spain had fought with the Axis or the Allies, it would probably have fought with the Axis.

Why don't we get SDA here?

We might, of course, say that the inference is cancelled due to the implausibility of the conclusion. But perhaps we can say more.

Clearly, when somebody utters (1) or (2), the relative closeness of the two possibilities is under discussion. The point of (1) is precisely to state that Spain joining the Allies is a more remote possibility than Spain joining the Axis.

In the context of (1) and (2), then, the QUD is not 'A>C?B>C?'. Perhaps it is 'state?', or perhaps it is which of A and B is closer. The above model predicts that this breaks the derivation of SDA.

The hypothesis that SDA depends on the QUD is supported by the following observation, due to Nute (1980).

Consider (3):

(3) If Spain had fought with the Axis or the Allies, Hitler would have been happy.

In a normal context, (3) conveys that Hitler would have been happy no matter which side Spain had fought on, which is false. So here the SDA effect is in place. Now Nute observes that (3) can become acceptable if it is uttered right after (1).

A similar point could be made with (4):

(4) If Spain had fought with the Axis or the Allies, Hitler would probably have been happy, for surely Spain would have chosen the Axis.

The 'for surely' explanation in (4) clarifies that relative remoteness is under discussion, so that SDA isn't licensed. Likewise, if (3) is uttered right after (1), the relative remoteness question that is raised by (1) is still in place, so SDA is again not licensed.

(Can we explain why (1), (2), and (4) make the relative remoteness of A and B salient? Presumably the explanation is that these sentences would be infelicitous if the relative remoteness were irrelevant, so it becomes relevant by accommodation. Might be useful to write a simulation for this.)

### 4. Misgivings

I don't like the above model.

I'm not sure why. I think it's because the inference is driven by quantitative likelihood comparisons: for example, that A>C ∧ B>C holds in 3/5 of the cases where '(AvB)>C' is true, as opposed to 3/6 of the cases where 'A>C' is true. Is our language faculty really sensitive to these quantitative differences?

The likelihood dependence also means that the inference only works for certain kinds of state priors.

I've assumed that the state prior is uniform. But the most striking examples of Simplification are cases like (3), where one disjunct is clearly more remote than the other.

(3) If Spain had fought with the Axis or the Allies, Hitler would have been happy.

The above model runs into trouble here.

If it is common knowledge that A is closer than B, then '(AvB)>C' is semantically equivalent to 'A>C'. A hearer should be puzzled why the speaker would use the needlessly complex '(AvB)>C'.

The problem doesn't just arise if it is certain that A is closer than B. Here is a prior according to which it is almost certain that A is closer than B:

    // continues #2
    var state_prior = update(Indifferent(states), { closest: 'A' }, { new_p: 0.99 });
    viz.table(state_prior);


(The call to update Jeffrey-conditionalizes the uniform prior on the information that A is closer than B, with a posterior probability of 0.99.)
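For readers unfamiliar with Jeffrey conditionalization, here is a rough standalone sketch in plain JavaScript. The function `jeffreyUpdate` and the representation of a distribution as a list of `{ state, p }` pairs are my own assumptions; I'm not claiming that this is how `update` is implemented.

```javascript
// Jeffrey conditionalization on the partition {E, not-E}: rescale the probabilities
// inside E so that they sum to newP, and those outside E so that they sum to 1 - newP.
// Probability ratios within each cell are preserved.
function jeffreyUpdate(prior, inE, newP) {
  const pE = prior.filter(x => inE(x.state)).reduce((sum, x) => sum + x.p, 0);
  return prior.map(x => ({
    state: x.state,
    p: inE(x.state) ? x.p * newP / pE : x.p * (1 - newP) / (1 - pE),
  }));
}

// Uniform prior over the 12 states.
const states = [];
for (const closer of ['A', 'B', 'tie'])
  for (const cness of ['A', 'B', 'A,B', '-'])
    states.push({ closer, cness });
const uniform = states.map(state => ({ state, p: 1 / states.length }));

const posterior = jeffreyUpdate(uniform, s => s.closer === 'A', 0.99);
const pCloserA = posterior
  .filter(x => x.state.closer === 'A')
  .reduce((sum, x) => sum + x.p, 0);
console.log(pCloserA);  // approximately 0.99
```

The remaining probability of 0.01 is spread evenly over the eight states in which A is not closer, since they were equally probable before the update.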

With this prior, a fully informed speaker would never utter '(AvB)>C' if the QUD is 'A>C?B>C?':

    // continues #3
    var hearer0 = makeHearer(null, state_prior, 'A>C?B>C?');
    var speaker1 = makeSpeaker(hearer0, state_prior, 'A>C?B>C?', cost);
    var hearer2 = makeHearer(speaker1, state_prior, 'A>C?B>C?');
    showKinematics(hearer2, ['AvB>C']);


This isn't a decisive objection. One might argue that the computation of SDA is insulated from the worldly knowledge that A is closer than B. One could also argue that a hearer might be unsure about whether the speaker intrinsically prefers uttering 'A>C' over the slightly more complex '(AvB)>C'. We can still predict SDA if there's no preference for simpler utterances:

    // continues #4
    var no_cost = function(utterance) { return 0 };
    var speaker1 = makeSpeaker(hearer0, state_prior, 'A>C?B>C?', no_cost);
    var hearer2 = makeHearer(speaker1, state_prior, 'A>C?B>C?');
    showKinematics(hearer2, ['AvB>C']);
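How thin the margin is can be seen from a back-of-the-envelope calculation. The sketch below is plain JavaScript and rests on my own assumptions: the skewed prior puts 0.99 on A being closer and splits the rest evenly, utility is the log probability the literal hearer assigns to A>C ∧ B>C minus a cost, and the speaker knows that both conditionals hold.

```javascript
// Skewed prior: P(A closer) = 0.99, the rest split evenly between 'B' and 'tie';
// the truth-values of 'A>C' and 'B>C' are uniform and independent of closeness.
const closeness = { A: 0.99, B: 0.005, tie: 0.005 };
const states = [];
for (const closer of Object.keys(closeness))
  for (const ac of [true, false])
    for (const bc of [true, false])
      states.push({ closer, ac, bc, p: closeness[closer] / 4 });

const meanings = {
  'A>C': s => s.ac,
  'AvB>C': s => (s.closer === 'A' ? s.ac : s.closer === 'B' ? s.bc : s.ac && s.bc),
};

// Literal hearer's probability that both conditionals hold, given the message.
function probBoth(message) {
  const compatible = states.filter(meanings[message]);
  const total = compatible.reduce((sum, s) => sum + s.p, 0);
  const both = compatible.filter(s => s.ac && s.bc).reduce((sum, s) => sum + s.p, 0);
  return both / total;
}

const lengthCost = m => m.length / 20;  // same shape as the cost function in the simulations
const noCost = m => 0;
const utility = (m, cost) => Math.log(probBoth(m)) - cost(m);

console.log(probBoth('A>C'), probBoth('AvB>C'));
console.log(utility('AvB>C', lengthCost) > utility('A>C', lengthCost));  // false
console.log(utility('AvB>C', noCost) > utility('A>C', noCost));          // true
```

'AvB>C' raises the probability of A>C ∧ B>C only from about 0.5 to about 0.501, so any noticeable cost difference between the two messages reverses the speaker's preference.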


But let's try a different approach.

### 5. A Free Choicy model

In section 1, I emphasized some differences between SDA and Free Choice.

In particular, '(AvB)>C' is not (literally) equivalent to 'A>C ∨ B>C', whereas '◇(A∨B)' is equivalent to '◇A ∨ ◇B'.

Still, '(AvB)>C' entails 'A>C ∨ B>C'. A literal-minded speaker would therefore only utter '(AvB)>C' if she knows that at least one of A>C and B>C obtains.

Let's assume that the speaker has a preference for simpler utterances, that A worlds are likely to be closer than B worlds, as in the prior from source block #3, and that the relative closeness of A and B is not under discussion. As we saw in simulation #4, a literal-minded speaker who is fully informed about the state would then always prefer 'A>C' or 'B>C' over '(AvB)>C'.

What if the speaker isn't fully informed? Suppose all she knows is that at least one of A>C and B>C obtains. In that case, 'A>C' and 'B>C' would be bad. '(AvB)>C' would be better. The speaker doesn't know that it is true, but with respect to the QUD it wouldn't communicate anything false.

Or suppose the speaker knows that either A is closer than B and A>C holds, or B is closer than A and B>C holds. In this case, she knows that '(AvB)>C' is true, without knowing that 'A>C' is true or that 'B>C' is true.
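Both the entailment and this last information state can be checked mechanically. This is a small sketch in plain JavaScript, under the assumption that an information state is the set of states the speaker cannot rule out and that she knows a sentence iff it is true throughout that set; the names `knows` and `info` are mine.

```javascript
const states = [];
for (const closer of ['A', 'B', 'tie'])
  for (const ac of [true, false])
    for (const bc of [true, false])
      states.push({ closer, ac, bc });

const meanings = {
  'A>C': s => s.ac,
  'B>C': s => s.bc,
  'AvB>C': s => (s.closer === 'A' ? s.ac : s.closer === 'B' ? s.bc : s.ac && s.bc),
};

// '(AvB)>C' entails 'A>C or B>C': every AvB>C state verifies at least one disjunct.
const entails = states.filter(meanings['AvB>C']).every(s => s.ac || s.bc);
console.log(entails);  // true

// The speaker knows a sentence iff it holds in every state she cannot rule out.
const knows = (info, message) => info.every(meanings[message]);

// 'Either A is closer and A>C holds, or B is closer and B>C holds.'
const info = states.filter(s =>
  (s.closer === 'A' && s.ac) || (s.closer === 'B' && s.bc));

console.log(knows(info, 'AvB>C'));  // true
console.log(knows(info, 'A>C'));    // false
console.log(knows(info, 'B>C'));    // false
```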

In sum, a literal-minded speaker would prefer '(AvB)>C' among its alternatives iff (i) she knows that at least one of A>C and B>C obtains, and (ii) she lacks a certain kind of further information.

Imagine a hearer who believes himself to be addressed by such a speaker. Hearing '(AvB)>C', he could infer (i) and (ii).

This is analogous to what the hearer would infer from an utterance of '◇(A∨B)' in the case of Free Choice. It doesn't involve any frequency comparisons.

What would the hearer infer from 'A>C'? Intuitively, he should be able to infer that 'B>C' is false. Recall that the QUD is whether A>C and whether B>C. There seems to be a general mechanism by which, if the QUD is whether X and whether Y, and a speaker says X, one can infer that ¬Y.

So let's assume that 'A>C' would prompt an inference to ¬(B>C). This is analogous to the inference from '◇A' to ¬◇B.

Now imagine a higher-level speaker who thinks that she is addressing such a hearer. Imagine she knows that A>C and B>C both obtain. Uttering 'A>C' would be bad, as it would convey ¬(B>C). Uttering 'B>C' would be equally bad. '(AvB)>C' would be better. It would be the best option among its alternatives.

As a result, a hearer on the next level who presumes that the speaker is well-informed would regard '(AvB)>C' as indicating A>C and B>C.

On this model, the derivation of SDA really is a lot like the derivation of Free Choice in the previous post.

Let's write a simulation to check that it works.

We need to allow for imperfectly informed speakers. But we have 12 possible states now. This means that there are 2^12 − 1 = 4095 ways to be informed or uninformed. If we consider all possibilities, the simulation becomes painfully slow.

To speed things up, I'll only consider six kinds of speaker information:

1. The speaker is fully informed.
2. The speaker is fully informed about which of A and B is closer, but lacks any information about A>C and B>C.
3. The speaker is fully informed about A>C and B>C, but lacks information about which of A and B is closer.
4. The speaker knows whether at least one of 'A>C' and 'B>C' is true.
5. The speaker knows whether '(A∨B)>C' is true.
6. The speaker knows nothing.

    // continues #5
    var access = {
        // maps states to observations
        'full': function(s) { return s },
        'closest': function(s) { return { closest: s.closest } },
        'Cness': function(s) { return { Cness: s.Cness } },
        'A>CvB>C': function(s) {
            return s.Cness == '-' ? { Cness: '-' } : function(t) { return t.Cness != '-' }
        },
        'AvB>C': function(s) {
            var tv = evaluate(meanings['AvB>C'], s);
            return function(t) { return evaluate(meanings['AvB>C'], t) == tv };
        },
        'none': function(s) { return states }
    }
    var access_prior = { 'full': 0.4, 'closest': 0.03, 'Cness': 0.03, 'A>CvB>C': 0.02, 'AvB>C': 0.02, 'none': 0.5 };


As in earlier posts, I assume a default presumption that the speaker is fully informed.

    // continues #6
    var makeHearer = function(speaker, state_prior, qud) {
        return Agent({
            credence: join({
                'state': state_prior,
                'access': access_prior
            }),
            kinematics: function(utterance) {
                return speaker ? function(s) {
                    var obs = evaluate(access[s.access], s.state);
                    var speaker = speaker(obs, alternatives[utterance], qud);
                    return sample(choice(speaker)) == utterance;
                } : function(s) {
                    return evaluate(meanings[utterance], s.state);
                }
            }
        });
    };
    var makeSpeaker = function(hearer, state_prior, qud, cost) {
        return function(observation, options) {
            return Agent({
                options: options || keys(meanings),
                credence: update(state_prior, observation),
                utility: function(u,s){
                    var qu = quds[qud];
                    var hearer_state_credence = marginalize(learn(hearer, u), 'state');
                    return marginalize(hearer_state_credence, qu).score(qu(s)) - cost(u);
                }
            });
        };
    };

    var state_prior = update(Indifferent(states), { closest: 'A' }, { new_p: 0.99 });
    // var state_prior = Indifferent(states);
    var qud = 'A>C?B>C?';
    // var qud = 'state?';
    var hearer0 = makeHearer(null, state_prior, qud);
    var speaker1 = makeSpeaker(hearer0, state_prior, qud, cost);
    var hearer2 = makeHearer(speaker1, state_prior, qud);
    var speaker3 = makeSpeaker(hearer2, state_prior, qud, cost);
    var hearer4 = makeHearer(speaker3, state_prior, qud);


(You can see how the effect depends on the state prior and the QUD by uncommenting var state_prior = Indifferent(states); or var qud = 'state?';. With a uniform state prior, we would get SDA by the same mechanism as in section 2.)

    // continues #7
    display('hearer2:');
    showKinematics(hearer2, ['A>C', 'AvB>C']);


As predicted, upon hearing 'AvB>C', the level-2 hearer infers that (i) at least one of A>C and B>C obtains, and that (ii) the speaker lacks information.

Upon hearing 'A>C', the level-2 hearer only has a slight tendency to think that 'B>C' is false.

To get the desired effect, 'AvB>C' should be a better choice for communicating A>C ∧ B>C than 'A>C' and 'B>C', at the next level up. In the case of Free Choice, a slight tendency to infer that '◇B' is false based on an utterance of '◇A' was not enough, because ◇A ∧ ◇B was even more unlikely conditional on '◇(A∨B)'. In the present case, 'AvB>C' turns out to yield a comparatively high credence of around 40% in A>C ∧ B>C. This is enough to derive SDA:

    // continues #8
    display('hearer4:');
    showKinematics(hearer4, ['AvB>C']);


Like the first model, this model relies on subtle likelihood comparisons, and therefore on specific assumptions about the priors. For example, the derivation doesn't work in a painfully slow model that treats all ways of being uninformed as equally likely:

    // continues #8
    var access_prior = { 'full': 0.45, 'partial': 0.05, 'none': 0.5 };
    var get_observation = {
        'full': function(state) { return state },
        'partial': function(state) {
            // return a uniform draw from all partial observations compatible with state
            var observations = filter(function(obs) {
                // the callback must return the test result explicitly
                return obs.includes(state) && obs.length > 1 && obs.length < states.length;
            }, powerset(states));
            return uniformDraw(observations);
        },
        'none': function(state) { return states }
    };
    var makeHearer = function(speaker, state_prior, qud) {
        return Agent({
            credence: join({
                'state': state_prior,
                'access': access_prior
            }),
            kinematics: function(utterance) {
                return function(s) {
                    var obs = evaluate(get_observation[s.access], s.state);
                    var speaker = speaker(obs, alternatives[utterance], qud);
                    return sample(choice(speaker)) == utterance;
                }
            }
        });
    };
    var hearer2 = makeHearer(speaker1, state_prior, qud);
    showKinematics(hearer2, ['A>C', 'AvB>C']);


Here, A>C ∧ B>C has a slightly greater credence under 'A>C' than under 'AvB>C', so the level-3 speaker would prefer 'A>C' to communicate A>C ∧ B>C, and we won't get an SDA effect.

A better model would make sure that 'A>C' strongly conveys ¬(B>C). The non-arbitrariness requirement from my previous post crudely serves this purpose:

    // continues #8
    var makeHearer = function(speaker, state_prior, qud) {
        return Agent({
            credence: join({
                'state': state_prior,
                'access': access_prior
            }),
            kinematics: function(utterance) {
                return function(s) {
                    var obs = evaluate(access[s.access], s.state);
                    var speaker = speaker(obs, alternatives[utterance], qud);
                    return bestOption(speaker) == utterance;
                }
            }
        });
    };
    var hearer2 = makeHearer(speaker1, state_prior, qud);
    display('hearer2:');
    showKinematics(hearer2, ['A>C', 'AvB>C']);
    var speaker3 = makeSpeaker(hearer2, state_prior, qud, cost);
    var hearer4 = makeHearer(speaker3, state_prior, qud);
    display('hearer4:');
    showKinematics(hearer4, ['AvB>C']);


Bar-Lev, Moshe E., and Danny Fox. 2020. “Free Choice, Simplification, and Innocent Inclusion.” Natural Language Semantics 28 (3): 175–223. doi.org/10.1007/s11050-020-09162-y.
Fox, Danny, and Roni Katzir. 2021. “Notes on Iterated Rationality Models of Scalar Implicatures.” Journal of Semantics 38 (4): 571–600. doi.org/10.1093/jos/ffab015.
Franke, Michael. 2011. “Quantity Implicatures, Exhaustive Interpretation, and Rational Conversation.” Semantics and Pragmatics 4: 1:1–82. doi.org/10.3765/sp.4.1.
Lassiter, Daniel. 2018. “Complex Sentential Operators Refute Unrestricted Simplification of Disjunctive Antecedents.” Semantics and Pragmatics 11: 9:EA–. doi.org/10.3765/sp.11.9.
McKay, Thomas, and Peter Van Inwagen. 1977. “Counterfactuals with Disjunctive Antecedents.” Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition 31 (5): 353–56. doi.org/10.1007/BF01873862.
Nute, Donald. 1980. “Conversational Scorekeeping and Conditionals.” Journal of Philosophical Logic 9 (2): 153–66. doi.org/10.1007/BF00247746.