## RSA models of scalar implicature

In this post, we'll model different kinds of scalar implicature. I'll introduce several ideas and techniques that prove useful for other topics as well.

Let's begin with the textbook example, the inference from 'some' to 'not all' (for which Goodman and Stuhlmüller (2013) give an RSA-type explanation).

### 1. "Some students passed"

A speaker wants to communicate the results of an exam. The available utterances are 'all students passed', 'some students passed', and 'no students passed'; for short: 'all', 'some', and 'none'. We can represent their meanings as functions from states to truth values:

```js
var states = ['∀', '∃¬∀', '¬∃'];
var meanings = {
  'all': function(state) { return state == '∀' },
  'some': function(state) { return state != '¬∃' },
  'none': function(state) { return state == '¬∃' }
};
```


Here is a level-0 hearer who interprets utterances literally:

```js
// continues #1
var hearer0 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      // This returns the truth-value of utterance in state,
      // in accordance with the meanings dictionary.
      return evaluate(meanings[utterance], state);
    }
  }
});
showKinematics(hearer0, ['all', 'some', 'none']);
```


As we'd expect, the level-0 hearer does not interpret 'some' as ∃¬∀: on hearing 'some', he splits his credence evenly between ∀ and ∃¬∀.

Next, we have a level-1 speaker who models her addressee as a level-0 hearer. She knows the true state and tries to maximize the hearer's credence in that state.

```js
// continues #2
var speaker1 = function(observation) {
  return Agent({
    options: ['all', 'some', 'none'],
    credence: Indifferent([observation]),
    utility: function(u, s) {
      return learn(hearer0, u).score(s);
    }
  });
};
showChoices(speaker1, states);
```


The level-1 speaker never uses 'some' if the state is ∀, because she can do better by using 'all'. As a result, any hearer above level 1 will infer ¬∀ from 'some':

```js
// continues #3
var hearer2 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return sample(choice(speaker1(state))) == utterance;
    }
  }
});
showKinematics(hearer2, ['all', 'some', 'none']);
```
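If you want to check these numbers without the post's WebPPL helpers, here is a minimal plain-JavaScript sketch that computes the same three levels by exact enumeration. It is an illustration under stated assumptions, not the post's implementation: the helper names are hypothetical, the prior is uniform, the speaker's utility is the log of hearer0's credence in the true state, and ties between equally good utterances are split evenly (which is how the post's speakers behave when indifferent).

```js
// Exact-enumeration sketch of the 'some'/'all' model in plain JavaScript.
// Hypothetical helpers, not the post's Agent/learn/choice library.
const states = ["∀", "∃¬∀", "¬∃"];
const utterances = ["all", "some", "none"];
const meanings = {
  all: (s) => s === "∀",
  some: (s) => s !== "¬∃",
  none: (s) => s === "¬∃",
};

// Turn an object of non-negative weights into probabilities.
function normalize(weights) {
  const total = Object.values(weights).reduce((a, b) => a + b, 0);
  const out = {};
  for (const k in weights) out[k] = weights[k] / total;
  return out;
}

// Level-0 hearer: a uniform prior conditioned on literal truth
// (the uniform prior cancels out when we normalize).
function hearer0(u) {
  const w = {};
  for (const s of states) w[s] = meanings[u](s) ? 1 : 0;
  return normalize(w);
}

// Level-1 speaker: picks the utterance(s) maximizing the log of
// hearer0's credence in the true state; ties are split evenly.
function speaker1(s) {
  const util = {};
  for (const u of utterances) util[u] = Math.log(hearer0(u)[s]);
  const best = Math.max(...Object.values(util));
  const w = {};
  for (const u of utterances) w[u] = util[u] === best ? 1 : 0;
  return normalize(w);
}

// Level-2 hearer: uniform prior times the probability that speaker1 says u.
function hearer2(u) {
  const w = {};
  for (const s of states) w[s] = speaker1(s)[u];
  return normalize(w);
}

console.log(hearer0("some")); // ∀ and ∃¬∀ each get 0.5
console.log(hearer2("some")); // ∃¬∀ gets 1: 'some' now conveys 'not all'
```

In state ∀ the speaker compares log 1 for 'all' with log 1/2 for 'some', so she never says 'some' there; that is all the level-2 hearer needs.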


### 2. Soft-maxing

The inference from 'some' to 'not all' above is similar to the reasoning from 'Blue' to 'not Circle' in the squares-and-circle example from the previous post.

In the squares-and-circle scenario, 'Blue' is strictly less informative than 'Circle': 'Blue' is true in two states, Blue Square and Blue Circle; 'Circle' is only true in Blue Circle. If a (level-1) speaker uses the less informative message 'Blue', the (level-2) hearer infers that the more informative 'Circle' would be false. So 'Blue' is interpreted as Blue Square.

We might call this a contextual scalar implicature. 'Circle' doesn't generally entail 'Blue'; it only does so against the background of the contextual information that the only circle that could be drawn is blue.

There seems to be a difference between contextual and non-contextual scalar implicatures. If someone knows that all students passed, it would be very odd for them to utter 'some students passed'. By comparison, I would not be surprised if a speaker in the squares-and-circle scenario described a blue circle as 'Blue'.

The model from the previous post predicts that this never happens. And it surely would never happen in a community of perfectly rational agents with the shared goal of communicating the truth. But real people are not always that smart.

Can we make the model more realistic?

A standard way to do this (in cognitive science, and in the RSA literature) is to assume that speakers don't maximize expected utility but only soft-max it: they choose an act with probability proportional to the act's exponentiated expected utility, so that

$\text{Pr}(A) \propto e^{\alpha\,\text{EU}(A)},$

where the parameter α determines the "degree of rationality". As α goes to infinity, the agent maximizes expected utility; if α is zero, the agent chooses uniformly at random.
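Here is a small plain-JavaScript sketch of the soft-max rule (an illustration of the formula, not how the post's Agent helper is implemented; it assumes finite expected utilities). It shows the two limiting cases and one further property: multiplying every utility by a constant k has exactly the same effect as multiplying α by k.

```js
// Soft-max choice rule, Pr(A) ∝ exp(alpha * EU(A)), for finite utilities.
function softmax(eus, alpha) {
  // Subtracting the maximum before exponentiating avoids overflow and
  // leaves the normalized probabilities unchanged.
  const m = Math.max(...eus);
  const weights = eus.map((eu) => Math.exp(alpha * (eu - m)));
  const total = weights.reduce((a, b) => a + b, 0);
  return weights.map((w) => w / total);
}

// Two acts whose expected utilities are log 1 and log 1/2.
const eus = [0, Math.log(0.5)];
console.log(softmax(eus, 0));  // alpha = 0: uniform, [0.5, 0.5]
console.log(softmax(eus, 1));  // alpha = 1: roughly [0.667, 0.333]
console.log(softmax(eus, 50)); // large alpha: almost all mass on the better act

// Scaling every utility by 10 has exactly the effect of scaling alpha by 10.
const scaled = softmax(eus.map((eu) => 10 * eu), 1);
const sharper = softmax(eus, 10);
console.log(scaled, sharper); // both roughly [0.999, 0.001]
```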

Here is the hearer-terminal squares-and-circle model again, up to level 2, but this time with a soft-maxing level-1 speaker. The only change in the code is that speaker1 now has an alpha parameter, which I've (arbitrarily) set to 1.

```js
var states = ['Blue Square', 'Blue Circle', 'Green Square'];
var utterances = ['Blue', 'Green', 'Circle', 'Square'];
var is_true = function(utterance, state) {
  return state.includes(utterance);
};
var hearer0 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return is_true(utterance, state);
    };
  }
});
var speaker1 = function(observation) {
  return Agent({
    options: utterances,
    credence: Indifferent([observation]),
    utility: function(u, s) {
      return learn(hearer0, u).score(s);
    },
    alpha: 1
  });
};
var hearer2 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return sample(choice(speaker1(state))) == utterance;
    }
  }
});
showKinematics(hearer2, utterances);
```


As you can see, the effect has become much softer. When the level-2 hearer hears 'Blue', he merely becomes somewhat more confident in Blue Square than in Blue Circle, but Blue Circle is still a live possibility.
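The numbers can be checked by hand. With utility log P₀(s | u), the soft-max weight exp(α · log P₀(s | u)) is just P₀(s | u)^α, so the α = 1 speaker who sees Blue Circle says 'Blue' with probability 1/2 ÷ (1/2 + 1) = 1/3. The following plain-JavaScript sketch (hypothetical helper names, not the post's library; it assumes α > 0 and a uniform prior) carries the computation through to the level-2 hearer.

```js
// Squares-and-circle model with a soft-maxing level-1 speaker, by exact enumeration.
// Hypothetical plain-JS helpers, not the post's library. Assumes alpha > 0.
const states = ["Blue Square", "Blue Circle", "Green Square"];
const utterances = ["Blue", "Green", "Circle", "Square"];
const isTrue = (u, s) => s.includes(u);

// Turn an array of non-negative weights into probabilities.
const normalize = (ws) => {
  const total = ws.reduce((a, b) => a + b, 0);
  return ws.map((w) => w / total);
};

// Level-0 hearer: P0(s | u), indexed like `states`.
const hearer0 = (u) => normalize(states.map((s) => (isTrue(u, s) ? 1 : 0)));

// Level-1 speaker in state number sIndex, with utility log P0(s | u):
// exp(alpha * log p) = p^alpha, so Pr(u | s) ∝ P0(s | u)^alpha.
const speaker1 = (sIndex, alpha) =>
  normalize(utterances.map((u) => Math.pow(hearer0(u)[sIndex], alpha)));

// Level-2 hearer: uniform prior times speaker1's probability of saying u.
const hearer2 = (u, alpha) => {
  const k = utterances.indexOf(u);
  return normalize(states.map((_, i) => speaker1(i, alpha)[k]));
};

console.log(speaker1(1, 1));     // in Blue Circle: 'Blue' 1/3, 'Circle' 2/3
console.log(hearer2("Blue", 1)); // Blue Square ≈ 0.6, Blue Circle ≈ 0.4
```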

We could now run a study on Mechanical Turk, check how real people behave in the squares-and-circle scenario, and set the alpha parameter so that our model matches the observed behaviour. (Presumably, we'll need different values depending on the speaker's cognitive abilities, their state of alertness, intoxication, etc. One might think that the stakes also matter, but the soft-max model already takes this into account: if you change the utility function so that it returns 10 * learn(hearer0, u).score(s), the level-1 speaker almost always uses the most informative utterance.)

This seems to be the standard approach in the literature. It looks terribly dodgy to me, mainly because it seems to presuppose an independent grip on people's utility functions (on a ratio scale!). But let's not get into that.

### 3. Modularization

Return to the difference between contextual and non-contextual implicatures. I claimed that the inference from 'some' to 'not all' in the exam scenario is easy and automatic, unlike the inference from 'Blue' to 'not Circle' in the squares-and-circle scenario. If we give speakers a fixed alpha parameter, we can't explain this difference.

Computationally, it would make sense to have a dedicated cognitive module for computing common scalar implicatures that don't depend on contextual information.

One way to simulate this hypothesis is to assume that these "generalized" (Grice (1989)) or "default" (Levinson (2000)) implicatures are computed with higher alpha parameters and deeper levels of recursion than "non-generalized", contextual implicatures. The generalized implicatures might also use uninformed priors in the recursion, ignoring what the speaker or hearer is in fact likely to know. And they might be less sensitive to the speaker's state of alertness, etc.
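The post doesn't commit to particular values, but a plain-JavaScript sketch of the squares-and-circle model illustrates the direction of both proposed effects: raising α, or adding levels of recursion, makes the hearer more confident that 'Blue' means Blue Square. The helper names are hypothetical; speakers soft-max with α > 0, utility log P(s | u), and all priors are uniform.

```js
// Illustrative sketch: recursion depth and alpha both sharpen 'Blue' => Blue Square.
const states = ["Blue Square", "Blue Circle", "Green Square"];
const utterances = ["Blue", "Green", "Circle", "Square"];
const isTrue = (u, s) => s.includes(u);

const normalize = (ws) => {
  const total = ws.reduce((a, b) => a + b, 0);
  return ws.map((w) => w / total);
};

// Literal hearer as a matrix: literal[u][s] = P0(s | u) under a uniform prior.
const literal = utterances.map((u) => normalize(states.map((s) => (isTrue(u, s) ? 1 : 0))));

// One soft-max speaker step (alpha > 0) followed by one hearer step.
function nextHearer(listener, alpha) {
  // speaker[s][u] ∝ listener[u][s]^alpha, i.e. exp(alpha * log listener[u][s])
  const speaker = states.map((_, s) =>
    normalize(utterances.map((_, u) => Math.pow(listener[u][s], alpha)))
  );
  // new listener[u][s] ∝ speaker[s][u] (uniform prior over states)
  return utterances.map((_, u) => normalize(states.map((_, s) => speaker[s][u])));
}

// Hearer at level 2 * steps.
function hearerAt(steps, alpha) {
  let listener = literal;
  for (let i = 0; i < steps; i++) listener = nextHearer(listener, alpha);
  return listener;
}

const blue = utterances.indexOf("Blue");
console.log(hearerAt(1, 1)[blue][0]); // level 2, alpha 1: credence in Blue Square 0.6
console.log(hearerAt(2, 1)[blue][0]); // level 4, alpha 1: 7/11 ≈ 0.636
console.log(hearerAt(1, 4)[blue][0]); // level 2, alpha 4: 17/19 ≈ 0.895
```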

In what follows, I'll mostly set alpha to infinity. This makes the results easier to interpret, although it makes the models less realistic.

### 4. QUD-Sensitivity

Let's look at another kind of scalar implicature, adapted from Carston (1998).

A speaker says 'We have apple juice'. In some contexts, you can infer that they don't have orange juice. In other contexts, you can't. It depends on the question under discussion (QUD). If the utterance is a response to 'What kind of juice do you have?', one may safely infer the unavailability of orange juice. If it is a response to 'Do you have apple juice?', one may not.

Whatever module computes the interpretation must be sensitive to the QUD.

We can represent a question as a function from states to answers, like so:

```js
var states = Cross('apple', 'orange'); // = [{apple: true, orange: true}, ...]
var quds = {
  'which?': function(state) { return state },
  'apple?': function(state) { return state['apple'] }
};
```


The Cross function here creates a list of all possible assignments of truth values to 'apple' and 'orange'. The QUD 'which?' is the most fine-grained question: it returns the full state. The QUD 'apple?' only returns a state's value for 'apple'.
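One way to see what a function from states to answers amounts to: it groups the states into cells that receive the same answer. In the plain-JavaScript sketch below, cross and cells are hypothetical stand-ins (not the post's Cross helper). 'which?' yields four singleton cells, while 'apple?' yields two.

```js
// Hypothetical stand-in for the post's Cross helper: all truth-value assignments.
function cross(...vars) {
  return vars.reduce(
    (acc, v) => acc.flatMap((st) => [true, false].map((b) => ({ ...st, [v]: b }))),
    [{}]
  );
}

const states = cross("apple", "orange"); // 4 states
const quds = {
  "which?": (s) => s,
  "apple?": (s) => s.apple,
};

// Group the states into cells that give the same answer to the QUD.
function cells(qud) {
  const groups = new Map();
  for (const s of states) {
    const key = JSON.stringify(qud(s)); // compare answers by value
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(s);
  }
  return [...groups.values()];
}

console.log(cells(quds["which?"]).length); // 4 singleton cells
console.log(cells(quds["apple?"]).length); // 2 cells: apple available or not
```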

What's the pragmatic effect of the QUD?

Kao et al. (2014) and Lassiter and Goodman (2017) suggest that the QUD affects a speaker's utility function.

So far, we've assumed that the speaker's goal is to increase the hearer's overall accuracy. But if a particular question is under discussion, the speaker's main goal is plausibly to convey information about that question, not about unrelated matters.

Let's implement this idea.

```js
// continues #6
var meanings = {
  'have apple': function(state) { return state['apple'] },
  'not have apple': function(state) { return !state['apple'] },
  'have orange': function(state) { return state['orange'] },
  'not have orange': function(state) { return !state['orange'] },
  'have apple and orange': function(state) { return state['apple'] && state['orange'] },
  'have no juice': function(state) { return !state['apple'] && !state['orange'] }
};
var hearer0 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return evaluate(meanings[utterance], state);
    }
  }
});
var speaker1 = function(observation, qud) {
  return Agent({
    options: keys(meanings),
    credence: Indifferent([observation]),
    utility: function(u, s) {
      return marginalize(learn(hearer0, u), quds[qud]).score(evaluate(quds[qud], s));
    }
  });
};
var observed_state = { apple: true, orange: true };
display("QUD: which juice do you have?");
viz.table(choice(speaker1(observed_state, 'which?')));
display("QUD: do you have apple juice?");
viz.table(choice(speaker1(observed_state, 'apple?')));
```


speaker1 is now a function that takes an observation and a QUD as arguments, so that we can easily compare the predictions for different QUDs. The output shows what the speaker would do if apple and orange are both available. (The rather complex expression 'marginalize....' in the utility function computes the accuracy of hearer0 with respect to the QUD: the log of hearer0's total credence in the states that give the same answer to the QUD as the true state.)

As you can see, the level-1 speaker always utters 'have apple and orange' if the QUD is 'which?', but is indifferent between 'have apple' and 'have apple and orange' if the QUD is 'apple?'.

As a consequence, a level-2 hearer can infer the unavailability of orange from 'have apple' if the QUD is 'which?', but not if the QUD is 'apple?':

```js
// continues #7
var hearer2 = function(qud) {
  return Agent({
    credence: Indifferent(states),
    kinematics: function(utterance) {
      return function(state) {
        return sample(choice(speaker1(state, qud))) == utterance;
      }
    }
  });
};
display("QUD: which juice do you have? Utterance: have apple");
viz.table(learn(hearer2('which?'), 'have apple'));
display("QUD: do you have apple juice? Utterance: have apple");
viz.table(learn(hearer2('apple?'), 'have apple'));
```
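A plain-JavaScript sketch of the same model by exact enumeration reproduces both results. The names are illustrative, not the post's library: the marginalize(...).score(...) step becomes an explicit sum over the states in the true answer's cell followed by a log, the speaker maximizes, and ties are split evenly.

```js
// Sketch of the QUD-sensitive juice model by exact enumeration (argmax speaker).
const states = [
  { apple: true, orange: true },
  { apple: true, orange: false },
  { apple: false, orange: true },
  { apple: false, orange: false },
];
const meanings = {
  "have apple": (s) => s.apple,
  "not have apple": (s) => !s.apple,
  "have orange": (s) => s.orange,
  "not have orange": (s) => !s.orange,
  "have apple and orange": (s) => s.apple && s.orange,
  "have no juice": (s) => !s.apple && !s.orange,
};
const utterances = Object.keys(meanings);
// 'which?' encodes the whole state as a string so answers can be compared with ===.
const quds = { "which?": (s) => JSON.stringify(s), "apple?": (s) => s.apple };

const normalize = (ws) => {
  const total = ws.reduce((a, b) => a + b, 0);
  return ws.map((w) => w / total);
};

// Level-0 hearer: uniform prior conditioned on literal truth.
const hearer0 = (u) => normalize(states.map((s) => (meanings[u](s) ? 1 : 0)));

// hearer0's total credence in the true answer to the QUD (the "marginalize" step).
function qudCredence(u, sIndex, qud) {
  const answer = quds[qud](states[sIndex]);
  const post = hearer0(u);
  return states.reduce((acc, s, i) => acc + (quds[qud](s) === answer ? post[i] : 0), 0);
}

// Level-1 speaker: maximizes the log of qudCredence; ties split evenly.
function speaker1(sIndex, qud) {
  const util = utterances.map((u) => Math.log(qudCredence(u, sIndex, qud)));
  const best = Math.max(...util);
  return normalize(util.map((x) => (x === best ? 1 : 0)));
}

// Level-2 hearer: uniform prior times speaker1's probability of saying u.
function hearer2(u, qud) {
  const k = utterances.indexOf(u);
  return normalize(states.map((_, i) => speaker1(i, qud)[k]));
}

console.log(hearer2("have apple", "which?")); // all mass on {apple: true, orange: false}
console.log(hearer2("have apple", "apple?")); // split evenly between the two apple states
```

The enumeration also reveals that speaker1(1, 'apple?'), the case where only apple is available, splits evenly between 'have apple' and the false 'have apple and orange': another symptom of the problem discussed in the next section.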


### 5. Truthfulness and Simplicity

The code in blocks #7 and #8 correctly predicts the QUD-sensitive implicature, but the definition of speaker1 is too simplistic.

Here's one problem. If the QUD is 'do you have apple juice?', the speaker should prefer 'have apple' to 'have apple and orange'. Why mention the irrelevant orange? In general, a cooperative speaker should prefer simpler ways of conveying the desired information over more complex ways.

We can implement this by giving the speaker a preference for utterances with fewer words:

```js
// continues #7
var speaker1 = function(observation, qud) {
  return Agent({
    options: keys(meanings),
    credence: Indifferent([observation]),
    utility: function(u, s) {
      var score_q = marginalize(learn(hearer0, u), quds[qud]).score(evaluate(quds[qud], s));
      var score_m = -numWords(u)/100;
      return score_q + score_m;
    }
  });
};
display("QUD: which juice do you have?");
viz.table(choice(speaker1(observed_state, 'which?')));
display("QUD: do you have apple juice?");
viz.table(choice(speaker1(observed_state, 'apple?')));
```


Here's another problem. Suppose only orange is available, and the QUD is 'apple?'. Our speaker1 is indifferent between 'have no juice' and 'not have apple':

```js
// continues #9
var observed_state = { apple: false, orange: true };
viz.table(choice(speaker1(observed_state, 'apple?')));
```


If we replaced 'have no juice' with the shorter 'have none', she would even prefer the false 'have none' over the true 'not have apple'. To fix this, we should make our speaker care not only about the hearer's credence in the QUD, but also about not saying anything false.

```js
// continues #7
var speaker1 = function(observation, qud) {
  return Agent({
    options: keys(meanings),
    credence: Indifferent([observation]),
    utility: function(u, s) {
      var score_q = marginalize(learn(hearer0, u), quds[qud]).score(evaluate(quds[qud], s));
      var score_m = -numWords(u)/10;
      var score_t = evaluate(meanings[u], s) ? 0 : -1000;
      return score_q + score_m + score_t;
    }
  });
};
var observed_state = { apple: false, orange: true };
viz.table(choice(speaker1(observed_state, 'apple?')));
```
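The 'have none' argument involves an utterance that isn't in the meanings dictionary, so here is a plain-JavaScript sketch that makes the comparison concrete for the state where only orange is available and the QUD is 'apple?'. It is not the post's library: the word count via split(" ") is my assumption about what numWords does, and the cost of 1/10 per word and the -1000 penalty simply mirror the block above.

```js
// Sketch: QUD accuracy + word-count cost, with and without a truthfulness penalty.
const states = [
  { apple: true, orange: true },
  { apple: true, orange: false },
  { apple: false, orange: true },
  { apple: false, orange: false },
];
// As in the post, but with the shorter 'have none' in place of 'have no juice'.
const meanings = {
  "have apple": (s) => s.apple,
  "not have apple": (s) => !s.apple,
  "have orange": (s) => s.orange,
  "not have orange": (s) => !s.orange,
  "have apple and orange": (s) => s.apple && s.orange,
  "have none": (s) => !s.apple && !s.orange,
};
const utterances = Object.keys(meanings);
const numWords = (u) => u.split(" ").length; // assumption: cost counts words

// Level-0 hearer: uniform prior conditioned on literal truth.
const hearer0 = (u) => {
  const ws = states.map((s) => (meanings[u](s) ? 1 : 0));
  const total = ws.reduce((a, b) => a + b, 0);
  return ws.map((w) => w / total);
};

// Utility under the QUD 'apple?': log credence in the true answer, minus costs.
function utility(u, s, truthPenalty) {
  const post = hearer0(u);
  const credence = states.reduce((acc, t, i) => acc + (t.apple === s.apple ? post[i] : 0), 0);
  const scoreQ = Math.log(credence);
  const scoreM = -numWords(u) / 10;
  const scoreT = truthPenalty && !meanings[u](s) ? -1000 : 0;
  return scoreQ + scoreM + scoreT;
}

// The utterance(s) with maximal utility.
function best(s, truthPenalty) {
  const us = utterances.map((u) => utility(u, s, truthPenalty));
  const m = Math.max(...us);
  return utterances.filter((_, i) => us[i] === m);
}

const observed = { apple: false, orange: true };
console.log(best(observed, false)); // [ 'have none' ]: shorter, but false
console.log(best(observed, true));  // [ 'not have apple' ]
```

Both 'have none' and 'not have apple' give the literal hearer certainty that there is no apple juice, so without the penalty only the word count separates them.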


### 6. Asymmetric alternatives and higher-order implicatures

We derive scalar implicatures by considering alternatives to the chosen utterance: 'some' conveys 'not all' because a cooperative speaker should have chosen 'all', not 'some', to communicate that all students passed. So far, we've treated all utterances as available alternatives to one another. Let's model a case where we seem to need a more sophisticated notion of alternatives.

We're going to implement Spector's (2007) account of plurality implicatures. The data we want to explain is that plural NPs, like 'pockets', are interpreted as plural in contexts like (1), but not in contexts like (2):

(1) This coat has pockets.

(2) This coat does not have pockets.

A tempting hypothesis is that plural nouns are semantically number-neutral and that the plural interpretation of (1) is a scalar implicature. The derivation would go something like this. If the coat had a single pocket, a cooperative speaker would have used (3) instead of (1).

(3) This coat has a pocket.

If the speaker chose (1) instead of (3), we can infer that the coat does not have exactly one pocket. Given the literal meaning of (1), it must have multiple pockets.

The problem is that (3) is plausibly also number-neutral: (3) is true even if the coat has multiple pockets. On our present assumptions, (1) and (3) are semantically equivalent! So why would a cooperative speaker prefer (3) to (1) in a single-pocket context?

Spector (2007) has a clever solution. While (3) is semantically number-neutral, he suggests, it implicates that the coat has a single pocket. The implicature in (1) can then be explained as a higher-order implicature, involving not the literal meaning of the alternative (3), but the literal meaning enriched with its implicature.

So why does (3) implicate that the coat has a single pocket? Because a cooperative speaker who utters (3) should have used (4) instead if the coat had multiple pockets:

(4) This coat has several pockets.

For this account to work, (4) must be an alternative to (3), and (3) to (1), even though (4) is not an alternative to (1).

Where should the alternatives enter the picture in our models? A plausible place is in the hearer's reconstruction of the speaker's choice: The hearer will wonder why the speaker uttered U rather than an alternative U'.

A level-0 hearer, of course, doesn't think about the speaker's choice at all. They simply conditionalize on the literal content of what they hear. In a hearer-terminal model, the alternatives will therefore enter the picture at level 2. Having heard an utterance U, the level-2 hearer will infer that no alternative U' to U would have been a better choice than U.

Let's set up the code. We distinguish three states and four utterances.

```js
var states = ['0 pockets', '1 pocket', '2+ pockets'];
var meanings = {
  'no pockets': function(state) { return state == '0 pockets' },
  'pockets': function(state) { return state != '0 pockets' },
  'a pocket': function(state) { return state != '0 pockets' },
  'several pockets': function(state) { return state == '2+ pockets' }
};
```


Each utterance has a set of alternatives:

```js
// continues #12
var alternatives = {
  'pockets':         ['pockets', 'a pocket', 'no pockets'],
  'a pocket':        ['pockets', 'a pocket', 'several pockets', 'no pockets'],
  'several pockets': ['pockets', 'a pocket', 'several pockets', 'no pockets'],
  'no pockets':      ['pockets', 'a pocket', 'several pockets', 'no pockets']
};
```


Again, the idea is that a hearer who encounters the utterance wonders why the speaker didn't choose one of the alternatives. (It proves convenient to include the original utterance in its set of alternatives.) The exact choice of alternatives is not crucial, but for Spector's solution we need 'a pocket' as an alternative to 'pockets', and 'several pockets' as an alternative to 'a pocket' but not to 'pockets'.

As always, a level-0 hearer conditionalizes on literal content:

```js
// continues #13
var hearer0 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      return evaluate(meanings[utterance], state);
    }
  }
});
showKinematics(hearer0, keys(meanings));
// keys(meanings) is the set of all utterances, retrieved as the "keys" in the meanings dictionary.
```


As before, the level-1 speaker knows the true state and chooses an utterance, with the aim of getting the hearer to assign high credence to the true state.

```js
// continues #14
var speaker1 = function(observation, options) {
  return Agent({
    options: options,
    credence: Indifferent([observation]),
    utility: function(u, s) {
      return learn(hearer0, u).score(s);
    }
  });
};
viz.table(choice(speaker1('2+ pockets', keys(meanings))));
viz.table(choice(speaker1('2+ pockets', ['pockets', 'a pocket'])));
```


I've defined speaker1 as a function of two arguments: the observation and a set of alternatives. The speaker chooses an optimal utterance from the alternatives.

As you can see (by running the code block), if the coat has multiple pockets and all utterances are available, the speaker always chooses 'several pockets'; if that utterance is unavailable, she is indifferent between 'a pocket' and 'pockets'.

Next, we define the level-2 hearer.

```js
// continues #15
var hearer2 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      var speaker = speaker1(state, alternatives[utterance]);
      return sample(choice(speaker)) == utterance;
    }
  }
});
showKinematics(hearer2, keys(meanings));
```


A difference has emerged between 'pockets' and 'a pocket': 'a pocket' is interpreted as 'a single pocket', while 'pockets' is interpreted as 'at least one pocket'. The level-2 hearer computes the first-order implicature from the choice of 'a pocket', but not yet the second-order implicature from the choice of 'pockets'. We'll get this implicature at higher levels.

The level-3 speaker prefers 'pockets' over 'a pocket' if the coat has multiple pockets:

```js
// continues #16
var speaker3 = function(observation, options) {
  return Agent({
    options: options,
    credence: Indifferent([observation]),
    utility: function(u, s) {
      return learn(hearer2, u).score(s);
    }
  });
};
viz.table(choice(speaker3('2+ pockets', ['pockets', 'a pocket'])));
```


As a result, a level-4 hearer will interpret 'pockets' as 'several pockets':

```js
// continues #17
var hearer4 = Agent({
  credence: Indifferent(states),
  kinematics: function(utterance) {
    return function(state) {
      var speaker = speaker3(state, alternatives[utterance]);
      return sample(choice(speaker)) == utterance;
    }
  }
});
showKinematics(hearer4, keys(meanings));
```
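For readers who want to trace the recursion step by step, here is a plain-JavaScript sketch of the whole hearer-terminal model by exact enumeration. speaker and hearerAbove are hypothetical names, not the post's helpers; as in the blocks above, speakers maximize the log of the hearer's credence in the true state, and ties are split evenly. The key asymmetry is that the hearer checks, for each state, whether the speaker would have chosen u from among alternatives[u].

```js
// Sketch of the plural-implicature model with asymmetric alternatives.
const states = ["0 pockets", "1 pocket", "2+ pockets"];
const meanings = {
  "no pockets": (s) => s === "0 pockets",
  pockets: (s) => s !== "0 pockets",
  "a pocket": (s) => s !== "0 pockets",
  "several pockets": (s) => s === "2+ pockets",
};
const all = Object.keys(meanings);
const alternatives = {
  pockets: ["pockets", "a pocket", "no pockets"], // no 'several pockets' here
  "a pocket": all,
  "several pockets": all,
  "no pockets": all,
};

// Turn non-negative weights into probabilities.
const normalize = (ws) => {
  const total = ws.reduce((a, b) => a + b, 0);
  return ws.map((w) => w / total);
};

// Level-0 hearer: P(state | u), indexed like `states`.
const hearer0 = (u) => normalize(states.map((s) => (meanings[u](s) ? 1 : 0)));

// Speaker one level above `listener`: maximizes log P(true state | u) over
// `options`; ties are split evenly. Returns probabilities indexed like options.
function speaker(listener, sIndex, options) {
  const util = options.map((u) => Math.log(listener(u)[sIndex]));
  const best = Math.max(...util);
  return normalize(util.map((x) => (x === best ? 1 : 0)));
}

// Hearer one level above `listener`: in each state, how likely is it that the
// speaker, choosing among the alternatives to u, would have said u?
function hearerAbove(listener) {
  return (u) => {
    const opts = alternatives[u];
    return normalize(states.map((_, i) => speaker(listener, i, opts)[opts.indexOf(u)]));
  };
}

const hearer2 = hearerAbove(hearer0);
const hearer4 = hearerAbove(hearer2);
console.log(hearer2("a pocket")); // [0, 1, 0]: exactly one pocket
console.log(hearer2("pockets"));  // [0, 0.5, 0.5]: at least one pocket
console.log(hearer4("pockets"));  // [0, 0, 1]: several pockets
```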


Exercise: Change this hearer-terminal model into a speaker-terminal model. Surprisingly, hearer3 in that model already behaves like hearer4 in the hearer-terminal model, even though the recursion depth does not yet allow any higher-order implicatures. Can you see how the effect comes about?

Carston, Robyn. 1998. “Informativeness, Relevance and Scalar Implicature.” In Relevance Theory: Applications and Implications, edited by R. Carston and S. Uchida, 179–238. Amsterdam: Benjamins.

Goodman, Noah D., and Andreas Stuhlmüller. 2013. “Knowledge and Implicature: Modeling Language Understanding as Social Cognition.” Topics in Cognitive Science 5 (1): 173–84. doi.org/10.1111/tops.12007.

Grice, Paul. 1989. “Logic and Conversation.” In Studies in the Way of Words. Cambridge (Mass.): Harvard University Press.

Kao, Justine T., Jean Y. Wu, Leon Bergen, and Noah D. Goodman. 2014. “Nonliteral Understanding of Number Words.” Proceedings of the National Academy of Sciences 111 (33): 12002–7. doi.org/10.1073/pnas.1407479111.

Lassiter, Daniel, and Noah D. Goodman. 2017. “Adjectival Vagueness in a Bayesian Model of Interpretation.” Synthese 194: 3801–36. doi.org/10.1007/s11229-015-0786-1.

Levinson, Stephen C. 2000. Presumptive Meanings. Cambridge (Mass.): MIT Press.

Spector, Benjamin. 2007. “Aspects of the Pragmatics of Plural Morphology: On Higher-Order Implicatures.” In Presupposition and Implicature in Compositional Semantics, edited by Uli Sauerland and Penka Stateva, 243–81. Palgrave Studies in Pragmatics, Language and Cognition. London: Palgrave Macmillan UK. doi.org/10.1057/9780230210752_9.