In this post, we examine the claim that natural languages are ε-ambiguity languages in the sense defined by the probabilistic theories of language and latent-intention inference in (Jiang, 2023). We survey linguistic, psycholinguistic, and computational evidence demonstrating that natural languages exhibit precisely this structure.
Natural languages support reliable communication despite variability, noise, and structural underspecification. Unlike programming languages, they allow metaphor, ellipsis, ambiguity, deixis, and context-dependent meaning. Yet humans typically recover the intended meaning with high accuracy.
Recent theoretical work formalizes this intuition: the intended meaning dominates interpretation, but alternative interpretations occur with small but non-zero probability. In this framework, although many meanings are technically compatible with a linguistic expression, one meaning dominates the posterior probability, and ambiguity occurs only with small probability ε(x). As in (Jiang, 2023):
A language is an ε-ambiguity language if for every meaningful expression x:
\[\Pr(\theta_0 \mid x) \ge 1 - \varepsilon(x),\]where θ₀ is the intended meaning, and ε(x) quantifies residual ambiguity.
This describes a sparse posterior over meanings: a dominant intention and a long but very small tail of alternatives.
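To make the shape of such a posterior concrete, here is a tiny numeric illustration; the probabilities below are invented for the example, not estimated from any corpus.

```python
# Hypothetical posterior over candidate meanings for a single utterance x.
# The numbers are illustrative only, chosen to show one dominant intention
# plus a small tail of alternatives.
posterior = {
    "theta_0 (intended)": 0.970,
    "theta_1": 0.012,
    "theta_2": 0.010,
    "theta_3": 0.005,
    "theta_4": 0.003,
}

theta_0 = max(posterior, key=posterior.get)   # dominant intention
epsilon = 1.0 - posterior[theta_0]            # residual ambiguity eps(x)
print(theta_0, f"eps(x) = {epsilon:.3f}")     # -> eps(x) = 0.030
```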
This model makes concrete predictions about how ambiguity is distributed and resolved in natural languages; the remainder of this post examines the evidence for them.
The fundamental principle behind ε-ambiguity languages is that linguistic expressions exhibit partial but not perfect semantic determinacy. Messages tend to convey one meaning with high probability, but never with absolute certainty. Real-world natural languages possess exactly these properties: they support efficient communication despite intrinsic ambiguity, and ambiguity is controlled by contextual, semantic, and pragmatic mechanisms.
The ε-ambiguity framework formalizes this intuition within a probabilistic generative model of communication, where meanings (latent intentions θ) are drawn from a space Θ, and surface messages are generated by noisy processes with intention-specific distributions q(x ∣ θ). The framework provides a mathematical explanation for why LLMs can infer hidden meanings from text and why phenomena such as chain-of-thought reasoning reduce uncertainty.
The central question of this article is:
What empirical and theoretical evidence supports the view that natural languages satisfy the definition of ε-ambiguity languages?
We show below that supporting evidence comes from multiple research domains.
Natural languages contain extensive lexical ambiguity: most common words carry multiple senses (for example, “bank” as a financial institution versus the side of a river), yet in context one sense overwhelmingly dominates interpretation.
This illustrates exactly the condition:
\[\Pr(\theta_0 \mid x) \approx 1 - \varepsilon(x), \quad \varepsilon(x) \text{ small but nonzero},\]where θ₀ is the dominant sense.
Classic syntactic ambiguities (e.g., “I saw the man with the telescope”) allow multiple parses, yet listeners overwhelmingly adopt one interpretation when context is provided. Probabilistic grammars assign steeply skewed probability distributions to the competing parses, with one parse receiving nearly all of the probability mass.
Pragmatics often shifts literal meanings to intended meanings: an indirect request such as “Can you pass the salt?” is understood as a request rather than a question about ability.
Research on speech-act recognition shows that listeners reliably identify the intended force of an utterance despite such surface ambiguity.
Humans use contextual probabilities to resolve ambiguity almost instantaneously. This supports the claim that the posterior over meanings is sharply peaked, i.e., Pr(θ₀ ∣ x) ≈ 1 − ε(x) with ε(x) small. Work on contextual integration shows that listeners combine lexical, syntactic, and discourse cues to converge rapidly on a single interpretation.
The ε-ambiguity model predicts, and under independence assumptions proves, that when a listener receives multiple messages \((x_1, x_2, \dots, x_m)\) generated from the same θ:
\[\varepsilon_{\text{combined}} \approx \varepsilon(x_1)\varepsilon(x_2)\cdots\varepsilon(x_m).\]This multiplicative decay of ε aligns with psychological evidence that humans aggregate cues across words, sentences, and discourse, and it explains why interpretations become dramatically more confident as context accumulates. Thus ε plays a central role in controlling inference quality.
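The following minimal sketch illustrates this combination rule; it is a toy Bayesian listener written for this post (not code from Jiang, 2023), with an invented intention space and likelihood table.

```python
# Toy Bayesian listener over a small discrete intention space, illustrating
# that residual ambiguity eps = 1 - Pr(theta_0 | x_1..x_m) shrinks roughly
# multiplicatively as conditionally independent messages accumulate.
import numpy as np

n_intents, vocab = 5, 20
prior = np.full(n_intents, 1.0 / n_intents)

# q(x | theta): each intention strongly prefers four surface forms but leaks
# a little probability onto every other form -- the source of ambiguity.
likelihood = np.full((n_intents, vocab), 0.01)
for theta in range(n_intents):
    likelihood[theta, 4 * theta : 4 * (theta + 1)] = 0.25
likelihood /= likelihood.sum(axis=1, keepdims=True)

def posterior(messages):
    """Pr(theta | x_1..x_m), combining independent messages in log space."""
    log_post = np.log(prior) + sum(np.log(likelihood[:, x]) for x in messages)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

true_theta = 2
msgs = [8, 9, 10, 11]        # four surface forms compatible with theta = 2
for m in range(1, len(msgs) + 1):
    eps = 1.0 - posterior(msgs[:m])[true_theta]
    print(f"m={m}: eps_combined = {eps:.2e}")  # drops roughly 25x per message
```

Each additional message multiplies the evidence for θ₀, so the tail of probability mass on alternatives shrinks geometrically, mirroring the ε_combined ≈ ε(x₁)ε(x₂)⋯ε(x_m) behaviour described above.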
Conversation analysis shows that interlocutors detect and repair misunderstandings quickly, keeping the effective ambiguity of a dialogue low. Grice’s cooperative principle likewise explains why listeners default to the most plausible intended meaning, even when multiple interpretations are technically possible.
Probabilistic models such as topic models, PCFGs, and neural parsers show extreme sparsity in the joint distribution of meanings and linguistic forms. This sparsity is exactly the structure assumed for ε-ambiguity languages.
LLMs themselves reveal ε-like behavior:
When prompts are under-specified, LLM outputs diverge across samples, demonstrating non-zero ε(x). When prompts are clarified or expanded (e.g., chain-of-thought prompting), the model’s output variance collapses, which can be read as the effective ε(x) decreasing multiplicatively with each additional piece of linguistic evidence.
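One way to probe this empirically (a sketch under strong assumptions: `sample_completion` below is a hypothetical placeholder for whatever model-sampling interface you have, not a real library call) is to estimate an effective ε(x) as the fraction of samples that disagree with the modal answer.

```python
# Sketch only: estimate an effective eps(x) for a prompt by sampling the
# model repeatedly and measuring how much mass falls outside the modal answer.
from collections import Counter

def sample_completion(prompt: str) -> str:
    """Hypothetical placeholder: return one sampled answer for `prompt`."""
    raise NotImplementedError("plug in your own model call here")

def empirical_epsilon(prompt: str, n_samples: int = 50) -> float:
    """1 - frequency of the modal answer: a crude proxy for eps(x)."""
    answers = Counter(sample_completion(prompt) for _ in range(n_samples))
    modal_count = answers.most_common(1)[0][1]
    return 1.0 - modal_count / n_samples

# Under the eps-ambiguity reading, a clarified prompt should yield a smaller
# empirical epsilon than an underspecified one, e.g.:
#   empirical_epsilon("Summarize it.")
#   empirical_epsilon("Summarize the attached meeting notes in two sentences.")
```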
The theoretical models show that concatenating independent messages reduces ambiguity roughly like:
\[\varepsilon_{\text{combined}} \approx \prod_i \varepsilon(x_i).\]This aligns with empirical improvements when LLMs receive additional context, such as clarified instructions, worked examples, or intermediate reasoning steps.
Work on instruction tuning points in the same direction: making the intended task more explicit reduces the effective ambiguity of a prompt.
The empirical observations in previous sections strongly suggest that natural languages align with the ε-ambiguity formalism. In this section, we present a more formal theoretical justification supporting the necessity of ε-ambiguity for any human language capable of large-scale communication, inference, and compositional generalization.
Let θ denote a latent intention and x a surface linguistic signal. Human communication is characterized by:
- Inherent variability in production: speakers do not produce perfectly deterministic signals for their intentions.
- Redundancy and recoverability in comprehension: listeners consistently recover the intended meaning despite this variability.
This requires that the posterior distribution over intentions, Pr(θ ∣ x), be sharply peaked but not a delta distribution. Formally, this implies:
\[\Pr(\theta_0 \mid x) = 1 - \varepsilon(x),\]where ε(x) captures the intrinsic noise or ambiguity.
- If ε(x) were zero, every expression would pin down its meaning exactly; language would need the rigid, exhaustively specified codes of programming languages rather than the flexible, underspecified forms humans actually produce.
- If ε(x) were large, listeners could not reliably recover the intended meaning and communication would break down.

Thus, human language must live in the intermediate regime:
\[0 < \varepsilon(x) \ll 1.\]This is precisely the definition of an ε-ambiguity language.
Given Shannon’s channel coding theorem, any efficient communication system must satisfy:
\[H(\theta \mid x) > 0,\]unless compressed messages are allowed to carry arbitrarily large descriptive complexity.
Natural languages are highly compressed representations of latent intentions. For compression to be efficient:
\[H(\theta \mid x) > 0\]must hold, and it is small precisely when \(\mathbb{E}[\varepsilon(x)]\) is small (the two quantities are related by Fano’s inequality). But for communication to function at all:
\[H(\theta \mid x) \ll H(\theta)\]must also hold.
This yields:
\[0 < \varepsilon(x) \ll 1.\]Thus ε is not merely an empirical observation; it is forced by fundamental information-theoretic constraints on communication between bounded agents.
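A small numerical check of this picture (the joint distribution below is invented for illustration) shows a conditional entropy H(θ ∣ x) that is positive but far below H(θ), alongside a small expected ε(x).

```python
# Toy joint distribution p(theta, x) whose conditionals Pr(theta | x) are
# sharply peaked: H(theta | x) comes out positive but much smaller than
# H(theta), and E[eps(x)] is small -- the regime 0 < eps << 1.
import numpy as np

# Rows: intentions theta; columns: messages x.  Each message almost always
# signals one intention, with a little leakage onto the others.
p_joint = np.array([
    [0.235, 0.005, 0.005, 0.005],
    [0.005, 0.235, 0.005, 0.005],
    [0.005, 0.005, 0.235, 0.005],
    [0.005, 0.005, 0.005, 0.235],
])

p_x = p_joint.sum(axis=0)                       # marginal p(x)
p_theta = p_joint.sum(axis=1)                   # marginal p(theta)
post = p_joint / p_x                            # columns give Pr(theta | x)

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

H_theta = entropy(p_theta)
H_theta_given_x = sum(p_x[j] * entropy(post[:, j]) for j in range(len(p_x)))
eps = 1.0 - post.max(axis=0)                    # eps(x) for each message

print(f"H(theta)     = {H_theta:.3f} bits")
print(f"H(theta | x) = {H_theta_given_x:.3f} bits  (> 0 but << H(theta))")
print(f"E[eps(x)]    = {float((p_x * eps).sum()):.3f}")
```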
A classic result from information theory states that, under bounded channel capacity, a communication code must balance compression (short, efficient messages) against redundancy (robustness to noise and ambiguity).
Natural languages accomplish this by encoding intentions in probabilistic distributions over many correlated cues (syntax, semantics, prosody, discourse), with none individually deterministic. This “multi-cue redundancy” structure implies that:
\[q(x \mid \theta) \text{ is broad, but structured},\]leading again to:
\[\Pr(\theta_0 \mid x) \approx 1 - \varepsilon(x)\]for some small ε(x).
The ε term mathematically captures the trade-off between expressive flexibility and reliable, learnable interpretation.
Thus ε-ambiguity is not an accident but a structural necessity for language to be both expressive and learnable.
A mapping in which every expression x pinned down its intention exactly (ε = 0) would break compositionality in languages such as English: all linguistic constructions would require exhaustive specification of intentions, leading to an explosion of distinct forms and a grammar too rigid to learn or use.
Conversely, if ε is small but non-zero, compositional structures can afford underspecification, because the listener’s inferential machinery resolves them with high probability.
Thus ε > 0 is a prerequisite for efficient and human-like generative grammar.
Probabilistic pragmatics (e.g., Rational Speech Act models) treats interpretation as Bayesian inference over speaker intentions.
Empirically, these models consistently find that the posterior over intended meanings is sharply peaked, with nearly all probability mass on a single interpretation.
Thus:
\[\Pr(\theta_0 \mid x) = 1 - \varepsilon(x)\]emerges naturally as a mathematical property of Bayesian interpretation under realistic priors and likelihoods.
This shows that ε-ambiguity is a mathematically inevitable property of any communicative system interpreted via Bayesian reasoning, which includes both humans and LLMs.
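As a concrete instance, here is a minimal Rational Speech Act (RSA) style listener, a standard probabilistic-pragmatics construction rather than a model from (Jiang, 2023); the meanings, utterances, and the rationality parameter α are illustrative choices.

```python
# Minimal RSA sketch: Bayesian interpretation yields a sharply peaked but
# non-degenerate posterior over meanings, i.e. Pr(theta_0 | x) = 1 - eps(x)
# with small eps(x) > 0.
import numpy as np

meanings = ["some-but-not-all", "all"]
utterances = ["some", "all"]
# Literal semantics: "some" is true of both meanings, "all" only of "all".
truth = np.array([[1, 1],    # rows: utterances, cols: meanings
                  [0, 1]], dtype=float)
prior = np.array([0.5, 0.5])
alpha = 4.0                  # speaker rationality (illustrative choice)

lit_listener = truth * prior
lit_listener /= lit_listener.sum(axis=1, keepdims=True)      # L0(theta | x)

speaker = lit_listener ** alpha
speaker /= speaker.sum(axis=0, keepdims=True)                 # S1(x | theta)

prag_listener = speaker * prior
prag_listener /= prag_listener.sum(axis=1, keepdims=True)     # L1(theta | x)

post = prag_listener[utterances.index("some")]
theta_0 = meanings[int(post.argmax())]
print(f'"some" -> {theta_0}, eps = {1 - post.max():.3f}')     # small but > 0
```

Even in this maximally simple scalar-implicature example, the pragmatic listener’s posterior is sharply peaked but never degenerate: ε stays strictly positive.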
Across linguistic, psycholinguistic, computational, and conversational evidence, natural languages display the following features:
| Property | Observed in Natural Languages | Matches ε-Ambiguous Definition |
|---|---|---|
| Multiple interpretations possible | ✔ | ε(x) > 0 |
| One interpretation strongly dominant | ✔ | Pr(θ₀ ∣ x) ≈ 1 − ε(x) |
| Ambiguity decreases with context | ✔ | ε(x₁x₂) ≈ ε(x₁)ε(x₂) |
| Communication is highly reliable | ✔ | ε(x) generally small |
| Occasional misinterpretation occurs | ✔ | ε(x) nonzero |
Natural languages therefore satisfy the fundamental requirements of ε-ambiguity languages: they encode meaning with probabilistic stability but not absolute determinacy.
The ε-ambiguity model elegantly captures the essential structure of natural language semantics. Evidence from multiple disciplines demonstrates that natural languages pair a single dominant interpretation with a small but non-zero residual ambiguity that shrinks as contextual evidence accumulates. Finally, the empirical observations reviewed above, from near-instantaneous human disambiguation to the collapse of LLM output variance under clarified prompts, align closely with the ε-ambiguity framework and serve as further evidence for it.
Together, these points provide a rigorous foundation for treating natural languages as ε-ambiguity languages.