Natural Languages as ε-Ambiguity Languages

In this post, we examine the claim that natural languages are ε-ambiguity languages in the sense defined by the probabilistic theory of language and latent-intention inference in (Jiang, 2023). We survey linguistic, psycholinguistic, and computational evidence demonstrating that natural languages exhibit precisely this structure.

The ε-Ambiguity Framework

Natural languages support reliable communication despite variability, noise, and structural underspecification. Unlike programming languages, they allow metaphor, ellipsis, ambiguity, deixis, and context-dependent meaning. Yet humans typically recover the intended meaning with high accuracy.

Recent theoretical work introduces ε-ambiguity languages as a formal tool to model this phenomenon. Under this framework, a language is ε-ambiguous if, for any meaningful message x, there exists a dominant intended meaning θ₀ such that

\[\Pr(\theta_0 \mid x) \ge 1 - \varepsilon(x), \quad \varepsilon(x) \in [0,1),\]

but alternative interpretations occur with small but non-zero probability.

In this framework, although many meanings are technically compatible with a linguistic expression, one meaning dominates the posterior probability, and ambiguity occurs only with small probability ε(x). Jiang (2023) argues that natural languages empirically exhibit ε-ambiguity: they are neither perfectly unambiguous nor fully ambiguous, but instead allow reliably dominant meanings with bounded ambiguity. That work further argues that ε-ambiguity provides a coherent explanation for both human semantics and the emergent abilities of LLMs.

Model Definition

Following (Jiang, 2023), we assume:

A language is an ε-ambiguity language if for every meaningful expression x:

\[\Pr(\theta_0 \mid x) \ge 1 - \varepsilon(x),\]

where θ₀ is the intended meaning, and ε(x) quantifies residual ambiguity.

This describes a sparse posterior over meanings: a dominant intention plus a long tail of alternatives, each carrying very small probability.
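As a minimal sketch (the posterior vector below is invented for illustration, not drawn from any corpus), the condition can be checked directly on a posterior over candidate meanings:

```python
import numpy as np

def dominant_meaning_and_epsilon(posterior):
    """Given a posterior Pr(theta | x) over candidate meanings, return the
    dominant meaning theta_0 and the residual ambiguity eps(x) = 1 - Pr(theta_0 | x)."""
    posterior = np.asarray(posterior, dtype=float)
    posterior = posterior / posterior.sum()   # normalize defensively
    theta_0 = int(np.argmax(posterior))       # index of the dominant meaning
    eps = 1.0 - posterior[theta_0]            # total mass on the alternatives
    return theta_0, eps

# Toy posterior over four candidate interpretations of one expression.
theta_0, eps = dominant_meaning_and_epsilon([0.93, 0.04, 0.02, 0.01])
print(theta_0, round(eps, 3))   # -> 0 0.07: one dominant meaning, small epsilon
```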

Consequences

This model predicts that:

  1. Communication is reliable but not deterministic.
  2. Ambiguity decreases multiplicatively when multiple cues or messages are provided.
  3. Latent-intention inference is feasible even without explicit symbolic structure.

The fundamental principle behind ε-ambiguity languages is that linguistic expressions exhibit partial but not perfect semantic determinacy. Messages tend to convey one meaning with high probability, but never with absolute certainty. Real-world natural languages possess exactly these properties: they support efficient communication despite intrinsic ambiguity, and ambiguity is controlled by contextual, semantic, and pragmatic mechanisms.

The ε-ambiguity framework formalizes this intuition within a probabilistic generative model of communication, where meanings (latent intentions θ) are drawn from a space Θ, and surface messages are generated by noisy processes with intention-specific distributions q(x ∣ θ). The framework provides a mathematical explanation for why LLMs can infer hidden meanings from text and why phenomena such as chain-of-thought reasoning reduce uncertainty.
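A minimal sketch of this generative picture, using an invented two-intention, three-message toy channel (all probabilities are illustrative assumptions, not values from (Jiang, 2023)), recovers the posterior over intentions by Bayes' rule:

```python
import numpy as np

# Hypothetical latent intentions Theta and surface messages (illustrative only).
intentions = ["REQUEST", "QUESTION"]
messages = ["can you open the window", "is the window open", "open the window"]

prior = np.array([0.5, 0.5])        # Pr(theta)
q = np.array([                      # q(x | theta): rows = intentions, rows sum to 1
    [0.6, 0.1, 0.3],                # REQUEST
    [0.2, 0.7, 0.1],                # QUESTION
])

def posterior(msg_idx):
    """Pr(theta | x) proportional to q(x | theta) * Pr(theta)."""
    unnorm = q[:, msg_idx] * prior
    return unnorm / unnorm.sum()

for i, msg in enumerate(messages):
    post = posterior(i)
    eps = 1.0 - post.max()
    print(f"{msg!r}: theta_0 = {intentions[post.argmax()]}, eps = {eps:.2f}")
```

Each message leaves a dominant intention with most of the posterior mass and a small, message-dependent ε(x) on the alternative.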

The central question of this article is:

What empirical and theoretical evidence supports the view that natural languages satisfy the definition of ε-ambiguity languages?

We demonstrate that evidence comes from multiple research domains.


Evidence from Linguistics

Lexical Ambiguity and Polysemy

Natural languages contain extensive lexical ambiguity. Words often possess multiple senses (e.g., bank, seal, interest), yet human speakers reliably infer the dominant meaning from context. Corpus-based studies of word sense disambiguation have shown that sense distributions are often highly skewed, with the most frequent sense accounting for a large majority of occurrences.

This illustrates exactly the condition:

\[\Pr(\theta_0 \mid x) \approx 1 - \varepsilon(x), \quad \varepsilon(x) \text{ small but nonzero},\]

where θ₀ is the dominant sense.
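To make the skew concrete, here is a small sketch with made-up sense counts for "bank" (they are illustrative, not taken from any real corpus); the most frequent sense plays the role of θ₀ and the leftover mass is ε:

```python
from collections import Counter

# Hypothetical sense counts for "bank" (illustrative numbers, not corpus data).
sense_counts = Counter({
    "financial institution": 912,
    "edge of a river": 61,
    "to rely (bank) on": 19,
    "to tilt (bank) a turn": 8,
})

total = sum(sense_counts.values())
dominant_sense, dominant_count = sense_counts.most_common(1)[0]
epsilon = 1 - dominant_count / total

print(dominant_sense, round(dominant_count / total, 3), round(epsilon, 3))
# The dominant sense carries ~91% of the mass, so epsilon is small but nonzero.
```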

Syntactic Ambiguity

Classic syntactic ambiguities (e.g., “I saw the man with the telescope”) allow multiple parses, yet listeners overwhelmingly adopt one interpretation when context is provided. Probabilistic grammars assign steeply skewed probability distributions to parses, again demonstrating nonzero but concentrated posterior distributions over intentions.

Pragmatic Inference and Speech Acts

Pragmatics often shifts literal meanings to intended meanings. For instance, “Can you pass the salt?” is almost always understood as a request rather than a literal question about ability.

Research on speech-act recognition shows that listeners infer intended acts with high reliability but occasional errors—consistent with ε > 0.


Evidence from Psycholinguistics

Rapid Probabilistic Disambiguation

Humans use contextual probabilities to resolve ambiguity almost instantaneously. Even in garden-path sentences (e.g., “The horse raced past the barn fell”), misinterpretations occur but are rare relative to successful parsing.

This supports the claim that the posterior over intended meanings is sharply concentrated: misinterpretation corresponds to a small but nonzero ε(x).

Context-Dependence and Disambiguation

Work in contextual integration shows that humans update interpretations probabilistically as context accumulates, consistent with the multiplicative reduction in ambiguity predicted by ε-ambiguity theory, i.e., Proposition 1 in (Jiang, 2023).

The ε-ambiguity model predicts (and proves) that when a listener receives multiple messages \((x_1, x_2, \dots, x_m )\) generated from the same θ:

\[\varepsilon_{\text{combined}} \approx \varepsilon(x_1)\varepsilon(x_2)\cdots\varepsilon(x_m).\]

This aligns with psychological evidence that humans aggregate cues. The multiplicative decay of ε explains why interpretations become increasingly certain as contextual evidence accumulates.

Thus ε plays a central role in controlling inference quality.
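A small worked example (with a binary intention space, a uniform prior, and conditional independence of the messages built in by construction; all numbers are illustrative) shows how close the combined residual ambiguity stays to the product of the per-message ε values:

```python
import numpy as np

def combined_epsilon(per_message_eps):
    """Residual posterior mass on theta_alt after combining messages about a
    binary intention {theta_0, theta_alt} under a uniform prior, assuming the
    messages are conditionally independent given the intention.  Each message
    on its own would leave posterior mass eps_i on theta_alt."""
    ratios = [eps / (1 - eps) for eps in per_message_eps]   # per-message odds of theta_alt
    odds_alt = np.prod(ratios)                              # combined posterior odds
    return odds_alt / (1 + odds_alt)

eps_list = [0.10, 0.08, 0.05]
print(combined_epsilon(eps_list))   # ~5.1e-4: exact combined ambiguity
print(np.prod(eps_list))            # 4.0e-4: the product approximation
```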

Repair Mechanisms

Conversation analysis shows that misunderstandings occur at low but non-zero frequency, and repair strategies efficiently correct them, suggesting that ε(x) is generally small but nonzero.

Cooperative Principle

Grice’s cooperative principle ensures that interpretation leans toward meanings that maximize communicative coherence, forcing

\[\Pr(\theta_0 \mid x) \gg \Pr(\theta_{\text{alt}} \mid x),\]

even when multiple interpretations are technically possible.


Evidence from Computational Linguistics

Corpus-Based Skew of Meaning Distributions

Probabilistic models such as topic models, PCFGs, and neural parsers show extreme sparsity in the joint distribution of meanings and linguistic forms. For example, a trained parser typically assigns the overwhelming majority of the probability mass to a single parse, with the remainder spread thinly over many alternatives.

This sparsity is exactly the structure assumed for ε-ambiguous languages.

Behavior of Large Language Models

LLMs themselves reveal ε-like behavior in at least three ways.

Sensitivity to Prompt Ambiguity

When prompts are under-specified, LLM outputs diverge, demonstrating non-zero ε(x). When prompts are clarified or expanded (e.g., chain-of-thought prompting), the model’s output variance collapses—interpretable as effective ε(x) decreasing multiplicatively with additional linguistic evidence.

Convergence Under Contextual Redundancy

The theoretical models show that concatenating independent messages reduces ambiguity roughly like:

\[\varepsilon_{\text{combined}} \approx \prod_i \varepsilon(x_i).\]

This aligns with empirical improvements when LLMs receive longer contexts, redundant phrasings of the same instruction, or multiple demonstrations of the intended task.

Improved Performance with Additional Cues

Work on instruction tuning shows that LLMs improve dramatically when intentions are expressed more explicitly; implicit or ambiguous instructions (large ε) yield errors.


Theoretical Justification from Information Theory

The empirical observations in previous sections strongly suggest that natural languages align with the ε-ambiguity formalism. In this section, we present a more formal theoretical justification supporting the necessity of ε-ambiguity for any human language capable of large-scale communication, inference, and compositional generalization.

Communication Under Uncertainty Requires Controlled Ambiguity

Let θ denote a latent intention and x a surface linguistic signal. Human communication is characterized by:

  1. Inherent variability in production
    Speakers do not produce perfectly deterministic signals for intentions.

    \[H(x \mid \theta) > 0.\]
  2. Redundancy and recoverability in comprehension
    Listeners consistently recover the intended meaning despite variability:

    \[\Pr(\theta_0 \mid x) \text{ is typically high}.\]

This requires that the posterior distribution over intentions, Pr(θ ∣ x), be sharply peaked but not a delta distribution. Formally, this implies:

\[\Pr(\theta_0 \mid x) = 1 - \varepsilon(x),\]

where ε(x) captures the intrinsic noise or ambiguity.

If ε(x) were zero, every expression would have to determine its meaning exactly, eliminating the compression, underspecification, and context-dependence that make natural language efficient.

If ε(x) were large, listeners could not reliably recover the intended meaning, and communication would break down.

Thus, human language must live in the intermediate regime:

\[0 < \varepsilon(x) \ll 1.\]

This is precisely the definition of an ε-ambiguity language.

Information-Theoretic Justification

Given Shannon’s channel coding theorem, any efficient communication system must satisfy:

\[H(\theta \mid x) > 0,\]

unless compressed messages carry arbitrarily large complexity.

Natural languages are highly compressed representations of latent intentions. For compression to be efficient:

\[H(\theta \mid x) > 0, \quad \text{equivalently}\ \mathbb{E}[\varepsilon(x)] > 0,\]

must hold. But for communication to function at all:

\[H(\theta \mid x) \ll H(\theta)\]

must also hold.

This yields:

\[0 < \varepsilon(x) \ll 1.\]

Thus ε is not merely empirical—it is forced by fundamental information-theoretic constraints on communication between bounded agents.
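A sketch of this intermediate regime with a toy intention-message channel (four intentions, four messages, invented leakage probabilities): the conditional entropy H(θ ∣ x) stays strictly positive yet far below the prior entropy H(θ).

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

prior = np.full(4, 0.25)                             # Pr(theta): uniform over 4 intentions
# q(x | theta): each intention mostly maps to "its" message, with a little leakage.
q = np.where(np.eye(4, dtype=bool), 0.97, 0.01)      # rows sum to 1

joint = prior[:, None] * q                           # Pr(theta, x)
p_x = joint.sum(axis=0)                              # Pr(x)
H_theta = entropy(prior)
H_theta_given_x = sum(p_x[j] * entropy(joint[:, j] / p_x[j]) for j in range(4))

print(f"H(theta) = {H_theta:.2f} bits, H(theta|x) = {H_theta_given_x:.2f} bits")
# H(theta) is 2 bits while H(theta|x) is a small positive fraction of a bit,
# i.e. 0 < H(theta|x) << H(theta), mirroring 0 < eps << 1.
```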

Ambiguity as a Structural Requirement for Expressivity

A classic result from information theory states that, under bounded channel capacity, a communication code must balance compression (short, reusable messages) against decodability (reliable recovery of what was meant).

Natural languages accomplish this by encoding intentions in probabilistic distributions over many correlated cues (syntax, semantics, prosody, discourse), with none individually deterministic. This “multi-cue redundancy” structure implies that:

\[q(x \mid \theta) \text{ is broad, but structured},\]

leading again to:

\[\Pr(\theta_0 \mid x) \approx 1 - \varepsilon(x)\]

for some small ε(x).

The ε term mathematically captures the trade-off between expressivity (compact, reusable forms) and interpretability (reliable recovery of the intended meaning).

Thus ε-ambiguity is not an accident but a structural necessity for language to be both expressive and learnable.

Ambiguity Is Necessary for Compositionality

A fully deterministic mapping from θ to x (ε = 0) would break compositionality in languages such as English, where the same constructions must be reusable across many related intentions.

If ε were zero, all linguistic constructions would require exhaustive specification of intentions, leading to an explosion in the number and length of expressions and rendering the language unlearnable.

Conversely, if ε is small but non-zero, compositional structures can afford underspecification, because the listener’s inferential machinery resolves them with high probability.

Thus ε > 0 is a prerequisite for efficient and human-like generative grammar.


Bayesian Models of Language Support ε-Ambiguity

Probabilistic pragmatics models the listener as:

\[\Pr(\theta \mid x) \propto \Pr(x \mid \theta)\Pr(\theta).\]

Empirically, these models consistently find posteriors that concentrate most of their mass on a single interpretation while leaving small residual mass on alternatives.

Thus:

\[\Pr(\theta_0 \mid x) = 1 - \varepsilon(x)\]

emerges naturally as a mathematical property of Bayesian interpretation under realistic priors and likelihoods.

This shows that ε-ambiguity is a mathematically inevitable property of any communicative system interpreted via Bayesian reasoning, which includes both humans and LLMs.
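As a concrete toy illustration in this Bayesian spirit, the following sketch implements a literal listener over a hypothetical three-object reference game (the lexicon and referent prior are invented for illustration and are not taken from (Jiang, 2023) or any published model):

```python
import numpy as np

# Hypothetical reference game: three candidate referents, three utterances.
objects = ["blue circle", "blue square", "green square"]
utterances = ["blue", "square", "green"]

# Literal semantics: truth[o, u] = 1 if utterance u truthfully applies to object o.
truth = np.array([
    # "blue" "square" "green"
    [1,      0,       0],   # blue circle
    [1,      1,       0],   # blue square
    [0,      1,       1],   # green square
], dtype=float)

prior = np.array([0.5, 0.3, 0.2])   # Pr(theta): referent prior (illustrative)

def listener(u_idx):
    """Pr(theta | x) proportional to Pr(x | theta) * Pr(theta), where the speaker
    is assumed to choose uniformly among the utterances true of the referent."""
    likelihood = truth[:, u_idx] / truth.sum(axis=1)   # Pr(x | theta)
    unnorm = likelihood * prior
    return unnorm / unnorm.sum()

for j, u in enumerate(utterances):
    post = listener(j)
    eps = 1.0 - post.max()
    print(f"{u!r}: theta_0 = {objects[post.argmax()]!r}, eps = {eps:.2f}")
```

Even in this deliberately simple setting, each utterance leaves one dominant referent, and ε(x) varies with the utterance and shrinks as the prior (i.e., the context) becomes more informative.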

Conclusion

Across linguistic, psycholinguistic, computational, and conversational evidence, natural languages display the following features:

| Property Observed in Natural Languages | Matches ε-Ambiguous Definition |
| --- | --- |
| Multiple interpretations possible | ε(x) > 0 |
| One interpretation strongly dominant | Pr(θ₀ ∣ x) ≈ 1 − ε(x) |
| Ambiguity decreases with context | ε(x₁x₂) ≈ ε(x₁)ε(x₂) |
| Communication is highly reliable | ε(x) generally small |
| Occasional misinterpretation occurs | ε(x) nonzero |

Natural languages therefore satisfy the fundamental requirements of ε-ambiguity languages: they encode meaning with probabilistic stability but not absolute determinacy.

The ε-ambiguity model elegantly captures the essential structure of natural language semantics. Evidence from multiple disciplines demonstrates that natural languages exhibit a dominant interpretation for most expressions, bounded residual ambiguity, and multiplicative reduction of that ambiguity as context accumulates.

Finally, the behavior of large language models (sensitivity to prompt ambiguity, convergence under contextual redundancy, and improved performance with explicit instructions) also aligns closely with the ε-ambiguity framework and serves as further evidence.

Together, these points provide a rigorous foundation for treating natural languages as ε-ambiguity languages.