#######################
N-gram smoothing models
#######################
When dealing with `n-gram`_ models, smoothing refers to the practice of
adjusting empirical probability estimates to account for insufficient data.
In the descriptions below, we use the notation :math:`w^{j}_{i}`, :math:`i < j`, to
denote the `(j - i)`-gram :math:`(w_{i}, w_{i+1}, \ldots, w_{j})`.
.. raw:: html
### Laplace Smoothing

`Laplace smoothing`_ is the assumption that each `n`-gram in a corpus occurs
exactly one more time than it actually does.
.. math::
p(w_i \mid w^{i-1}_{i-n+1}) = \frac{1 + c(w^{i}_{i-n+1})}{|V| \sum_{w_i} c(w^{i}_{i-n+1})}
where :math:`c(a)` denotes the empirical count of the `n`-gram :math:`a` in the
corpus, and :math:`|V|` corresponds to the number of unique `n`-grams in the
corpus.
.. _`Laplace smoothing`: https://en.wikipedia.org/wiki/Additive_smoothing
**Models**
- :class:`~numpy_ml.ngram.AdditiveNGram`
.. raw:: html
### Additive/Lidstone Smoothing

`Additive/Lidstone smoothing`_ is a generalization of Laplace smoothing, where we
assume that each `n`-gram in a corpus occurs `k` more times than it actually
does (where `k` can be any non-negative value, but typically ranges between `[0, 1]`):
.. math::
p(w_i \mid w^{i-1}_{i-n+1}) = \frac{k + c(w^{i}_{i-n+1})}{k |V| \sum_{w_i} c(w^{i}_{i-n+1})}
where :math:`c(a)` denotes the empirical count of the `n`-gram :math:`a` in the
corpus, and :math:`|V|` corresponds to the number of unique `n`-grams in the
corpus.
.. _`Additive/Lidstone smoothing`: https://en.wikipedia.org/wiki/Additive_smoothing
**Models**
- :class:`~numpy_ml.ngram.AdditiveNGram`
.. raw:: html
### Good-Turing Smoothing

`Good-Turing smoothing`_ is a more sophisticated technique which takes into
account the identity of the particular `n`-gram when deciding the amount of
smoothing to apply. It proceeds by allocating a portion of the probability
space occupied by `n`-grams which occur with count `r+1` and dividing it among
the `n`-grams which occur with rate `r`.
.. math::
r^* = (r + 1) \frac{g(r + 1)}{g(r)} \\
p(w^{i}_{i-n+1} \mid c(w^{i}_{i-n+1}) = r) = \frac{r^*}{N}
where :math:`r^*` is the adjusted count for an `n`-gram which occurs `r` times,
`g(x)` is the number of `n`-grams in the corpus which occur `x` times, and `N`
is the total number of `n`-grams in the corpus.
.. _n-gram: https://en.wikipedia.org/wiki/N-gram
.. _`Good-Turing smoothing`: https://en.wikipedia.org/wiki/Good%E2%80%93Turing_frequency_estimation
**Models**
- :class:`~numpy_ml.ngram.GoodTuringNGram`
**References**
.. [1] Chen & Goodman (1998). "An empirical study of smoothing techniques
for language modeling". *Harvard Computer Science Group Technical Report
TR-10-98*.
.. [2] Gale & Sampson (1995). "Good-Turing frequency estimation without
tears". *Journal of Quantitative Linguistics*, 2(3), 217-237.
.. toctree::
:maxdepth: 3
:hidden:
numpy_ml.ngram.mle
numpy_ml.ngram.additive
numpy_ml.ngram.goodturing