6.2 The Probability Density Function

Bayes' Theorem states that:

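$\displaystyle \mathrm{P}\left( \mathbf{u} \vert \left\{ \mathbf{x}_i, f_i, \sigma_i \right\} \right) = \frac{ \mathrm{P}\left( \left\{f_i \right\} \vert \mathbf{u}, \left\{ \mathbf{x}_i, \sigma_i \right\} \right) \mathrm{P}\left( \mathbf{u} \right) }{ \mathrm{P}\left( \left\{f_i \right\} \vert \left\{ \mathbf{x}_i, \sigma_i \right\} \right) }$ (6.1)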

Since we are only seeking to maximise the quantity on the left, and the denominator, termed the Bayesian evidence, is independent of $ \mathbf{u}$, we can neglect it and replace the equality sign with a proportionality sign. Furthermore, if we assume a uniform prior, that is, we assume that we have no prior knowledge to bias us towards certain more favoured values of $ \mathbf{u}$, then $ \mathrm{P}\left( \mathbf{u} \right)$ is also a constant which can be neglected. We conclude that maximising $ \mathrm{P}\left( \mathbf{u} \vert \left\{ \mathbf{x}_i, f_i, \sigma_i \right\} \right)$ is equivalent to maximising $ \mathrm{P}\left( \left\{f_i \right\} \vert \mathbf{u}, \left\{ \mathbf{x}_i, \sigma_i \right\} \right)$.
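
Written out as a chain of proportionalities, the two simplifications just described (dropping the evidence, then dropping the uniform prior) read:

$\displaystyle \mathrm{P}\left( \mathbf{u} \vert \left\{ \mathbf{x}_i, f_i, \sigma_i \right\} \right) \propto \mathrm{P}\left( \left\{f_i \right\} \vert \mathbf{u}, \left\{ \mathbf{x}_i, \sigma_i \right\} \right) \mathrm{P}\left( \mathbf{u} \right) \propto \mathrm{P}\left( \left\{f_i \right\} \vert \mathbf{u}, \left\{ \mathbf{x}_i, \sigma_i \right\} \right)$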

Since we are assuming $ f_i$ to be Gaussian-distributed observations of the true function $ f()$, this latter probability can be written as a product of $ n_\mathrm{d}$ Gaussian distributions:

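$\displaystyle \mathrm{P}\left( \left\{f_i \right\} \vert \mathbf{u}, \left\{ \mathbf{x}_i, \sigma_i \right\} \right) = \prod_{i=0}^{n_\mathrm{d}-1} \frac{1}{\sigma_i \sqrt{2\pi}} \exp \left( \frac{ -\left[f_i - f_\mathbf{u}(\mathbf{x}_i)\right]^2 }{ 2 \sigma_i^2 } \right)$ (6.2)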
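
As a rough illustration, the product of Gaussians in Equation (6.2) can be evaluated directly. The Python sketch below is not taken from the text: the names `gaussian_likelihood` and `line`, the straight-line model, and the data values are all assumptions invented for the example.

```python
import math

def gaussian_likelihood(u, xs, fs, sigmas, model):
    """Product of Gaussian densities of the residuals f_i - model(u, x_i)."""
    likelihood = 1.0
    for x, f, sigma in zip(xs, fs, sigmas):
        residual = f - model(u, x)
        norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        likelihood *= norm * math.exp(-residual ** 2 / (2.0 * sigma ** 2))
    return likelihood

def line(u, x):
    """Hypothetical model f_u(x) = u[0] + u[1] * x (an assumption)."""
    return u[0] + u[1] * x

# Made-up illustrative data: positions, observed values, uncertainties.
xs = [0.0, 1.0, 2.0, 3.0]
fs = [1.1, 2.9, 5.2, 6.8]
sigmas = [0.2, 0.2, 0.2, 0.2]

# Parameters close to the data give a much larger likelihood.
good = gaussian_likelihood((1.0, 2.0), xs, fs, sigmas, line)
poor = gaussian_likelihood((0.0, 1.0), xs, fs, sigmas, line)
```

With many datapoints, a product of this kind can underflow to zero in floating-point arithmetic, which is one practical reason to prefer a sum of logarithms.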

The product in this equation can be converted into a more computationally workable sum by taking the logarithm of both sides. Since logarithms are monotonically increasing functions, maximising a probability is equivalent to maximising its logarithm. We may write the logarithm $ L$ of $ \mathrm{P}\left( \mathbf{u} \vert \left\{ \mathbf{x}_i, f_i, \sigma_i \right\} \right)$ as:

$\displaystyle L = \sum_{i=0}^{n_\mathrm{d}-1} \left( \frac{ -\left[f_i - f_\mathbf{u}(\mathbf{x}_i)\right]^2 }{ 2 \sigma_i^2 } \right) + k$ (6.3)

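where $ k$ is some constant which does not affect the maximisation process: it collects the Gaussian normalisation terms $ -\ln \left( \sigma_i \sqrt{2\pi} \right)$, together with the logarithms of the constant prior and evidence, none of which depends on $ \mathbf{u}$. It is this quantity, the negative of the familiar sum-of-square-residuals with each term weighted by $ 1/2\sigma_i^2$, that we numerically maximise to find our best-fitting set of parameters, which I shall refer to from here on as $ \mathbf{u}^0$. Maximising $ L$ is thus the same as minimising the weighted sum of squared residuals.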
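
The maximisation of $ L$ can be sketched in a few lines of Python. This is a minimal illustration only: the brute-force grid search stands in for whatever numerical optimiser is actually used, and the names `log_likelihood` and `line`, the straight-line model, the grid range, and the data are all assumptions invented for the example.

```python
import itertools

def log_likelihood(u, xs, fs, sigmas, model):
    """L from Equation (6.3), with the constant k dropped."""
    return sum(-(f - model(u, x)) ** 2 / (2.0 * sigma ** 2)
               for x, f, sigma in zip(xs, fs, sigmas))

def line(u, x):
    """Hypothetical model f_u(x) = u[0] + u[1] * x (an assumption)."""
    return u[0] + u[1] * x

# Made-up illustrative data: positions, observed values, uncertainties.
xs = [0.0, 1.0, 2.0, 3.0]
fs = [1.1, 2.9, 5.2, 6.8]
sigmas = [0.2, 0.2, 0.2, 0.2]

# Brute-force search: try intercepts and gradients from 0 to 4 in
# steps of 0.01 and keep the pair with the largest L.
grid = [i / 100.0 for i in range(401)]
u0 = max(itertools.product(grid, grid),
         key=lambda u: log_likelihood(u, xs, fs, sigmas, line))
```

Because $ k$ is dropped, the values returned by `log_likelihood` are only meaningful relative to one another, which is all that the maximisation needs.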

Dominic Ford 2006-09-09