GCV criterion

Definition

The Generalized Cross Validation (GCV) criterion is a very similar method to the PRESS method. GCV is a rotation-invariant form of the PRESS method. The deriviation of the GCV from PRESS is shown in Golub, et al. 1979 [1].

GCV leads to choosing \(\lambda\) as the minimizer of the GCV function \(\mathcal{G}(\lambda)\), defined by

\[ \mathcal{G}(\lambda) \equiv \frac{ \| (\mathbf{I} - \mathbf{A}_\lambda)\mathbf{B}\mathbf{b} \|^2 } {\mathrm{tr}(\mathbf{I} - \mathbf{A}_\lambda)^2}, \]

where \(\mathbf{A}_\lambda \equiv \mathbf{B}\mathbf{T}(\mathbf{T}^\mathsf{T}\mathbf{Q}\mathbf{T} + \lambda\mathbf{H})^{-1}\mathbf{T}^\mathsf{T}\mathbf{B}^\mathsf{T}\), \(\mathrm{tr}(\cdot)\) is the trace of a matrix, and \(\mathbf{Q}=\mathbf{B}^\mathsf{T}\mathbf{B}\).

Using series-expansion form of the solution, \(\mathcal{G}(\lambda)\) can be written as

(1)\[ \mathcal{G}(\lambda) = \frac{\rho} {\left(r - \sum_{i=1}^r f_{\lambda,i} \right)^2}. \]

Deriviation of (1)

Recalling the Generalized Tikhonov regularized solution form and the series expansion, we obtain the following:

\[\begin{split} \begin{align*} \mathbf{A}_\lambda \mathbf{B}\mathbf{b} &= \mathbf{B}\mathbf{T} \mathbf{x}_\lambda\\ &= \mathbf{B}\mathbf{T}\tilde{\mathbf{V}}\mathbf{F}_\lambda\mathbf{S}^{-1}\mathbf{U}^\mathsf{T}\hat{\mathbf{b}}\\ &= \mathbf{U}\mathbf{S}\mathbf{F}_\lambda\mathbf{S}^{-1}\mathbf{U}^\mathsf{T}\mathbf{B}\mathbf{b}\quad(\because \mathbf{B}\mathbf{T}\tilde{\mathbf{V}} = \mathbf{U}\mathbf{S})\\ &= \mathbf{U}\mathbf{F}_\lambda\mathbf{U}^\mathsf{T}\mathbf{B}\mathbf{b}.\\ \therefore \mathbf{A}_\lambda &= \mathbf{U}\mathbf{F}_\lambda\mathbf{U}^\mathsf{T}. \end{align*} \end{split}\]

Then we have

\[\begin{split} \begin{align*} \text{numerator of } \mathcal{G}(\lambda) &= \| (\mathbf{I} - \mathbf{A}_\lambda)\mathbf{B}\mathbf{b} \|^2 \\ &= \| \mathbf{B}\mathbf{b} - \mathbf{B}\mathbf{T}\mathbf{x}_\lambda \|^2\\ &= \| \mathbf{b} - \mathbf{T}\mathbf{x}_\lambda\|_\mathbf{Q}^2 = \rho.\\\\ \text{denominator of } \mathcal{G}(\lambda) &= \mathrm{tr}(\mathbf{I} - \mathbf{A}_\lambda)^2\\ &= \left( \mathrm{tr}(\mathbf{I}) - \mathrm{tr}(\mathbf{A}_\lambda) \right)^2\\ &= \left( \mathrm{tr}(\mathbf{I}) - \mathrm{tr}(\mathbf{U}\mathbf{F}_\lambda\mathbf{U}^\mathsf{T}) \right)^2\\ &= \left( \mathrm{tr}(\mathbf{I}) - \mathrm{tr}(\mathbf{F}_\lambda) \right)^2 \quad\left( \because \mathrm{tr}(\mathbf{U}\mathbf{F}_\lambda\mathbf{U}^\mathsf{T}) = \mathrm{tr}(\mathbf{U}^\mathsf{T}\mathbf{U}\mathbf{F}_\lambda) = \mathrm{tr}(\mathbf{F}_\lambda) \right)\\ &= \left( \sum_{i=1}^r 1 - \sum_{i=1}^r f_{\lambda,i} \right)^2\\ &= \left( r - \sum_{i=1}^r f_{\lambda,i} \right)^2. \end{align*} \end{split}\]

Example

The example is shown in a notebook.

Limitation

GCV is a good method when the noise is unknown and the noise is assumed to be white noise, however, it often fails to give a satisfactory result when the error is highly correlated.
See the example case for the limitation of GCV.