This page was generated from docs/notebooks/non-iterative/gcv.ipynb.
Download notebook.

GCV criterion

Here we show how to use the Generalized Cross Validation (GCV) criterion to select the optimal regularization parameter for an example ill-posed inverse problem.

The definition of the GCV criterion is introduced in this page.

Definition of the example problem

We refer to the famous Fredholm integral equation of the first kind devised by Phillips[1]. Define the function

\[\begin{split}\begin{equation} f(x) \equiv \begin{cases} 1 + \displaystyle\cos\left(\frac{\pi x}{3}\right), & \text{if } | x | < 3, \\ 0, & \text{otherwise} \end{cases}. \end{equation}\end{split}\]

Then, the kernel \(K(s, t)\) and the solution \(x(t)\) are given by

\[\begin{split}\begin{align} K(s, t) &\equiv f(s - t)\\ x(t) &\equiv f(t) \end{align}\end{split}\]

Both integral intervals are \([-6, 6]\).

[1]:
import numpy as np
import ultraplot as uplt
from rich import print as rprint
from scipy.sparse import diags_array

from cherab.inversion import GCV, SVD
from cherab.inversion.tools import parse_scientific_notation

Let us define functions for the kernel, the solution and the right-hand side.

[2]:
def _func(x: float) -> float:
    if abs(x) < 3.0:
        return 1.0 + np.cos(np.pi * x / 3.0)
    else:
        return 0.0


def kernel(t: float, s: float) -> float:
    """The kernel function."""
    return _func(t - s)


def solution(t: float) -> float:
    """The solution function."""
    return _func(t)

Then we will generate the discretized version of the kernel \(\mathbf{K}\), solution \(\mathbf{x}\) and right-hand side \(\mathbf{b}\) using the trapezoidal rule leading to the following linear system:

\[\begin{equation} \mathbf{K}\mathbf{x} = \mathbf{b}, \end{equation}\]

The size of matrix \(\mathbf{K}\in\mathbb{R}^{M\times N}\) is set to \(M = N = 64\).

[3]:
s = np.linspace(-6.0, 6.0, num=64, endpoint=True)
t = np.linspace(-6.0, 6.0, num=64, endpoint=True)

# discretize kernel
k_mat = np.zeros((s.size, t.size))
k_mat = np.array([[kernel(i, j) for j in t] for i in s])

# trapezoidal rule
k_mat[:, 0] *= 0.5
k_mat[:, -1] *= 0.5
k_mat *= t[1] - t[0]

# discretize solution
x_t = np.array([solution(i) for i in t])

print(f"{k_mat.shape = }")
print(f"{x_t.shape = }")
print(f"condition number of K is {np.linalg.cond(k_mat):.4g}")
k_mat.shape = (64, 64)
x_t.shape = (64,)
condition number of K is 5.924e+04

The right-hand side \(\mathbf{b}\in\mathbb{R}^{M}\) is usually contaminated by noise. So, we will add a Gaussian noise to the right-hand side.

\[\mathbf{b} = \bar{\mathbf{b}} + \mathbf{e},\]

where \(\bar{\mathbf{b}}\) is the original right-hand side (\(\bar{\mathbf{b}}\equiv \mathbf{K}\mathbf{x}\)), \(\mathbf{e}\) is a vector whose elements are independently sampled from the normal distribution with mean 0 and standard deviation \(10^{-3}\).

[4]:
b_bar = k_mat @ x_t
rng = np.random.default_rng()
noise = rng.normal(0, 1.0e-3, b_bar.size)
b = b_bar + noise

Solve the inverse problem using the GCV criterion

The solution of the ill-posed linear equation is obtained with the regularization procedure:

\[\begin{equation} \mathbf{x}_\lambda = \left(\mathbf{K}^\mathsf{T}\mathbf{K} + \lambda\mathbf{H}\right)^{-1}\mathbf{K}^\mathsf{T}\mathbf{b}, \end{equation}\]

where \(\mathbf{H}\) is the regularization matrix. Here we set \(\mathbf{H} = \mathbf{D_2}^\mathsf{T}\mathbf{D_2}\), where \(\mathbf{D_2}\) is the second-order difference matrix.

[5]:
dmat = diags_array([1, -2, 1], offsets=[-1, 0, 1], shape=(t.size, t.size), dtype=float).tocsr()
print(f"{dmat.shape = }")
dmat.shape = (64, 64)

Then we create SVD solver object.

[6]:
svd = SVD(k_mat, H=dmat.T @ dmat, data=b)

Let us solve the inverse problem with the GCV criterion to determine the optimal regularization parameter.

[7]:
gcv = GCV()
sol, status = svd.solve(gcv)
rprint(status)
                    message: ['requested number of basinhopping iterations completed successfully']
                    success: True
                        fun: 2.0363564160131267e-08
                          x: [-1.735e+00]
                        nit: 100
      minimization_failures: 0
                       nfev: 556
                       njev: 278
 lowest_optimization_result:  message: CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL
                              success: True
                               status: 0
                                  fun: 2.0363564160131267e-08
                                    x: [-1.735e+00]
                                  nit: 0
                                  jac: [ 1.119e-09]
                                 nfev: 2
                                 njev: 1
                             hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>

Evaluate the GCV criterion

Next we evaluate the solution obtained by the GCV criterion

Plot GCV curve

[8]:
fig, ax = uplt.subplots()
gcv.plot_gcv(axes=ax)
ax.format(
    xformatter="log",
    yformatter="log",
)
../../_images/notebooks_non-iterative_gcv_19_0.png

Compare \(\mathbf{x}_\lambda\) with \(\mathbf{x}_\mathrm{true}\)

Let us compare solutions at different regularization parameters \(\lambda=10^{-9}\), \(\lambda_\text{opt}\), \(10^5\) with the true solution \(\mathbf{x}_\mathrm{true}\).

[9]:
lambdas = [1.0e-9, svd.lambda_opt, 1.0e5]

fig, axes = uplt.subplots(ncols=3)

for ax, beta in zip(axes, lambdas, strict=False):
    ax.plot(t, x_t, "--", label="$\\mathbf{x}_\\mathrm{true}$")
    ax.plot(t, svd.solution(beta=beta), label="$\\mathbf{x}_\\lambda$")
    ax.axhline(0, color="black", lw=0.75, ls="--", zorder=-1)

    ax.legend(loc="upper left", ncols=1)

    parsed_lambda = parse_scientific_notation(f"{beta:.2e}")
    ax.format(
        title=f"$\\lambda = {parsed_lambda}$",
    )

axes.format(
    ylim=(-0.5, x_t.max() * 1.1),
    xlabel="$t$",
)
../../_images/notebooks_non-iterative_gcv_21_0.png

Check the relative error

The relative error between the solution \(\mathbf{x}_\lambda\) and the true solution \(\mathbf{x}_\mathrm{true}\) is defined as follows:

\[\begin{equation} \epsilon_\mathrm{rel} = \frac{\|\mathbf{x}_\lambda - \mathbf{x}_\mathrm{true}\|_2}{\|\mathbf{x}_\mathrm{true}\|_2}. \end{equation}\]

Let us seek the minimum \(\epsilon_\mathrm{rel}\) as a function of \(\lambda\).

[10]:
from scipy.optimize import minimize_scalar

X_T_NORM = np.linalg.norm(x_t, axis=0)


def relative_error(
    log_lambda: float,
    x_t: np.ndarray = x_t,
    x_t_norm: float = X_T_NORM,
    svd: SVD = svd,
) -> float:
    """Calculate relative error."""
    beta = 10**log_lambda
    sol = svd.solution(beta=beta)
    return np.linalg.norm(x_t - sol, axis=0) / x_t_norm


# minimize relative error
bounds = svd.bounds
res = minimize_scalar(
    relative_error,
    bounds=bounds,
    method="bounded",
    args=(x_t, X_T_NORM, svd),
    options={"xatol": 1.0e-10, "maxiter": 1000},
)

# obtain minimum relative error and lambda
error_min = res.fun
lambda_min = 10**res.x

print(f"minimum relative error: {error_min:.2%} at lambda = {lambda_min:.4g}")
minimum relative error: 0.56% at lambda = 0.004394

Let us plot the relative error and curvature as a function of \(\lambda\).

[11]:
# set regularization parameters
num = 500
lambdas = np.logspace(*bounds, num=num)

# calculate errors and gcv
errors = np.asarray([relative_error(log_lambda) for log_lambda in np.linspace(*bounds, num=num)])
gcvs = np.asarray([gcv.gcv(svd, beta) for beta in lambdas])

# create figure
fig, ax1 = uplt.subplots()
ax2 = ax1.alty(color="C1")

# plot errors and gcv
ax1.loglog(lambdas, errors, color="C0")
ax2.loglog(lambdas, gcvs, color="C1")

# plot minimum error vertical line and point
ax1.axvline(lambda_min, color="r", linestyle="--", linewidth=0.75)
ax1.scatter(lambda_min, error_min, color="r", marker="o", s=10, zorder=2)
ax1.text(
    lambda_min,
    error_min,
    "$\\lambda_\\mathrm{min}$",
    color="r",
    horizontalalignment="left",
    verticalalignment="top",
    border=True,
)

# plot minimum gcv vertical line and point
assert svd.lambda_opt is not None
min_gcv = gcv.gcv(svd, svd.lambda_opt)
ax1.axvline(svd.lambda_opt, color="g", linestyle="--", linewidth=0.75)
ax2.scatter(svd.lambda_opt, min_gcv, color="g", marker="o", s=10, zorder=2)
ax2.text(
    svd.lambda_opt,
    min_gcv,
    "$\\lambda_\\mathrm{opt}$",
    color="g",
    horizontalalignment="right",
    verticalalignment="bottom",
    border=True,
)

ax1.autoscale(enable=True, axis="x", tight=True)
ax2.autoscale(enable=True, axis="x", tight=True)

# format axes
ax1.format(
    ycolor="C0",
    ylim=(1.0e-3, 1),
    xlabel="Regularization parameter, $\\lambda$",
    ylabel="Relative error, $\\epsilon_\\mathrm{rel}$",
    xformatter="log",
    yformatter="log",
)
ax2.format(
    ylabel="GCV",
    yformatter="log",
)

error_opt = relative_error(np.log10(svd.lambda_opt))
ax1.format(
    title=f"$\\epsilon_\\mathrm{{rel}}(\\lambda_\\mathrm{{opt}})$ = {error_opt:.2%}, "
    + f"$\\epsilon_\\mathrm{{rel}}(\\lambda_\\mathrm{{min}}) = ${error_min:.2%}"
)
../../_images/notebooks_non-iterative_gcv_25_0.png

References