This page was generated from docs/notebooks/non-iterative/L_curve.ipynb.
Download notebook.

L-curve criterion

This page shows how to use the L-curve criterion to find the regularization parameter in the case of the example ill-posed problem.

The basic theory of the L-curve criterion is described in this page.

Definition of example inverse problem

As a famous ill-posed linear equation, Fredholm integral equation is often used:

\[\begin{equation} \int_a^b K(s, t)\ x(t)\ \mathrm{d}t = b(s), \quad c\leq s\leq d. \label{eq:fredholm} \end{equation}\]

Here we think of the following situation as the above equation form:

\[\begin{split}\begin{align} & K(s, t) \equiv (\cos(s) + \cos(t))\left(\frac{\sin(u)}{u}\right)^2,\quad u \equiv \pi\left(\sin(s) + \sin(t) \right),\\ & [a, b] = [c, d] \equiv \left[-\frac{\pi}{2}, \frac{\pi}{2} \right]. \end{align}\end{split}\]

And, the true solution \(x_\text{true}(t)\) is assumed as follows:

\[\begin{equation} x_\text{true}(t) = 2.0 \exp\left[-6(t-0.8)^2 \right] + \exp\left[-2(t+0.5)^2 \right]. \end{equation}\]
[1]:
import numpy as np
import ultraplot as uplt
from rich import print as rprint
from scipy.sparse import diags_array

from cherab.inversion import SVD, Lcurve
from cherab.inversion.tools import parse_scientific_notation

Firstly, let us code the \(K(s, t)\) and \(x_\text{true}(t)\) as a function.

[2]:
def kernel(s: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Kernel of Fredholm integral equation of the first kind."""
    u = np.pi * (np.sin(s) + np.sin(t))
    if u == 0:
        return np.cos(s) + np.cos(t)
    else:
        return (np.cos(s) + np.cos(t)) * (np.sin(u) / u) ** 2


def x_t_func(t: np.ndarray) -> np.ndarray:
    """Define the function x_true(t)"""
    return 2.0 * np.exp(-6.0 * (t - 0.8) ** 2) + np.exp(-2.0 * (t + 0.5) ** 2)

Discretization of the equation

When discretizing the integral equation \(\eqref{eq:fredholm}\) using the trapezoidal integral approximation, the following linear equation is obtained:

\[\begin{equation} \mathbf{K} \mathbf{x} = \mathbf{b}, \end{equation}\]

where \(\mathbf{K}\in\mathbb{R}^{M\times N}\) is the discretized kernel matrix, \(\mathbf{x}\in\mathbb{R}^N\) is the discretized solution vector and \(\mathbf{b}\in\mathbb{R}^M\) is the discretized data vector.

\(N\) and \(M\) are the number of discretization points of \(t\) and \(s\), respectively. Here we set \(N=M=64\) and generate points evenly spaced in \([-\pi/2, \pi/2]\). \(x_\mathrm{true}(t)\) discretized on these points yields the true solution vector \(\mathbf{x}_\mathrm{true}\).

[3]:
# discretize s, t
s = np.linspace(-np.pi * 0.5, np.pi * 0.5, num=64, endpoint=True)
t = np.linspace(-np.pi * 0.5, np.pi * 0.5, num=64, endpoint=True)

# vectorize solution
x_t = x_t_func(t)

# discretize kernel
k_mat = np.zeros((s.size, t.size))
k_mat = np.array([[kernel(i, j) for j in t] for i in s])

# trapezoidal rule
k_mat[:, 0] *= 0.5
k_mat[:, -1] *= 0.5
k_mat *= t[1] - t[0]

print(f"{k_mat.shape = }")
print(f"{x_t.shape = }")
print(f"condition number of K is {np.linalg.cond(k_mat):.4g}")
k_mat.shape = (64, 64)
x_t.shape = (64,)
condition number of K is 1.897e+18

The given data \(\mathbf{b}\) is generated by adding white noise \(\mathbf{e}\) to the true data \(\bar{\mathbf{b}} = \mathbf{K}\mathbf{x}_\mathrm{true}\), that is,

\[\begin{equation} \mathbf{b} = \bar{\mathbf{b}} + \mathbf{e}. \end{equation}\]

The noise variance is set to \(10^{-4}\).

[4]:
b_bar = k_mat @ x_t
rng = np.random.default_rng()
noise = rng.normal(0, 1.0e-4, b_bar.size)
b = b_bar + noise

Solve the inverse problem

The solution of the ill-posed linear equation is obtained with the regularization procedure:

\[\begin{equation} \mathbf{x}_\lambda = \left(\mathbf{K}^\mathsf{T}\mathbf{K} + \lambda\mathbf{H}\right)^{-1}\mathbf{K}^\mathsf{T}\mathbf{b}, \end{equation}\]

where \(\mathbf{H}\) is the regularization matrix. Here we set \(\mathbf{H} = \mathbf{D_2}^\mathsf{T}\mathbf{D_2}\), where \(\mathbf{D_2}\) is the second-order difference matrix.

[5]:
dmat = diags_array([1, -2, 1], offsets=[-1, 0, 1], shape=(t.size, t.size), dtype=float).tocsr()
print(f"{dmat.shape = }")
dmat.shape = (64, 64)

Then we create SVD solver object.

[6]:
svd = SVD(k_mat, H=dmat.T @ dmat, data=b)

Let us solve the inverse problem using the L-curve criterion to find the optimal regularization parameter.

[7]:
lcurve = Lcurve()
sol, status = svd.solve(lcurve)
rprint(status)
                    message: ['requested number of basinhopping iterations completed successfully']
                    success: True
                        fun: -79.45986150606987
                          x: [-4.646e+00]
                        nit: 100
      minimization_failures: 0
                       nfev: 2422
                       njev: 1211
 lowest_optimization_result:  message: CONVERGENCE: RELATIVE REDUCTION OF F <= FACTR*EPSMCH
                              success: True
                               status: 0
                                  fun: -79.45986150606987
                                    x: [-4.646e+00]
                                  nit: 4
                                  jac: [ 1.137e-04]
                                 nfev: 30
                                 njev: 15
                             hess_inv: <1x1 LbfgsInvHessProduct with dtype=float64>

Evaluate the L-curve criterion

Next we evaluate the solution obtained by the L-curve criterion

Plot L-curve

[8]:
fig, ax = uplt.subplots()
ax = lcurve.plot_L_curve(axes=ax, scatter_plot=7)
ax.autoscale(axis="both", tight=True)
ax.format(
    xformatter="log",
    yformatter="log",
)
../../_images/notebooks_non-iterative_L_curve_22_0.png

The L-curve shown above is limited to the range of \(\lambda\) from \(\sigma_0^2\) to \(\sigma_{r}^2\) and it is enough to find the corner of the L-curve in this range.

The below plot shows why it is enough by plotting points of \(\lambda = \sigma_i^2\) on the L-curve, where \(\sigma_i\) is the \(i\)-th singular value and \(i\) is indicated by the annotation.

[9]:
fig, ax = uplt.subplots()
ax = lcurve.plot_L_curve(axes=ax, plot_lambda_opt=False)
hline = ax.get_lines()[0]

indices = list(range(0, 20)) + [svd.s.size - 1]
sigmas = svd.s[indices]
residuals = [svd.residual_norm(beta) for beta in sigmas**2]
regularizations = [svd.regularization_norm(beta) for beta in sigmas**2]
hscatter = ax.scatter(residuals, regularizations, color="red", marker=".")
ax.legend([hline, hscatter], ["L-curve", "Points at $\\lambda = \\sigma_i^2$"], ncols=1)
for i, ind in enumerate(indices):
    ax.annotate(
        f"{ind}",
        (residuals[i], regularizations[i]),
    )

ax.format(
    xformatter="log",
    yformatter="log",
)
../../_images/notebooks_non-iterative_L_curve_24_0.png
[10]:
print(f"sigma_7^2 : {sigmas[7] ** 2:.4e}")
print(f"sigma_8^2 : {sigmas[8] ** 2:.4e}")
print(f"sigma_9^2 : {sigmas[9] ** 2:.4e}")
print(f"lambda_opt: {svd.lambda_opt:.4e}")
sigma_7^2 : 6.4611e-04
sigma_8^2 : 1.2423e-06
sigma_9^2 : 4.5937e-09
lambda_opt: 2.2618e-05

Plot L-curve’s curvature

\(\lambda_\mathrm{opt}\) is the regularization parameter that maximizes the curvature of the L-curve.

[11]:
fig, ax = uplt.subplots()
ax = lcurve.plot_curvature(axes=ax)
lambda_opt_text = parse_scientific_notation(f"{svd.lambda_opt:.3e}")
ax.format(
    title=f"$\\lambda_\\mathrm{{opt}} = {lambda_opt_text}$",
    xformatter="log",
)
../../_images/notebooks_non-iterative_L_curve_27_0.png

Compare \(\mathbf{x}_\lambda\) with \(\mathbf{x}_\mathrm{true}\)

Let us compare solutions at different regularization parameters \(\lambda=10^{-9}\), \(\lambda_\text{opt}\), \(10^3\) with the true solution \(\mathbf{x}_\mathrm{true}\).

[12]:
lambdas = [1.0e-9, svd.lambda_opt, 1.0e3]

fig, axes = uplt.subplots(ncols=3)

for ax, beta in zip(axes, lambdas, strict=False):
    ax.plot(t, x_t, "--", label="$\\mathbf{x}_\\mathrm{true}$")
    ax.plot(t, svd.solution(beta=beta), label="$\\mathbf{x}_\\lambda$")

    ax.legend(loc="upper left", ncols=1)
    parsed_lambda = parse_scientific_notation(f"{beta:.2e}")
    ax.format(title=f"$\\lambda = {parsed_lambda}$")

axes.format(
    ylim=(0, x_t.max() * 1.1),
    xlabel="$t$",
)
../../_images/notebooks_non-iterative_L_curve_29_0.png

We can see that the solution at \(\lambda < \lambda_\mathrm{opt}\) is perturbed by noise, while the solution at \(\lambda > \lambda_\mathrm{opt}\) is smoothed too much.

Plot norms and curvature as a function of \(\lambda\)

Let us plot the residual norm \(\sqrt{\rho}\) and the regularization norm \(\sqrt{\eta}\) as a function of \(\lambda\). Additionally, we plot the curvature of the L-curve as a function of \(\lambda\).

[13]:
fig, ax1 = uplt.subplots()

ax2 = ax1.alty(color="C1")
ax3 = ax1.twinx(
    yloc=("axes", 1.3),
    ycolor="C2",
    label="Curvature of L-curve",
)

# calculation of the values
lambdas = np.logspace(-10, 0, num=500)
rhos = [svd.residual_norm(beta) for beta in lambdas]
etas = [svd.regularization_norm(beta) for beta in lambdas]
kappa = [lcurve.curvature(svd, beta) for beta in lambdas]

# plot lines
ax1.loglog(lambdas, rhos, color="C0")
ax2.loglog(lambdas, etas, color="C1")
ax3.semilogx(lambdas, kappa, color="C2")

ax1.autoscale(enable=True, axis="x", tight=True)
ax2.autoscale(enable=True, axis="x", tight=True)
ax3.autoscale(enable=True, axis="x", tight=True)

# format axes
ax1.format(
    xlabel="Regularization parameter, $\\lambda$",
    ylabel="Residual norm, $\\sqrt{\\rho}$",
    ycolor="C0",
    xformatter="log",
    yformatter="log",
)
ax2.format(
    ylabel="Regularization norm, $\\sqrt{\\eta}$",
    yformatter="log",
)
../../_images/notebooks_non-iterative_L_curve_32_0.png

\(\sqrt{\rho}\) is monotonically increasing with \(\lambda\), while \(\sqrt{\eta}\) is monotonically decreasing with \(\lambda\). This behavior is consistent with the theory of the L-curve criterion.

The curvature of the L-curve is maximized at the center region where both are flat.

Check the relative error

The relative error between the solution \(\mathbf{x}_\lambda\) and the true solution \(\mathbf{x}_\mathrm{true}\) is defined as follows:

\[\begin{equation} \epsilon_\mathrm{rel} = \frac{\|\mathbf{x}_\lambda - \mathbf{x}_\mathrm{true}\|_2}{\|\mathbf{x}_\mathrm{true}\|_2}. \end{equation}\]

Let us seek the minimum \(\epsilon_\mathrm{rel}\) as a function of \(\lambda\).

[14]:
from scipy.optimize import minimize_scalar

X_T_NORM = np.linalg.norm(x_t, axis=0)


def relative_error(
    log_lambda: float,
    x_t: np.ndarray = x_t,
    x_t_norm: float = X_T_NORM,
    svd: SVD = svd,
) -> float:
    """Calculate relative error."""
    beta = 10**log_lambda
    sol = svd.solution(beta=beta)
    return np.linalg.norm(x_t - sol, axis=0) / x_t_norm


# minimize relative error
bounds = -10, -1
res = minimize_scalar(
    relative_error,
    bounds=bounds,
    method="bounded",
    args=(x_t, X_T_NORM, svd),
    options={"xatol": 1.0e-10, "maxiter": 1000},
)

# obtain minimum relative error and lambda
error_min = res.fun
lambda_min = 10**res.x

print(f"minimum relative error: {error_min:.2%} at lambda = {lambda_min:.4g}")
minimum relative error: 4.43% at lambda = 0.0001717

Let us plot the relative error and curvature as a function of \(\lambda\).

[15]:
# set regularization parameters
num = 500
lambdas = np.logspace(*bounds, num=num)

# calculate errors and curvatures
errors = np.asarray([relative_error(log_lambda) for log_lambda in np.linspace(*bounds, num=num)])
kappa = np.asarray([lcurve.curvature(svd, beta) for beta in lambdas])

# create figure
fig, ax1 = uplt.subplots()
ax2 = ax1.alty(color="C1")

# plot errors and curvatures
ax1.loglog(lambdas, errors, color="C0", zorder=0)
ax2.semilogx(lambdas, kappa, color="C1", zorder=0)

# plot minimum error vertical line and point
ax1.axvline(lambda_min, color="r", linestyle="--", linewidth=0.75)
ax1.scatter(lambda_min, error_min, color="r", marker="o", s=10, zorder=2)
ax2.text(
    lambda_min,
    10,
    "$\\lambda_\\mathrm{min}$",
    color="r",
    horizontalalignment="center",
    verticalalignment="bottom",
    border=True,
    zorder=5,
)

# plot maximum curvature vertical line and point
assert svd.lambda_opt is not None
ax1.axvline(
    svd.lambda_opt,
    color="g",
    linestyle="--",
    linewidth=0.75,
)
ax2.scatter(
    svd.lambda_opt,
    lcurve.curvature(svd, svd.lambda_opt),
    color="g",
    marker="o",
    s=10,
    zorder=2,
)
ax2.text(
    svd.lambda_opt,
    10,
    "$\\lambda_\\mathrm{opt}$",
    color="g",
    horizontalalignment="center",
    verticalalignment="bottom",
    border=True,
    zorder=5,
)

ax1.autoscale(enable=True, axis="x", tight=True)
ax2.autoscale(enable=True, axis="x", tight=True)

# format axes
ax1.format(
    ycolor="C0",
    ylim=(1.0e-3, 1),
    xlabel="Regularization parameter, $\\lambda$",
    ylabel="Relative error, $\\epsilon_\\mathrm{rel}$",
    xformatter="log",
    yformatter="log",
)
ax2.format(
    ylabel="curvature of L-curve",
)

error_opt = relative_error(np.log10(svd.lambda_opt))
ax1.format(
    title=f"$\\epsilon_\\mathrm{{rel}}(\\lambda_\\mathrm{{opt}})$ = {error_opt:.2%}, "
    + f"$\\epsilon_\\mathrm{{rel}}(\\lambda_\\mathrm{{min}}) = ${error_min:.2%}"
)
../../_images/notebooks_non-iterative_L_curve_37_0.png

Compare the solution at \(\lambda_\mathrm{opt}\) with \(\lambda_\mathrm{min}\)

\(\lambda_\mathrm{min}\) is the regularization parameter that minimizes the relative error.

[16]:
fig, ax = uplt.subplots()
ax.plot(t, x_t, "k--", label="$\\mathbf{x}_\\mathrm{true}$", linewidth=1.0)
ax.plot(t, svd.solution(lambda_min), label="$\\lambda_\\mathrm{min}$")
ax.plot(t, sol, label="$\\lambda_\\mathrm{opt}$")
ax.legend(ncols=1)
ax.format(
    xlabel="$t$",
    ylim=(0, x_t.max() * 1.1),
)
../../_images/notebooks_non-iterative_L_curve_39_0.png

Discrete Picard Plot

Discrete Picard Condition[1]

The data vector \(\mathbf{b}\) satisfies the discrete Picard condition (DPC) if the data space coefficients \(|\mathbf{u}_i^\mathsf{T}\mathbf{b}|\) decay faster than the singular values \(\sigma_i\).

In ill-posed problems, we find that the DPC holds initially and then fails at some point \(i_\mathrm{DPC}\), where the data become dominated by errors (noise). If this is the case, and if the regularization parameter is accurately selected, then the regularized solution should provide a valid solution. Examining \(i_\mathrm{DPC}\) provides a method of characterizing the ill-posedness of the problem.

[17]:
ub = np.abs(svd.U.T @ b)

fig, ax1 = uplt.subplots()
ax2 = ax1.twinx(ycolor="C4", ylabel="$f_{\\lambda, i}$")

# DP plot
(h1,) = ax1.semilogy(svd.s, ".-", label="$\\sigma_i$")
(h2,) = ax1.semilogy(ub, "o", label="$|\\mathbf{u}_i^\\mathsf{T}\\mathbf{b}|$")
(h3,) = ax1.semilogy(ub / svd.s, "s", label="$|\\mathbf{u}_i^\\mathsf{T}\\mathbf{b}|/\\sigma_i$")
ax1.axhline(1e-4, color="k", linestyle="--", zorder=0)
ax1.axvline(10, color="k", linestyle="--", zorder=0)
ax1.format(xlabel="$i$", yformatter="log", xlim=(0, 25))

# filter plot
assert svd.lambda_opt is not None
(h4,) = ax2.semilogy(
    svd.filter(svd.lambda_opt),
    ".-",
    color="C4",
    label=r"$f_{\lambda_\mathrm{opt}, i}$",
)
ax2.format(
    yformatter="log",
)

fig.legend(
    [h1, h2, h3, h4],
    ncols=1,
    ref=ax1,
    loc="lower left",
);
../../_images/notebooks_non-iterative_L_curve_41_0.png

The vertical dashed line in the above figure marks the beginning of \(|\mathbf{u}_i^\mathsf{T}\mathbf{b}| < \sigma_i\), and the horizontal dashed line represents the noise level. DPC is satisfied for \(i < i_\mathrm{DPC} \simeq 10\). We confirm that the filter factor \(f_{\lambda_\mathrm{opt}, i}\), the \(\lambda_\mathrm{opt}\) of which is selected by the L-curve criterion, starts to decrease around \(i_\mathrm{DPC}\). This behavior works as a filter to suppress the noise component and yields a physically meaningful solution.

References