The L-curve criterion was proposed by Hansen [1][2].
Let us consider the same ill-posed linear inverse problem introduced in the Theory of inversion problem:
This curve decreases monotonically as \(\lambda\) varies from \(0\) to \(\infty\).
The L-curve criterion chooses the optimal regularization parameter \(\lambda\) as the corner of the L-curve plotted on a log-log scale, as shown in the figure below.
The reason why the corner of the L-curve is optimal is discussed in the section below.
The schematic diagram of the L-curve. The dot on the curve represents the corner of the L-curve, which is the point where the curvature is maximal.
where the prime denotes the derivative with respect to \(\lambda\).
If \(\kappa(\lambda) > 0\), the L-curve is convex at \(\lambda\), and if \(\kappa(\lambda) < 0\), the L-curve is concave at \(\lambda\).
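The corner search described above can be sketched numerically. The following is a minimal sketch, not the reference implementation: it assumes the standard form (\(\mathbf{Q}\) and the regularization matrix are both the identity), assumes filter factors of the form \(\sigma_i^2/(\sigma_i^2+\lambda)\), and replaces the analytic derivatives by finite differences with respect to \(\log\lambda\). The signed curvature does not depend on the parametrization as long as its orientation is preserved, so the sign convention (positive means convex) is unchanged. The function names and the test problem are hypothetical.

```python
import numpy as np

def l_curve_points(T, b, lambdas):
    """Residual norm rho and solution norm eta for each lambda.

    Assumption: x_lambda minimizes ||T x - b||^2 + lambda * ||x||^2
    (Q = I, identity regularization matrix) and T has full column rank.
    """
    U, s, _ = np.linalg.svd(T, full_matrices=False)
    beta = U.T @ b                                   # coefficients u_i^T b
    # Part of b outside the range of T: a lambda-independent residual.
    r_perp_sq = np.linalg.norm(b - U @ beta) ** 2
    lam = np.asarray(lambdas, dtype=float)[:, None]
    f = s**2 / (s**2 + lam)                          # assumed filter factors
    one_minus_f = lam / (s**2 + lam)                 # avoids cancellation in 1 - f
    rho = np.sqrt(np.sum((one_minus_f * beta) ** 2, axis=1) + r_perp_sq)
    eta = np.sqrt(np.sum((f * beta / s) ** 2, axis=1))
    return rho, eta

def log_log_curvature(rho, eta, lambdas):
    """Signed curvature of the curve (log rho, log eta), by finite differences."""
    t = np.log(lambdas)                              # parametrize by log(lambda)
    x, y = np.log(rho), np.log(eta)
    dx, dy = np.gradient(x, t), np.gradient(y, t)
    ddx, ddy = np.gradient(dx, t), np.gradient(dy, t)
    return (dx * ddy - ddx * dy) / (dx**2 + dy**2) ** 1.5

# Hypothetical ill-conditioned test problem with a deterministic perturbation.
s = np.logspace(0, -6, 12)
T = np.diag(s)
b = T @ s + 1e-3 * (-1.0) ** np.arange(s.size)
lambdas = np.geomspace(s[-1] ** 2, s[0] ** 2, 200)

rho, eta = l_curve_points(T, b, lambdas)
kappa = log_log_curvature(rho, eta, lambdas)
lam_corner = lambdas[np.argmax(kappa)]               # point of maximal curvature
print(f"corner at lambda = {lam_corner:.2e}")
```

Finite differences are used here only to keep the sketch short. An analytic curvature expression is generally more accurate, especially on coarse \(\lambda\) grids.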
Before expressing \(\hat{\rho}'\), \(\hat{\eta}'\), etc., it is useful to carry out the following calculation:
where \(\mathbf{x}_0 = (\mathbf{T}^\mathsf{T}\mathbf{Q}\mathbf{T})^{-1}\mathbf{T}^\mathsf{T}\mathbf{Q}\mathbf{b}\), which is the least-squares solution.
Proof
The filter factor \(f_{\lambda, i}\) is expressed as follows:
where \(\bar{\mathbf{b}}\) represents the exact unperturbed data,
\(\bar{\mathbf{x}}\) represents the exact solution,
and \(\mathbf{e}\) represents the errors in the data.
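To make the role of the filter factors concrete, here is a minimal sketch of a filtered SVD solution. It assumes the standard form (\(\mathbf{Q}\) and the regularization matrix are the identity, so \(\mathbf{B}\) reduces to the identity) and filter factors \(f_{\lambda, i} = \sigma_i^2/(\sigma_i^2 + \lambda)\), which is the form that belongs to minimizing \(\|\mathbf{T}\mathbf{x} - \mathbf{b}\|^2 + \lambda\|\mathbf{x}\|^2\). The function name and the random test data are hypothetical. Because the solution is linear in \(\mathbf{b}\), it splits into one part driven by \(\bar{\mathbf{b}}\) and one part driven by \(\mathbf{e}\).

```python
import numpy as np

def tikhonov_svd(T, b, lam):
    """Filtered SVD solution, assuming f_i = s_i^2 / (s_i^2 + lam)."""
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    f = s**2 / (s**2 + lam)                 # filter factors (assumed form)
    return Vt.T @ (f * (U.T @ b) / s)       # sum_i f_i (u_i^T b / s_i) v_i

# Hypothetical data: exact data b_exact plus a small perturbation e.
rng = np.random.default_rng(0)
T = rng.normal(size=(8, 5))
b_exact = T @ np.ones(5)
e = 1e-2 * rng.normal(size=8)
lam = 0.1

x = tikhonov_svd(T, b_exact + e, lam)
# Reference: solve the regularized normal equations directly.
x_ref = np.linalg.solve(T.T @ T + lam * np.eye(5), T.T @ (b_exact + e))
# Linearity: contributions of the exact data and of the errors.
x_signal = tikhonov_svd(T, b_exact, lam)
x_noise = tikhonov_svd(T, e, lam)
print(np.allclose(x, x_ref), np.allclose(x, x_signal + x_noise))
```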
Assumptions
Assume that the following conditions hold:
The coefficients \(|\mathbf{u}_i^\mathsf{T}\mathbf{B}\bar{\mathbf{b}}|\) decay faster than \(\sigma_i\) (discrete Picard condition, DPC).
\(\mathbf{e}\) is white noise.
Sufficient SNR (Signal-to-Noise Ratio) is given, i.e. \(\|\bar{\mathbf{b}}\|/\|\mathbf{e}\| \gg 1\).
Then the L-curve has a corner where the residual norm \(\|\mathbf{T}\mathbf{x}_\lambda - \mathbf{b}\|_\mathbf{Q}\) is approximately equal to \(\|\mathbf{e}\|_\mathbf{Q}\).
According to the first condition, \(\frac{\mathbf{u}_i^\mathsf{T}\mathbf{B}\bar{\mathbf{b}}}{\sigma_i}\) does not grow as \(i\) increases, whereas \(\frac{\mathbf{u}_i^\mathsf{T}\mathbf{B}\mathbf{e}}{\sigma_i}\) grows because the noise does not satisfy the DPC. Hence \(\eta\) is dominated by the second term for \(\lambda \ll 1\).
As \(\lambda\) increases, \(\eta\) decreases because the filter factors \(f_{\lambda, i}\) suppress the high-frequency components of the second term. \(\eta\) is then dominated by the first term, and the L-curve becomes horizontal in this regime.
Somewhere in between, there is a range of \(\lambda\) values that corresponds to the transition between these two regimes of the L-curve.
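The two regimes can be illustrated with a small numerical sketch. It works directly in the SVD basis and assumes \(\mathbf{B}\) is the identity, filter factors \(\sigma_i^2/(\sigma_i^2+\lambda)\), exact coefficients that decay as \(\sigma_i^2\) (so the DPC holds), and a flat deterministic perturbation as a stand-in for white noise. All names and numbers are hypothetical.

```python
import numpy as np

def solution_norm_terms(s, beta_exact, beta_noise, lam):
    """Norms of the exact-data and error contributions to x_lambda."""
    f = s**2 / (s**2 + lam)                          # assumed filter factors
    signal = np.linalg.norm(f * beta_exact / s)      # first term
    noise = np.linalg.norm(f * beta_noise / s)       # second term
    return signal, noise

s = np.logspace(0, -6, 12)                           # decaying singular values
beta_exact = s**2                                    # |u_i^T b_exact| decays faster than s_i
beta_noise = 1e-3 * (-1.0) ** np.arange(s.size)      # flat spectrum, violates the DPC

for lam in (1e-12, 1e-6, 1e-2):
    signal, noise = solution_norm_terms(s, beta_exact, beta_noise, lam)
    print(f"lambda={lam:.0e}  signal term={signal:.3e}  noise term={noise:.3e}")

small = solution_norm_terms(s, beta_exact, beta_noise, 1e-12)
large = solution_norm_terms(s, beta_exact, beta_noise, 1e-2)
```

In this setup the error contribution dominates at the smallest \(\lambda\), and the exact-data contribution dominates at the largest one, which mirrors the transition described above.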
When locating the L-curve corner numerically, it is important to choose the search range of \(\lambda\) appropriately.
Regińska proved the following theorem.
Theorem
The log-log L-curve is always strictly concave for
\[ \lambda \leq \sigma_r^2 \quad \text{and} \quad \lambda \geq \sigma_1^2, \]
where \(\sigma_1\) and \(\sigma_r\) are the largest and smallest singular values, respectively [3].
Hansen also explained this behavior in Section 6 of [2], using the curvature expression (4) and modeling \(|\mathbf{u}_i^\mathsf{T}\hat{\mathbf{b}}|\) as a power-law function of \(\sigma_i\).
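Since the bounds in the theorem above are given by \(\sigma_r^2\) and \(\sigma_1^2\), one plausible way to set the search range is a logarithmic grid that spans these two values. The sketch below assumes the standard form, in which the relevant singular values are simply those of \(\mathbf{T}\). The function name `lambda_grid` and the rank tolerance are hypothetical choices.

```python
import numpy as np

def lambda_grid(T, num=100):
    """Log-spaced lambda values from sigma_r^2 to sigma_1^2."""
    s = np.linalg.svd(T, compute_uv=False)           # singular values, descending
    # Drop numerically zero singular values (tolerance is an assumption).
    tol = s[0] * max(T.shape) * np.finfo(float).eps
    s = s[s > tol]
    return np.geomspace(s[-1] ** 2, s[0] ** 2, num)

# Hypothetical example with singular values 2, 0.5 and 0.01.
T = np.diag([2.0, 0.5, 0.01])
grid = lambda_grid(T, num=5)
print(grid)
```

The curvature can then be evaluated on this grid and the corner taken as its maximizer.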