
Chapter 3  Reconstruction Methods

While most biomedical imaging modalities produce images directly, in MRI the image is only obtained after a reconstruction process. In this chapter, we present the current approaches to MRI reconstruction. We recall in Section 3.1 the continuous model of the data-collection process in MRI and derive its discretized version. In Section 3.2, we review the different approaches that lead to linear reconstructions and we introduce the key concepts of the inverse-problem formalism. This approach maps image reconstruction into an optimization problem, with the possibility of imposing a priori constraints that distinguish the solution from other possible candidates and improve reconstruction quality. We justify in Section 3.3 the use of sparsity-promoting priors in MRI and explain how they can be imposed via a proper regularization term. Finally, we review in Section 3.4 the algorithmic procedures that are theoretically capable of achieving the desired reconstructions while being suited to the practical constraints encountered in MRI.

3.1  Model for MRI

3.1.1  Physics

A radio-frequency pulse is emitted to initiate nuclear magnetic resonance (NMR). It excites the spins in a 2-D plane or a 3-D volume, depending on the type of acquisition. After excitation, the excited spins behave as radio-frequency emitters and have their precessing frequency and phase modified depending on their positions. This is achieved thanks to the time-varying magnetic gradient fields that are applied during the relaxation, defining a trajectory k in the k-space domain. The modulated part of the signal received by a coil of sensitivity S_i(r) is given by

m_i(k) = ∫_{ℝ²} S_i(r) ρ(r) e^{−2jπ k·r} dr.     (1)

The signal ρ is referred to as the object. This signal is proportional to the spin density, but might also depend upon other local characteristics. More details on the derivation of relation (1) are provided in Chapter 2.

For an array of R receiving coils with sensitivities denoted by S_1, …, S_R and a k-space trajectory sampled at N points k_n, we represent the measurements concatenated in a global RN × 1 vector

m = [m_{1,1}, …, m_{N,1}, …, m_{1,i}, …, m_{N,i}, …, m_{1,R}, …, m_{N,R}]^T.

3.1.2  Model for the Original Data

Spatial Discretization of the Object

From here on, we consider that the Fourier domain and, in particular, the sampling points kn, are scaled to make the Nyquist sampling interval unity. This can be done without any loss of generality if the space domain is scaled accordingly. Therefore, we model the object as a linear combination of pixel-domain basis functions ϕp that are shifted replicates of some generating function ϕ, so that

     
ρ = Σ_{p∈ℤ²} c[p] ϕ_p,     (2)

with

ϕ_p(r) = ϕ(r − p).     (3)

Given a sampled version of the coil sensitivity si[p], the sensitivity-weighted object is modeled by

S_i ρ = Σ_{p∈ℤ²} s_i[p] c[p] ϕ_p.     (4)

The standard implicit choice for ϕ is Dirac's delta, even though it is hardly justified from an approximation-theoretic point of view. Different discretizations have been proposed, for example by Sutton et al. [38] with ϕ a boxcar function or, later, by Delattre et al. [39] with B-splines. It is only recently [40] that the details have been worked out to recover the image for a general ϕ that is non-interpolating, which is the case, for instance, for B-splines of degree greater than 1. The image to be reconstructed, i.e., the sampled version of the object ρ(p), is obtained by filtering the coefficients c[p] with the discrete filter

P(e^{jω}) = Σ_{h∈ℤ²} ϕ̂(ω + 2πh),     (5)

where ϕ̂ denotes the Fourier transform of ϕ.

Since a finite field of view (FOV) determines sets of coefficients c and s_i with a finite number M of elements, we handle them as vectors c and s_i, keeping the discrete coordinates p as implicit indexing.

Wavelet Discretization

Due to sparsity properties that are discussed later in this chapter, it might be preferable to represent the object in terms of wavelet coefficients. In the wavelet formalism, some constraints apply to ϕ: it must be a scaling function that satisfies the requirements of a multiresolution analysis [41]. In that case, the wavelets can be defined as linear combinations of the ϕ_p, and the object is equivalently characterized by its coefficients in the orthonormal wavelet basis. We refer to Mallat’s reference book [42] for a full review of wavelets. There exists a discrete wavelet transform (DWT) that bijectively maps the coefficients c to the wavelet coefficients w that represent the same object ρ in a continuous wavelet basis. In the rest of the chapter, we represent this DWT by the synthesis matrix W. Note that the matrix-vector multiplications c = Ww and w = W^{−1}c have efficient filterbank implementations.
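
As a concrete illustration of this bijection, the following sketch performs the mappings c = Ww and w = W^{−1}c with a filterbank implementation, without ever forming the matrix W explicitly. It assumes the availability of the PyWavelets package; the wavelet choice and decomposition depth are arbitrary.

```python
import numpy as np
import pywt  # assumed available; any orthonormal DWT filterbank would do

def dwt_coeffs(c, wavelet="db4", levels=3):
    """Map pixel-domain coefficients c to wavelet coefficients w (w = W^{-1} c)."""
    coeffs = pywt.wavedec2(c, wavelet=wavelet, level=levels, mode="periodization")
    w, slices = pywt.coeffs_to_array(coeffs)
    return w, slices

def idwt_coeffs(w, slices, wavelet="db4"):
    """Map wavelet coefficients w back to pixel-domain coefficients c (c = W w)."""
    coeffs = pywt.array_to_coeffs(w, slices, output_format="wavedec2")
    return pywt.waverec2(coeffs, wavelet=wavelet, mode="periodization")

# round-trip check on a random 2-D object
c = np.random.randn(64, 64)
w, sl = dwt_coeffs(c)
print(np.allclose(idwt_coeffs(w, sl), c))  # True: the DWT is a bijection
```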

3.1.3  Matrix Representation of the Model

The data-formation model (1) and the object parameterization (4) are combined to model the measurement corresponding to every point kn sampled in k-space. Accordingly, the measurement vector m is related to the coefficients c through the linear relation

m = Ec,     (6)

where the MRI encoding matrix E is formed as

E = (I_R ⊗ E_0) [diag(s_1), …, diag(s_R)]^T,     (7)

with the symbol ⊗ standing for the Kronecker product, IR representing the R× R identity matrix, and E0 being the encoding matrix for the same MRI scan with a single homogeneous receiving coil

E_0 = diag(ϕ̂(2πk_1), …, ϕ̂(2πk_N)) [v_1, …, v_N]^T.     (8)

There, vn are vectors indexed like c such that vn[p] = exp(−2jπkn·p).

Due to the presence of noise and other scanner inaccuracies, the introduction of a new term b, accounting for an additive perturbation, makes the data-formation model

m = E c+b     (9)

more realistic. Equivalently, if the parameters of interest are the wavelet coefficients w, the model writes

m = M w+b     (10)

with M = EW.

In MRI, the major source of noise is a radio-frequency signal originating from the thermal motion in the object under investigation. When observed with a receiving array of coils, this noise presents non-negligible correlations across channels. In other words, the R × R channel cross-correlation matrix Θ has non-null off-diagonal entries. Accordingly, the additive perturbation is generally modeled as the realization of a centered multivariate Gaussian process b ∼ N(0, Ψ) with covariance matrix Ψ = Θ ⊗ I_N.

3.2  Linear Solutions

The problem of imaging is to recover the M coefficients c (or, equivalently, w) from the N corrupted measurements m. In this section, we review the popular approaches that lead to reconstructions that depend linearly upon the observations. We show that they are functionally equivalent. Two of these approaches rely on a stochastic interpretation of the problem, where the matrices Ψ and Υ are the known covariance matrices of the noise b and the object c, respectively. The corresponding global variances are given by v_n = Tr(Ψ)/N = Tr(Θ) and v_s = Tr(Υ)/M. We define the normalized covariance matrices as Ψ0 = Ψ/v_n and Υ0 = Υ/v_s. Most linear solutions involve a balancing parameter λ, which is necessarily positive and can be interpreted in terms of the signal-to-noise ratio as λ^{−1} = Tr(Υ)/Tr(Ψ) = M v_s/(N v_n).

3.2.1  Pseudoinverse

Depending on the scanner settings, the encoding matrix E is generally neither square nor invertible. In such cases, the Moore-Penrose pseudoinverse offers a solution to the reconstruction problem. The reconstruction matrix is then defined as

E^† = lim_{ε→0⁺} (E^H E + ε I_M)^{−1} E^H.     (11)

The Hermitian transpose, denoted by the superscript H, is used because matrices in MRI have complex-valued entries. The problem of inverting a non-square matrix is tackled by considering the backprojected1 problem

E^H m = E^H E c,     (12)

because the matrix EHE is square.

Considering the singular value decomposition E = UΣV^H, where Σ is an RN × M matrix whose diagonal entries are the singular values σ_n, one gets E^† = VΣ^†U^H, with singular values

σ_n^† = 0 if σ_n = 0, and σ_n^† = 1/σ_n otherwise.

The major concern with pseudoinverse reconstruction resides in the propagation of noise. Indeed, very small but non-null singular values lead to a drastic amplification of the corresponding noise components. This effect is quantified by the condition number, defined as κ(E) = max_n σ_n / min_n σ_n. This number, which is greater than or equal to 1, is also representative of the numerical challenge faced when inverting E. A linear inverse problem is termed “ill-conditioned” when the corresponding condition number is large. When the null space of E is not limited to {0}, the problem is said to be ill-posed.
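
To make the role of the condition number tangible, here is a small numerical sketch (with an arbitrary toy matrix, not an actual MRI encoding matrix) that builds the Moore-Penrose pseudoinverse from the SVD and shows how a nearly vanishing singular value amplifies the measurement noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy complex "encoding" matrix with one nearly vanishing singular value
U, _ = np.linalg.qr(rng.standard_normal((20, 20)) + 1j * rng.standard_normal((20, 20)))
V, _ = np.linalg.qr(rng.standard_normal((10, 10)) + 1j * rng.standard_normal((10, 10)))
sigma = np.array([10, 8, 6, 5, 4, 3, 2, 1, 0.5, 1e-4])
E = U[:, :10] @ np.diag(sigma) @ V.conj().T

c_true = rng.standard_normal(10) + 1j * rng.standard_normal(10)
m = E @ c_true + 1e-3 * (rng.standard_normal(20) + 1j * rng.standard_normal(20))

# Moore-Penrose pseudoinverse: invert the non-null singular values only
Uc, s, Vh = np.linalg.svd(E, full_matrices=False)
s_pinv = np.where(s > 0, 1.0 / s, 0.0)
c_hat = Vh.conj().T @ (s_pinv * (Uc.conj().T @ m))

print("condition number kappa(E):", s.max() / s.min())
# large relative error despite the small measurement noise, because kappa(E) is huge
print("relative reconstruction error:", np.linalg.norm(c_hat - c_true) / np.linalg.norm(c_true))
```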

The aim of regularized reconstruction schemes is to improve reconstruction with respect to the pseudoinverse approach by limiting the propagation of noise in the images.

It is remarkable that (12) can be rewritten as ∇_c(||m − Ec||²₂) = 0, with ∇_c standing for the gradient operator with respect to c. The Moore-Penrose pseudoinverse provides a least-squares solution c = E^† m to the reconstruction problem because it ensures E^H m = E^H E c. This least-squares solution makes sense when the noise term b is independent and identically distributed.

Instead, when the noise correlation matrix Ψ0 is available, this knowledge can be exploited using the weighted pseudoinverse

E_X^† = lim_{ε→0⁺} (E^H X E + ε I_M)^{−1} E^H X     (13)

with the weighting matrix X = Ψ0^†. The interest of this type of solution is that it takes noise correlations into account and relies less on the noisier samples. Thanks to the relation E^H X E E_X^† m = E^H X m, the weighted pseudoinverse provides a (weighted) least-squares solution.

3.2.2  Quadratic Regularization

The approach proposed by Phillips [43] and Twomey [44], for finite dimensional problems, and by Tikhonov [45], for infinite dimensional problems, defines the reconstruction as the minimization of the functional

||m − Ec||²_X + λ ||Rc||²₂,     (14)

where the notation ||·||_X, with X positive-definite, stands for a weighted norm such that ||v||²_X = v^H X v. The functional is a trade-off between a fidelity term, which enforces consistency with the measurements, and a regularization term, which penalizes non-regular solutions with respect to the regularization matrix R. The tuning parameter λ balances the influence of these two terms. The role of the regularization term is to limit the amplification of noise, which can be dramatic for ill-conditioned problems (in MRI, see for instance [46]). In practice, it is often designed with a derivation operator to favor smooth solutions. As for the weighted pseudoinverse solution, the weighting matrix can be chosen as X = Ψ0^†, yielding a reconstruction matrix that weights the samples in inverse proportion to their level of noise. Another common choice is to take X diagonal so as to compensate for an inhomogeneous k-space sampling density [37], which also facilitates the reconstruction.

The minimization of a quadratic functional yields a linear solution. Indeed, by taking the gradient of the functional and setting it to zero, we find that the reconstruction matrix writes

F_QUAD = (E^H X E + λ R^H R)^{−1} E^H X.     (15)

3.2.3  Maximum a posteriori

Here, the reconstruction problem is tackled within a stochastic framework. The unknowns c and b are modeled as realizations of centered multivariate Gaussian distributions: c ∼ N(0, Υ) and b ∼ N(0, Ψ).

According to the data-formation model (9), the measurements also follow a multivariate Gaussian distribution: m ∼ N(0, EΥE^H + Ψ).

The maximum a posteriori (MAP) solution is the vector c that maximizes the posterior distribution given the measurements m. Using Bayes’ theorem, the probability density function of the posterior distribution of c writes

p(c ∣ m) ∝ p(m ∣ c) p(c).

In the present stochastic setting, the probability density function can be expanded in

p(c ∣ m) ∝ exp(−||m − Ec||²_{Ψ^†}) · exp(−||c||²_{Υ^†}).     (16)

Finally, the MAP solution is the vector c that minimizes the functional

||m − Ec||²_{Ψ0^†} + λ ||c||²_{Υ0^†}.     (17)

We introduced the normalized covariance matrices in the latter expression in order to make the parameter λ, which is the inverse of the signal-to-noise ratio, appear explicitly.

Similarly to the previous approaches, the functional to be minimized is composed of quadratic terms. As a consequence, the solution is linear, characterized by the reconstruction matrix

F_MAP = (E^H Ψ0^† E + λ Υ0^†)^{−1} E^H Ψ0^†.     (18)

3.2.4  Linear Minimum Mean Squared Error estimator

The Gaussian model used in the MAP approach is hardly justified for true MRI images c. This assumption can be replaced by the constraint that the reconstruction be affine with respect to the measurements. Accordingly, we write the reconstructed image as Fm + g.

To determine adequate parameters F and g, one can rely on the first- and second-order statistics of the unknown data, namely, the expectation vectors c̄ and b̄ and the covariance matrices Υ and Ψ. According to the data-formation model (9), the expectation and second-order moment of the reconstruction error e = Fm + g − c are given by

E[e] = F(E c̄ + b̄) + g − c̄     (19)

and

E[e e^H] = (FE − I) Υ (FE − I)^H + F Ψ F^H + E[e] E[e]^H.     (20)

An unbiased reconstruction2 is obtained when g = c̄ − F(E c̄ + b̄). For the choice of F, one would reasonably like to minimize the variance of the reconstruction error. Given that the estimator is unbiased, the variance also corresponds to the expectation of the mean-squared error. It is given by the trace of the covariance matrix

Var[e] = Tr[(FE − I) Υ (FE − I)^H] + Tr[F Ψ F^H].     (21)

Interestingly, this relation reveals two distinct contributions to the error.

The matrix F that minimizes the error variance (also referred to as the mean-squared error) can be computed using matrix calculus. In terms of the normalized covariance matrices, it reads

F_MMSE = Υ0 E^H (E Υ0 E^H + λ Ψ0)^{−1}.     (22)

3.2.5  Connections

First, Equations (15) and (18) show that the quadratic-regularization and MAP approaches are equivalent provided that X = Ψ0^† and Υ0^† = R^H R.3

Second, the three following equalities reveal the connection between MAP and LMMSE solutions, (18) and (22), in the case where both matrices Υ0 and Ψ0 are invertible:

E^H Ψ0^{−1} E Υ0 E^H + λ Υ0^{−1} Υ0 E^H = E^H Ψ0^{−1} E Υ0 E^H + λ E^H Ψ0^{−1} Ψ0
(E^H Ψ0^{−1} E + λ Υ0^{−1}) Υ0 E^H = E^H Ψ0^{−1} (E Υ0 E^H + λ Ψ0)
Υ0 E^H (E Υ0 E^H + λ Ψ0)^{−1} = (E^H Ψ0^{−1} E + λ Υ0^{−1})^{−1} E^H Ψ0^{−1}.

Last, the weighted pseudoinverse solution with X = Ψ0^† corresponds to the other solutions in the limiting case where λ tends to 0. This is also the case for the regular Moore-Penrose pseudoinverse when the noise is independent and identically distributed, that is to say, Ψ0 = I_{RN}/R. As already mentioned, the pseudoinverse solutions are only valid when noise propagation is negligible. This situation occurs with well-conditioned (κ(E) ≈ 1) reconstruction problems that are largely overdetermined (M ≪ RN) and/or subject to very little noise (Tr(Υ) ≫ Tr(Ψ)).

About the invertibility of Ψ0 and Υ0:

There is no particular reason for Ψ0 to be singular. Most of the time, the correlations between pixels of the image are not modeled; this translates into a diagonal matrix Υ0. When no signal is expected from some pixels of the image (for instance, outside a predetermined ROI), it could be tempting to set the corresponding entries of Υ0 to 0, resulting in a singular matrix. However, a reasonable problem setting would exclude such entries from the unknown vector c, restoring the invertibility of Υ0.

3.3  Non-Quadratic Regularizations

We just saw that the linear approaches to reconstruction can be derived from the solution of some optimization problems. The corresponding functionals were quadratic, yielding closed-form solutions. In this section, we consider other approaches that are popular in MRI and which involve non-quadratic regularization terms.

3.3.1  Inverse Problem Formalism

The solution c is defined as the minimizer of a cost function that involves two terms: the data fidelity F(b) and the regularization R(c) that penalizes undesirable solutions. This is summarized as

c⋆ = arg min_c F(m − Ec) + λ R(c),     (23)

where the regularization parameter λ ≥ 0 balances the two constraints. In MRI, the noise term b = m − Ec is usually assumed to be the realization of a Gaussian process with normalized covariance matrix Ψ0. From a Bayesian point of view, this justifies the choice F(b) = ||b||²_{Ψ0^†} = b^H Ψ0^† b as a proper log-likelihood term. A more practical motivation for this choice is that a quadratic fidelity term yields a simple closed-form gradient, which greatly facilitates the design and performance of reconstruction algorithms.

When the k-space sampling is dense enough and the signal-to-noise ratio is high, the quadratic regularization terms presented in the previous section yield satisfying reconstructions. However, the constraints imposed to reduce the scan duration favor setups with reduced SNR and k-space trajectories that present regions of low sampling density. In these situations, where the reconstruction problem is more challenging, the reconstructed image can often be enhanced by the use of a more suitable regularization term R(c).

3.3.2  Total Variation

Total Variation (TV) was introduced as an edge-preserving denoising method by Rudin et al. [47]. It is now a very popular approach to tackle image enhancement problems.

The TV regularization term corresponds to the sum of the Euclidean norms of the gradient of the object. In practice, it is defined as R(c) = ||∇c||₁, where, in this context, the operator ∇ returns pixelwise the ℓ2-norm of the finite differences. The use of TV regularization is particularly appropriate for piecewise-constant objects such as the Shepp-Logan (SL) phantom used for simulations in tomography and MRI. Textured and noisy images exhibit a much larger total variation.
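
For concreteness, a minimal sketch of this TV computation for a 2-D image is given below; the periodic finite differences and the toy phantom are assumptions made for the example.

```python
import numpy as np

def total_variation(img):
    """Isotropic TV: sum over pixels of the l2-norm of the finite-difference gradient."""
    dx = np.roll(img, -1, axis=0) - img   # vertical finite differences (periodic boundary)
    dy = np.roll(img, -1, axis=1) - img   # horizontal finite differences
    return np.sum(np.sqrt(np.abs(dx) ** 2 + np.abs(dy) ** 2))

# piecewise-constant disk versus its noisy version
x, y = np.meshgrid(np.arange(128), np.arange(128), indexing="ij")
disk = ((x - 64) ** 2 + (y - 64) ** 2 < 40 ** 2).astype(float)
print(total_variation(disk))                                     # small: few edges
print(total_variation(disk + 0.1 * np.random.randn(128, 128)))   # much larger
```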

Sparsity-Promoting Regularization

Another popular idea is to exploit the fact that the object can be well represented by few non-zero coefficients (a sparse representation) in an orthonormal basis of M functions φ_p. Formally, we write ρ = Σ_p w[p] φ_p, where only a small number of the coefficients w[p] are significantly different from zero.

It is well-documented that typical MRI images admit sparse representation in bases such as wavelets or block DCT [6]. We illustrate this property in Figure 3.1.


Figure 3.1: Sparse approximation errors of a typical MRI brain image using different orthonormal transforms. The Mean Squared Error is represented as a function of the percentage of coefficients kept. Wavelet transforms achieve the best sparse approximations.
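
A curve of this kind can be generated along the following lines, assuming PyWavelets and a test image are available; the wavelet, the decomposition depth, and the random placeholder image are assumptions of the sketch.

```python
import numpy as np
import pywt  # assumed available

def sparse_approx_mse(img, keep_fraction, wavelet="db4", levels=4):
    """MSE obtained when only the largest `keep_fraction` of wavelet coefficients is kept."""
    coeffs = pywt.wavedec2(img, wavelet, level=levels, mode="periodization")
    w, slices = pywt.coeffs_to_array(coeffs)
    k = max(1, int(keep_fraction * w.size))
    thr = np.sort(np.abs(w).ravel())[-k]            # k-th largest magnitude
    w_sparse = np.where(np.abs(w) >= thr, w, 0.0)   # keep only the largest coefficients
    rec = pywt.waverec2(pywt.array_to_coeffs(w_sparse, slices, output_format="wavedec2"),
                        wavelet, mode="periodization")
    return np.mean((rec - img) ** 2)

img = np.random.randn(256, 256)  # placeholder; replace with an actual MRI brain image
for frac in (0.01, 0.05, 0.10, 0.25):
    print(frac, sparse_approx_mse(img, frac))
```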

The ℓ1-norm is a good measure of sparsity with interesting mathematical properties (e.g., convexity). Thus, among the candidates that are consistent with the measurements, we favor a solution whose wavelet coefficients have a small ℓ1-norm. Specifically, the solution is formulated as

w⋆ = arg min_w C(w),     (24)

with

C(w) = ||m − Mw||²₂ + λ ||w||₁.     (25)

This is the general solution for wavelet-regularized inverse problems considered by [19] as well as by many other authors.

3.4  Algorithms

MRI gives rise to a large-scale inverse problem, in the sense that the number of degrees of freedom, that is to say, the number of unknown pixel values, is large. Consequently, the matrices are generally too large to be stored in memory, not to mention that direct matrix multiplication involves too many operations.4 We summarize in this section the strategies that make reconstruction in MRI feasible with reasonable computer requirements and acceptable computation times.

3.4.1  Matrix-Vector Multiplications

The matrix-vector multiplications y=E0x and y=E0Hx are two basic operations in MRI reconstruction. They can be implemented efficiently using the FFT algorithm. For non-Cartesian samples kn, the gridding method, based on FFT and interpolation, can provide accurate computations (see [48] for instance). Algorithms 1 and 2 describe the implementation of the operations y=E0x and y=E0Hx, respectively.

Algorithm 1   Matrix-Vector multiplication y=E0x, according to (8).
Inputs:
x, p_1, …, p_M, k_1, …, k_N, and ϕ̂
Return: y
Algorithm 2   Matrix-Vector multiplication y=E0Hx, according to (8).
Inputs:
x, k_1, …, k_N, p_1, …, p_M, and ϕ̂
Return: y
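
Under the assumption of a direct (non-gridded) evaluation, y = E_0 x and y = E_0^H x from (8) can be sketched as follows; this is exact but O(MN), so a gridding/NUFFT routine would replace the dense phase matrices in practice. The boxcar generating function used in the adjoint check is an arbitrary choice.

```python
import numpy as np

def E0_forward(x, grid_pts, k_pts, phi_hat):
    """y = E0 x from (8): y_n = phi_hat(2*pi*k_n) * sum_p x[p] exp(-2j*pi*k_n.p).

    grid_pts: (M, 2) pixel positions p; k_pts: (N, 2) k-space locations; phi_hat: callable."""
    phase = np.exp(-2j * np.pi * (k_pts @ grid_pts.T))            # (N, M) matrix of v_n[p]
    return phi_hat(2 * np.pi * k_pts) * (phase @ x)

def E0_adjoint(y, grid_pts, k_pts, phi_hat):
    """x = E0^H y: conjugate-transposed version of the same model."""
    phase = np.exp(2j * np.pi * (grid_pts @ k_pts.T))             # (M, N) conjugated phases
    return phase @ (np.conj(phi_hat(2 * np.pi * k_pts)) * y)

# toy check of the adjoint property <E0 x, y> = <x, E0^H y>
rng = np.random.default_rng(1)
grid = np.stack(np.meshgrid(np.arange(8), np.arange(8), indexing="ij"), -1).reshape(-1, 2)
k = rng.uniform(-0.5, 0.5, size=(50, 2))
phi_hat = lambda w: np.sinc(w[:, 0] / (2 * np.pi)) * np.sinc(w[:, 1] / (2 * np.pi))  # boxcar phi
x = rng.standard_normal(64) + 1j * rng.standard_normal(64)
y = rng.standard_normal(50) + 1j * rng.standard_normal(50)
print(np.allclose(np.vdot(y, E0_forward(x, grid, k, phi_hat)),
                  np.vdot(E0_adjoint(y, grid, k, phi_hat), x)))
```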

An interesting work by Wajer [49] identifies E_0^H E_0 as a convolution matrix associated with the kernel

G[p] = Σ_{n=1}^{N} |ϕ̂(2πk_n)|² e^{2jπ k_n·p}.     (26)

When the kernel is precomputed for the lattice points belonging to the set S = {p − q ∣ p ∈ FOV, q ∈ FOV}, one can avoid the use of Algorithms 1 and 2. An efficient implementation of the operation y = E_0^H E_0 x, which uses zero-padded multidimensional FFTs, is described in Algorithm 3.

Algorithm 3   Matrix-Vector multiplication y=E0HE0x, according to (8) and (26).
Inputs:
x, S and G;
Return: y
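
A possible sketch of this strategy for a 2-D Cartesian pixel grid is given below: the kernel G of (26) is precomputed once on the doubled lattice covering S, and every subsequent product E_0^H E_0 x reduces to a zero-padded FFT-based convolution. The toy verification against the dense matrices of (8) uses a Dirac generating function (ϕ̂ = 1), an assumption made for simplicity.

```python
import numpy as np

def make_kernel(shape, k_pts, phi_hat_vals):
    """Precompute G of (26) on the lattice covering S = {p - q : p, q in FOV}."""
    Py, Px = shape
    py = np.arange(-Py + 1, Py)            # displacements p - q along each dimension
    px = np.arange(-Px + 1, Px)
    P = np.stack(np.meshgrid(py, px, indexing="ij"), -1).reshape(-1, 2)   # (|S|, 2)
    G = (np.abs(phi_hat_vals) ** 2)[None, :] * np.exp(2j * np.pi * (P @ k_pts.T))
    return G.sum(axis=1).reshape(2 * Py - 1, 2 * Px - 1)

def E0HE0_apply(x_img, G):
    """y = E0^H E0 x computed as a discrete convolution with G via zero-padded FFTs."""
    Py, Px = x_img.shape
    size = (2 * Py - 1, 2 * Px - 1)
    conv = np.fft.ifft2(np.fft.fft2(x_img, size) * np.fft.fft2(G, size))
    return conv[Py - 1: 2 * Py - 1, Px - 1: 2 * Px - 1]   # samples aligned with the FOV

# toy verification against the dense matrices of (8)
rng = np.random.default_rng(2)
Py = Px = 8
k = rng.uniform(-0.5, 0.5, (40, 2))
phat = np.ones(40)                         # Dirac generating function: phi_hat = 1
grid = np.stack(np.meshgrid(np.arange(Py), np.arange(Px), indexing="ij"), -1).reshape(-1, 2)
E0 = phat[:, None] * np.exp(-2j * np.pi * (k @ grid.T))
x = rng.standard_normal((Py, Px))
G = make_kernel((Py, Px), k, phat)
print(np.allclose(E0HE0_apply(x, G).ravel(), E0.conj().T @ E0 @ x.ravel()))
```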

Most of the time, in parallel MRI, the covariance matrices are block-diagonal. In that case, they are sparse matrices and one can benefit from the related efficient memory storage and matrix operations. As already mentioned, Ψ0 is fully characterized by the channel cross-correlation matrix Θ0 = Θ/v_n, such that Ψ0 = Θ0 ⊗ I_N. Its pseudoinverse or inverse is then given by Ψ0^† = Θ0^† ⊗ I_N. The matrix-vector multiplications with E^H Ψ0^† and E^H Ψ0^† E are implemented as described in Algorithms 4 and 5, respectively.

Algorithm 4   Matrix-Vector multiplication y = E^H Ψ0^† x, according to (7).
Inputs:
x, s1,…, sR, Θ0 and E0H
Return: y
Algorithm 5   Matrix-Vector multiplication y = E^H Ψ0^† E x, according to (7).
Inputs:
x, s1,…, sR, Θ0 and E0HE0
Return: y

3.4.2  Conjugate Gradient

The conjugate-gradient method (CG) [50] is an iterative algorithm that is among the most efficient for solving large-scale linear problems Ac = b characterized by symmetric, positive-definite matrices A. The only operations involving the matrix A are matrix-vector multiplications Ax. In parallel MRI, it is the reference method [37] for performing linear reconstructions. The quadratic-regularized solution characterized by the reconstruction matrix in (15) is computed with CG by solving the linear problem defined by the matrix A = E^H X E + λ R^H R and the vector b = E^H X m.

The idea of the method is to decompose the solution in a basis of mutually conjugate vectors, that is to say, c = Σ_i α_i p_i, with p_i^H A p_j = 0 for i ≠ j. At iteration i, the estimate is c_i = Σ_{j≤i} α_j p_j and the corresponding residue reads r_i = b − A c_i. For the next direction, the choice p_{i+1} = r_i − Σ_{j≤i} (p_j^H A r_i) p_j / ||p_j||²_A ensures the conjugacy constraint. In this direction, the coefficient α_{i+1} = Re(p_{i+1}^H r_i)/||p_{i+1}||²_A is optimal with respect to the cost C(c) = c^H A c − c^H b − b^H c. An efficient implementation of the method is described in Algorithm 6.

Algorithm 6   CG solving Ac=b with A symmetric and positive-definite.
Inputs:
A, b, and c0 (optional, default: c0=0)
Initialization:
r_0 = b − A c_0, p_0 = r_0, and i = 0
Repeat until desired tolerance is reached:
Return: ci
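
A compact sketch of the method, written for any Hermitian positive-definite A supplied as a matrix-vector routine, is given below; the tolerance, the iteration cap, and the toy quadratic-regularization example are arbitrary choices.

```python
import numpy as np

def conjugate_gradient(apply_A, b, c0=None, tol=1e-8, max_iter=500):
    """Solve A c = b for Hermitian positive-definite A given only the map c -> A c."""
    c = np.zeros_like(b) if c0 is None else c0.copy()
    r = b - apply_A(c)                       # residue r_i = b - A c_i
    p = r.copy()
    rs = np.vdot(r, r).real
    for _ in range(max_iter):
        Ap = apply_A(p)
        alpha = rs / np.vdot(p, Ap).real     # optimal step along the conjugate direction
        c += alpha * p
        r -= alpha * Ap
        rs_new = np.vdot(r, r).real
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            break
        p = r + (rs_new / rs) * p            # next direction, conjugate to the previous ones
        rs = rs_new
    return c

# toy usage: quadratic regularization (15) with X = I and R = I reduces to (E^H E + lam I) c = E^H m
rng = np.random.default_rng(3)
E = rng.standard_normal((120, 80)) + 1j * rng.standard_normal((120, 80))
m = rng.standard_normal(120) + 1j * rng.standard_normal(120)
lam = 0.1
apply_A = lambda v: E.conj().T @ (E @ v) + lam * v
c_hat = conjugate_gradient(apply_A, E.conj().T @ m)
print(np.linalg.norm(apply_A(c_hat) - E.conj().T @ m))  # residual of the normal equations
```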

The CG algorithm theoretically converges within a finite number of iterations. In practice, this result is compromised by the propagation of round-off errors. In the context of MRI, the property of practical interest is the linear convergence rate achieved by CG. Indeed, the distance to the desired solution decreases as a power of the iteration number, with the convergence rate

0 ≤ r(A) = (√κ(A) − 1) / (√κ(A) + 1) < 1.

When the condition number κ(A) is large, the rate r(A) gets close to unity, characterizing a slower convergence. Using the weighted norm ||x||_A = √(x^H A x), the distance is upper-bounded by

||c_i − c⋆||_A ≤ 2 ||c_0 − c⋆||_A r(A)^i.     (27)

With the regular Euclidean distance, the bound is looser:

||c_i − c⋆||₂ ≤ 2 κ(A) ||c_0 − c⋆||₂ r(A)^i.     (28)

3.4.3  Iteratively Reweighted Least-Squares

The Iteratively Reweighted Least-Squares algorithm (IRLS), which is also known as the positive form of half-quadratic minimization [51], can be used to compute the solutions defined as

c⋆ = arg min_c ||m − Ec||²_X + λ ||Rc||^p_p.     (29)

In this context, the functional is strictly convex for p > 1. This condition ensures the uniqueness of the minimizer.

The principle of IRLS is to design an upperbounding quadratic proxy for the regularization term, tailored to the neighborhood of ci. In practice, one chooses the functional

Z_i(c) = (p/2) ||Rc||²_{D_i} + (1 − p/2) ||Rc_i||^p_p,     (30)

where D_i is a diagonal matrix with entries |(Rc_i)_n|^{p−2}. It has the following desirable properties:

  1. Z_i(c_i) = ||Rc_i||^p_p,
  2. ∇_c Z_i(c_i) = ∇_c ||Rc||^p_p |_{c=c_i},
  3. Z_i(c) > ||Rc||^p_p for all c ≠ c_i and p < 2,
  4. arg min_c ||m − Ec||²_X + λ Z_i(c) = (E^H X E + (λp/2) R^H D_i R)^{−1} E^H X m.

An implementation of the IRLS is described in Algorithm 7.

Algorithm 7   IRLS solving c⋆ = arg min_c ||m − Ec||²_X + λ||Rc||^p_p.
Inputs:
A = E^H X E, a = E^H X m, R, p, λ, and c_0
Repeat until desired tolerance is reached:
Return: c_i
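
A possible dense-matrix sketch of this scheme, following the update of property 4, is given below; in practice, the inner linear system would itself be solved with CG and the matrices applied implicitly. The small constant that guards the reweighting near zero is an assumption of the sketch.

```python
import numpy as np

def irls(E, m, R, lam, p=1.0, X=None, n_iter=30, eps=1e-8):
    """Minimize ||m - E c||_X^2 + lam ||R c||_p^p with the reweighting of property 4."""
    X = np.eye(E.shape[0]) if X is None else X
    A = E.conj().T @ X @ E
    a = E.conj().T @ X @ m
    c = np.linalg.solve(A + lam * (R.conj().T @ R), a)          # quadratic warm start
    for _ in range(n_iter):
        d = np.maximum(np.abs(R @ c), eps) ** (p - 2)           # diagonal of D_i, guarded near 0
        c = np.linalg.solve(A + (lam * p / 2) * (R.conj().T @ (d[:, None] * R)), a)
    return c

# toy usage: sparse-derivative (p = 1) reconstruction of a piecewise-constant signal
rng = np.random.default_rng(4)
n = 100
c_true = np.zeros(n); c_true[30:60] = 1.0; c_true[60:80] = -0.5
E = rng.standard_normal((60, n))                                # underdetermined sensing matrix
m = E @ c_true + 0.01 * rng.standard_normal(60)
R = np.eye(n) - np.roll(np.eye(n), -1, axis=1)                  # periodic finite differences
c_hat = irls(E, m, R, lam=0.5, p=1.0)
print(np.linalg.norm(c_hat - c_true) / np.linalg.norm(c_true))
```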

Let us remember that for p ≤ 1 the minimization problem might not admit a unique solution. When the minimizer c⋆ is unique, it is also the unique fixed point of the algorithm. As long as p < 2, the sequence of functional values C(c_i) = ||m − Ec_i||²_X + λ||Rc_i||^p_p generated by the IRLS is monotonically decreasing. This guarantees convergence, since the sequence is lower-bounded by the finite quantity C⋆ = min_c ||m − Ec||²_X + λ||Rc||^p_p.

The IRLS algorithm can be simply adapted in order to solve the minimization with mixed-norm regularization terms. A particular case is the total variation penalty which corresponds to the ℓ1-norm of the pixel-wise ℓ2-norm of the spatial gradient [52]. The IRLS algorithm for TV regularization was first proposed by Wohlberg and Rodríguez [53]. It is described in Algorithm 8.

Algorithm 8   IRLS solving c⋆ = arg min_c ||m − Ec||²_X + λ||c||_TV.
Inputs:
A = E^H X E, a = E^H X m, λ, and c_0
Define the finite-difference matrices R_d along every spatial dimension d
Repeat until desired tolerance is reached:
Return: c_i

Duality-based algorithms have proved to be an efficient alternative for achieving TV regularization [54,55].

3.4.4  Iterative Shrinkage/Thresholding Algorithm

The Iterative Shrinkage/Thresholding Algorithm (ISTA) [18,17,19], also known as the thresholded Landweber (TL) algorithm, aims at minimizing the functional

C(w) = ||m − Mw||²_X + λ ||w||₁.     (31)

Here, we use the notation w because ISTA is often applied on wavelet coefficients.

An important observation for understanding ISTA is that the nonlinear shrinkage operation, sometimes called soft-thresholding, solves a minimization problem [56], with

     
T_λ(u) = (|u| − min(λ/2, |u|)) · sgn(u)     (32)
       = arg min_{w∈ℂ} |u − w|² + λ |w|.
 

Figure 3.2: The shrinkage function Tλ.

By separability of norms, this applies component-wise to vectors of ℂ^N:

T_λ(u) = arg min_w ||u − w||²₂ + λ ||w||₁.

This means that the ℓ1-regularized denoising problem (i.e., when M and X are identity matrices) is precisely solved by a shrinkage operation.
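
In code, the shrinkage (32) for complex-valued entries takes a couple of lines; the brute-force check of the scalar problem below is only there to illustrate the claim (the two printed values should agree up to the grid resolution).

```python
import numpy as np

def soft_threshold(u, lam):
    """T_lambda of (32): shrink the magnitude by lam/2 and keep the phase."""
    mag = np.maximum(np.abs(u) - lam / 2, 0.0)
    return mag * np.exp(1j * np.angle(u))

# brute-force check of (32) on a complex scalar: argmin_w |u - w|^2 + lam |w|
u, lam = 0.8 - 0.3j, 1.0
grid = (np.arange(-200, 201)[:, None] + 1j * np.arange(-200, 201)[None, :]) / 100
cost = np.abs(u - grid) ** 2 + lam * np.abs(grid)
print(grid.flatten()[cost.argmin()], soft_threshold(u, lam))
```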

The ISTA generates a sequence of estimates w_i that converges to the minimizer w⋆ of (31) when it is unique. The idea is to define at each step a new functional C′(w, w_i) whose minimizer w_{i+1} will be the next estimate

w_{i+1} = arg min_w C′(w, w_i).     (33)

Two constraints must be considered for the definition of C′.

  1. It is sufficient for the convergence of the algorithm that C′(w,wi) is an upper bound of C(w) with equality at w=wi; this guarantees that the sequence { C(wi)} is monotonically decreasing.
  2. The inner minimization (33) should be performed by a simple shrinkage operation to ensure the rapidity and accuracy of the algorithm.

In accordance with Constraint 1, C′ can take the generic quadratically augmented form

C′(w, w_i) = C(w) + ||w − w_i||²_{Λ − M^H X M},     (34)

with the constraint that (Λ − M^H X M) is positive-definite, where the weighting matrix Λ plays the role of a tuning parameter.

Then, ISTA corresponds to the trivial choice Λ = (L/2) I, with the value of L chosen to be greater than or equal to the Lipschitz constant of the gradient of ||Mw||²_X, so that L ≥ 2λ_max(M^H X M).

Let us define a = M^H X m, A = M^H X M, and

z_i = w_i + 2(a − A w_i)/L.     (35)

Then, using standard linear algebra, we can write

w_{i+1} = arg min_w ||w − z_i||²₂ + (2λ/L) ||w||₁ = T_{2λ/L}(z_i).

This shows that Constraint 2 is automatically satisfied.

Note that both the intermediate variable z_i in (35) and the threshold value will vary depending on L.

Algorithm 9   ISTA solving w⋆ = arg min_w ||m − Mw||²_X + λ||w||₁.
Inputs:
A = M^H X M, a = M^H X m, w_0, and L
Repeat until desired tolerance is reached:
Return: w_i
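
A minimal sketch of ISTA in the simple case X = I, with M supplied as a dense matrix, is given below; the number of iterations and the toy sparse-recovery example are arbitrary choices.

```python
import numpy as np

def soft_threshold(u, lam):
    mag = np.maximum(np.abs(u) - lam / 2, 0.0)
    return mag * np.exp(1j * np.angle(u))

def ista(M, m, lam, n_iter=200, w0=None):
    """Minimize ||m - M w||_2^2 + lam ||w||_1 with the iteration w <- T_{2 lam/L}(w + 2(a - A w)/L)."""
    A = M.conj().T @ M
    a = M.conj().T @ m
    L = 2 * np.linalg.eigvalsh(A).max()               # Lipschitz constant of the gradient
    w = np.zeros(M.shape[1], dtype=complex) if w0 is None else w0.copy()
    for _ in range(n_iter):
        w = soft_threshold(w + 2 * (a - A @ w) / L, 2 * lam / L)
    return w

# toy usage: recover a sparse complex vector from a few random measurements
rng = np.random.default_rng(5)
w_true = np.zeros(80, dtype=complex); w_true[[5, 20, 63]] = [2, -1.5j, 1 + 1j]
M = (rng.standard_normal((40, 80)) + 1j * rng.standard_normal((40, 80))) / np.sqrt(40)
m = M @ w_true + 0.01 * rng.standard_normal(40)
print(np.linalg.norm(ista(M, m, lam=0.05) - w_true))
```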

Beck and Teboulle [20, Thm. 3.1] showed that this algorithm brings the cost function down to its minimal value at a rate that is inversely proportional to the number of iterations i.

Proposition 1   Let {w_i} be the sequence generated by Algorithm 9 with L ≥ 2λ_max(A). Then, for any i > i_0 ∈ ℕ,

C(w_i) − C(w⋆) ≤ (L / (2(i − i_0))) ||w_{i_0} − w⋆||²₂.     (36)

Selecting L as small as possible will clearly favor the speed of convergence. It also raises the importance of a “warm” starting point.

Among the variants of ISTA, FISTA, proposed by Beck and Teboulle [20], ensures state-of-the-art convergence properties while preserving a comparable computational cost. Thanks to a controlled over-relaxation at each step, FISTA makes the cost-function error decrease quadratically with the iteration number, with

C(w_i) − C(w⋆) ≤ (2L / (i + 1)²) ||w_0 − w⋆||²₂.     (37)

More details on FISTA, as a particular case of FWISTA with the trivial choice Λ = (L/2) I, can be found in Section 5.2.3.

An implementation of FISTA is given in Algorithm 10.

Algorithm 10   FISTA solving w⋆ = arg min_w ||m − Mw||²_X + λ||w||₁.
Inputs:
A = M^H X M, a = M^H X m, w_0, and L
Initialization: i = 0, v_0 = w_0, t_0 = 1
Repeat until desired tolerance is reached:
Return: w_i
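
Correspondingly, a minimal sketch of FISTA with the standard over-relaxation sequence t_i is given below; it can serve as a drop-in replacement for the ista() routine of the previous sketch, since both assume X = I and a dense M.

```python
import numpy as np

def soft_threshold(u, lam):
    mag = np.maximum(np.abs(u) - lam / 2, 0.0)
    return mag * np.exp(1j * np.angle(u))

def fista(M, m, lam, n_iter=200, w0=None):
    """Minimize ||m - M w||_2^2 + lam ||w||_1 with Beck-Teboulle over-relaxation."""
    A = M.conj().T @ M
    a = M.conj().T @ m
    L = 2 * np.linalg.eigvalsh(A).max()
    w = np.zeros(M.shape[1], dtype=complex) if w0 is None else w0.copy()
    v, t = w.copy(), 1.0                               # v_0 = w_0, t_0 = 1
    for _ in range(n_iter):
        w_new = soft_threshold(v + 2 * (a - A @ v) / L, 2 * lam / L)   # ISTA step taken at v_i
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        v = w_new + ((t - 1) / t_new) * (w_new - w)    # controlled over-relaxation
        w, t = w_new, t_new
    return w
```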

1
Backprojection is a term used in tomography that refers to the multiplication by the transpose of the encoding matrix.
2
That is to say E[ e] = 0.
3
Let us mention that Υ being a covariance matrix, it is necessarily Hermitian symmetric. Its pseudoinverse is also Hermitian symmetric and admits the same eigenvectors.
4
Take a single-channel MRI problem with M = 256 × 256 unknowns and N = 256 × 256 measurements. The corresponding encoding matrix E is 65 536 × 65 536. With the double-precision floats that are required for accurate calculations, the storage of this complex-valued matrix would require 64 GiB of RAM, which is far too much for current personal computers. Performing direct matrix-vector multiplications would involve an overwhelming number of scalar multiplications and additions.
