Idiot’s Guide to GMRES (written by an idiot)

I’m doing some programming work for Professor Bindel in the CS department, and it’s pretty novel stuff to me. I figured if I write out what I know about the original method which he is trying to improve on, it’ll help me with the overall picture. Here it is:

The entire premise of GMRES (the generalized minimal residual method) is to solve a system of equations of the form Ax = b, where A is a matrix and x, b are vectors of appropriate dimensions. First, though, we need some additional machinery before we can cover the main method (don’t worry, the machinery does most of the work).


First, some definitions and an overview before the nitty-gritty: a Krylov subspace sounds like magic but is quite simple to think about. It’s defined as K_k(A, b) = \text{span}\{b, Ab, A^2b, \dots, A^{k-1}b\}. It’s really nothing but the span of the vectors we get from multiplying b by A over and over.

What GMRES aims to do is to minimize the norm of the residual, \|A\hat{x} - b\|, where \hat{x} \in K_k(A,b).

Arnoldi Iterations

I’m going to assume that you’ve seen the Gram-Schmidt method; if you don’t recall, it’s the algorithm where, given a set of vectors that span a space, you return an orthonormal basis for the subspace which they span. The Arnoldi iterations are something extremely similar to that, except they’ve got a few caveats.

Imagine that we already have a bunch of the vectors from the Krylov subspace from above, but we really can’t work with them. They might be very poorly conditioned in that they’re basically pointing in the same direction (they resemble the iterates of power iteration). We basically apply GS to those Krylov vectors to obtain an orthonormal basis. Our end goal is to obtain an expression of the form AV_{k} = V_{k+1} H_{k+1}, where A is the matrix from our problem, V_k is the matrix whose columns are the first k orthonormal basis vectors of the Krylov subspace, and H_{k+1} is a (k+1) \times k upper Hessenberg matrix (i.e. an upper triangular matrix with the first subdiagonal filled in).
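To make the "pointing in the same direction" complaint concrete, here is a small NumPy sketch. The test matrix (symmetric, eigenvalues 1 to 50), the size, and the helper name krylov_matrix are all my own arbitrary choices for illustration; nothing here comes from the method itself. It stacks the normalized Krylov vectors into a matrix, measures its condition number, and compares that with an orthonormalized version of the same columns.

```python
import numpy as np

def krylov_matrix(A, b, k):
    """Columns are b, Ab, ..., A^(k-1) b, each scaled to unit length."""
    cols = []
    v = b / np.linalg.norm(b)
    for _ in range(k):
        cols.append(v)
        v = A @ v
        v = v / np.linalg.norm(v)
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n = 50
# Hypothetical test problem: symmetric matrix with eigenvalues 1, 2, ..., n.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T
b = rng.standard_normal(n)

K = krylov_matrix(A, b, 10)
cond_raw = np.linalg.cond(K)       # large: the columns are nearly dependent

# Orthonormalize the same columns (QR here, as a stand-in for Gram-Schmidt).
V, _ = np.linalg.qr(K)
cond_orth = np.linalg.cond(V)      # essentially 1

print(f"raw Krylov basis: {cond_raw:.2e}, orthonormal basis: {cond_orth:.2e}")
```

Both matrices span the same subspace; only the second is pleasant to compute with.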

The algorithm is as follows (shamelessly copied from wiki):

  • Start with an arbitrary vector q_1 with norm 1.
  • Repeat for k = 2, 3, …
      • q_k \leftarrow Aq_{k-1}
      • for j from 1 to k − 1
          • h_{j,k-1} \leftarrow q_j^* q_k
          • q_k \leftarrow q_k - h_{j,k-1} q_j
      • endfor
      • h_{k,k-1} \leftarrow \|q_k\|
      • q_k \leftarrow \frac{q_k}{h_{k,k-1}}

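What does it mean? It’s just like (modified) GS! For each additional vector, you take out the components that lie along the vectors already processed and store those coefficients in the H matrix. The one caveat is that each new vector comes from multiplying A with the latest orthonormal vector q_{k-1} rather than with the raw Krylov vector. Sounds simple but looks rough…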
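Here is roughly how the list above might look in NumPy. Treat it as a sketch under a few assumptions of mine: the matrix is real (so a plain dot product stands in for q_j^*), indices are 0-based instead of 1-based, and there is no handling of breakdown (the case where the new vector has norm zero). The function name arnoldi and the random test matrix are just for illustration.

```python
import numpy as np

def arnoldi(A, q1, k):
    """Run k Arnoldi steps starting from q1 (real matrices only).

    Returns V of shape (n, k+1) with orthonormal columns and an upper
    Hessenberg H of shape (k+1, k) with A @ V[:, :k] = V @ H up to rounding.
    Assumes no breakdown, i.e. H[j+1, j] never becomes zero.
    """
    n = len(q1)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = q1 / np.linalg.norm(q1)
    for j in range(k):
        w = A @ V[:, j]                  # next candidate vector
        for i in range(j + 1):           # modified Gram-Schmidt sweep
            H[i, j] = V[:, i] @ w        # component along V[:, i]
            w = w - H[i, j] * V[:, i]    # remove it
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
b = rng.standard_normal(8)
k = 5
V, H = arnoldi(A, b, k)

# The relation A V_k = V_{k+1} H_{k+1} should hold to rounding error.
print(np.linalg.norm(A @ V[:, :k] - V @ H))
```

Note that H has one more row than columns, which is exactly the extra subdiagonal entry from the normalization step.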


Now we can actually get to the method (which is simple if you’re still reading). We try to minimize the residual \|Ax_k - b\| (in the Euclidean norm, for those wondering) over x_k \in K_k(A, b) (i.e. the kth Krylov subspace). We can write x_k = V_k y because, from our discussion of the Arnoldi iterations above, the columns of V_k form an orthonormal basis of the Krylov subspace (as long as we start Arnoldi with q_1 = b/\|b\|).

Now, we perform some algebra…

\|Ax_k - b\|^2=\|AV_ky - b\|^2=\|V_{k+1}H_{k+1}y-b\|^2=\|H_{k+1}y-V_{k+1}^Tb\|^2 + C^2

The first equality comes from substitution, the second from the result of the Arnoldi iterations, and the last step is kind of tricky. C is the norm of the projection of b onto the orthogonal complement of the span of V_{k+1}. Think of it as the Pythagorean theorem applied to the part of the residual that lies within the span and the part that lies in the complement, which is why the norms are squared. C doesn’t depend on y, so we only need to minimize the first term.

The last bit is just the observation that the first column of V_{k+1} is b, normalized to unit length. So V_{k+1}^Tb=\|b\|e_1, which also means b lies entirely inside the span and C = 0. What’s left is a small (k+1) \times k least-squares problem: find the y that minimizes \|H_{k+1}y-\|b\|e_1\|, then set x_k = V_k y.
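Putting the pieces together, a bare-bones GMRES might look like the sketch below. It is not how a production solver does it: I start from a zero initial guess, ignore breakdown, assume a real matrix, and hand the small least-squares problem to np.linalg.lstsq instead of updating a QR factorization with Givens rotations. The name gmres_simple and the test matrix (identity plus a small random perturbation) are made up for the example.

```python
import numpy as np

def gmres_simple(A, b, k):
    """Return x in K_k(A, b) minimizing ||A x - b|| and the residual norm."""
    n = len(b)
    beta = np.linalg.norm(b)
    V = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    V[:, 0] = b / beta                   # q_1 = b / ||b||
    for j in range(k):                   # Arnoldi, assuming no breakdown
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta                        # V_{k+1}^T b = ||b|| e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    x = V[:, :k] @ y                     # x_k = V_k y
    return x, np.linalg.norm(A @ x - b)

rng = np.random.default_rng(2)
n = 30
# Hypothetical easy problem: eigenvalues clustered around 1.
A = np.eye(n) + 0.2 * rng.standard_normal((n, n)) / np.sqrt(n)
b = rng.standard_normal(n)

x3, res3 = gmres_simple(A, b, 3)
x10, res10 = gmres_simple(A, b, 10)
print(res3, res10)   # the residual cannot grow as k increases
```

Because the Krylov subspaces are nested, the minimal residual over K_10 can be no larger than the one over K_3.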

Problems with GMRES

If you look at the Arnoldi iterations, you’ll notice you have more and more vectors to loop over as the Krylov subspace gets larger… and you have to store all those vectors too! This is why people commonly use GMRES with restarts, where they throw away all the previous iterations and rebuild the Krylov subspace starting from the residual of the current best solution.

The problem with this is that there are situations where convergence to the solution actually depends on the restart value! If you don’t choose a good one, the residual can stall and convergence might never occur. That’s bad.
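A tiny example shows how bad it can get. The sketch below is my own toy setup, not anything from the paper: A is a 6 x 6 cyclic shift matrix and b is the first unit vector, a standard worst case where GMRES makes no progress at all until step n. The function names gmres_cycle and gmres_restarted are hypothetical, and the breakdown tolerance 1e-12 is an arbitrary choice.

```python
import numpy as np

def gmres_cycle(A, r, m):
    """One GMRES cycle: the d in K_m(A, r) that minimizes ||A d - r||."""
    n = len(r)
    beta = np.linalg.norm(r)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = r / beta
    k = m
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:          # breakdown: nothing new to add
            k = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H[: k + 1, :k], rhs, rcond=None)
    return V[:, :k] @ y

def gmres_restarted(A, b, m, cycles):
    """Restarted GMRES: each cycle starts over from the current residual."""
    x = np.zeros(len(b))
    for _ in range(cycles):
        r = b - A @ x
        if np.linalg.norm(r) < 1e-12:
            break
        x = x + gmres_cycle(A, r, m)
    return x

n = 6
A = np.roll(np.eye(n), 1, axis=0)        # cyclic shift: A e_j = e_{j+1}
b = np.zeros(n)
b[0] = 1.0

res_small = np.linalg.norm(A @ gmres_restarted(A, b, 3, 50) - b)
res_full = np.linalg.norm(A @ gmres_restarted(A, b, n, 1) - b)
print(res_small, res_full)   # restart 3 stagnates at 1; restart 6 solves it
```

With a restart of 3, every cycle searches a subspace whose image under A is orthogonal to b, so the best update is zero and 50 cycles change nothing. With a restart of 6 (the full dimension), a single cycle finds the exact solution.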

Another way to deal with this is to demand that the matrix of vectors be well conditioned… but I don’t know much about this. And another way is to use Chebyshev polynomials to somehow do it (more on this later after I read the paper).
