Matrix Diagonalization by Sampling

It is hard to think of a problem more ubiquitous than the diagonalization of a matrix. Today I will discuss a statistical approximation method for finding the eigenvalues of a symmetric matrix.


Matrix diagonalization is fundamental to quantum mechanics, as well as to most other branches of physics and mathematics. As a result, considerable effort has been expended on obtaining efficient numerical approximations to the eigenvalues of a matrix. In general, for a matrix of size n this takes {\rm O}(n^3) operations, although in principle {\rm O}(n^{2}\log n) should suffice. If the matrix is sparse, i.e., has very few non-zero elements, much more efficient methods are available.

I have not seen today's method anywhere before, but it was suggested by a paper on the Principal Component Analysis of large rectangular matrices:

Dimitris Achlioptas and Frank McSherry. Fast Computation of Low Rank Matrix Approximations. STOC 2001, pages 611-618.

If anyone knows of this method being used before, please leave a comment or let me know by email: I am new to this subject. I only discuss the basic idea here. If the method works well numerically, I will post those results here as well. My undergraduate student Jason Robin is checking it out on some examples. Also, I have decided not to spend time giving rigorous proofs until it is clear that the method (i) is new and (ii) works in practice.

Sum Over Paths

Let A be an n\times n symmetric matrix; the spectrum of
A is the multi-set (unordered, allowing for multiplicities) of its eigenvalues \{\lambda_1,\cdots,\lambda_n\}; i.e., the roots of the characteristic polynomial

P_A(z)=\det[z-A]=\prod_i[z-\lambda_i].

The same information arises as the poles of the function (the Stieltjes transform of \rho_A)
G_A(z)={1\over n}{\rm tr}  {1\over z-A}={1\over n}{d\over dz}\log P_A(z).

Another convenient way of packaging this information is as the spectral density

\rho_A(x)={1\over n}\sum_i\delta(x-\lambda_i)

If the eigenvalues remain bounded as n becomes large, this can approach a continuous probability density on the real line. Clearly

 G_A(z)=\int {1\over z-x}\rho_A(x)dx.

Conversely, the spectral density is determined by the poles (or, more generally, the discontinuity across the branch cut) of G_A(z).
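
To make this relation concrete, here is a minimal numerical check (my own illustration, not part of the argument above) that the trace of the resolvent and the sum over eigenvalue poles give the same function; numpy and a randomly generated test matrix are assumed.

```python
# A small check that G_A(z) = (1/n) tr (z - A)^{-1} equals (1/n) sum_i 1/(z - lambda_i)
# for a randomly generated symmetric test matrix.  All names here are illustrative.
import numpy as np

n = 50
rng = np.random.default_rng(0)
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                     # any real symmetric matrix will do

z = 3.0 + 1.0j                        # a point off the real axis, away from the poles
eigvals = np.linalg.eigvalsh(A)       # exact eigenvalues, for comparison

G_resolvent = np.trace(np.linalg.inv(z * np.eye(n) - A)) / n
G_poles = np.mean(1.0 / (z - eigvals))

print(G_resolvent, G_poles)           # the two expressions agree to machine precision
```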

Yet another way to package the information about the eigenvalues of a matrix is as the moments of the spectral density:

\mu_r(A)=\int \rho_A(x)x^rdx={1\over n}{\rm tr} A^r={1\over n}A_{i_1i_2}A_{i_2i_3}\cdots A_{i_ri_1}.

This has a graphical interpretation. Think of a complete unoriented graph with vertices labelled by \{1,\cdots,n\}; i.e., every unordered pair ij is an edge. Then the last expression is the sum over all closed paths of length r, the contribution of each edge being a factor equal to the matrix element corresponding to it. Or, we can take a ‘grand canonical’ version: sum over all paths with a weight z^{-1} for each edge. This gives the Stieltjes transform of the spectral density, which is also the generating function of the moments:

G_A(z)={1\over z}\sum_{r=0}^\infty\mu_r z^{-r}={1\over nz}\sum_{r=0}^\infty z^{-r}A_{i_1i_2}A_{i_2i_3}\cdots A_{i_ri_1}.
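
As a sanity check of this path-sum reading of the moments, the following sketch (my own; the function names are made up) compares {1\over n}{\rm tr} A^r with a brute-force sum over all closed paths of length r for a tiny matrix.

```python
# Compare the r-th moment computed as a trace with the explicit sum over closed
# paths i1 -> i2 -> ... -> ir -> i1, each edge contributing its matrix element.
import numpy as np
from itertools import product

def moment_by_trace(A, r):
    return np.trace(np.linalg.matrix_power(A, r)) / A.shape[0]

def moment_by_paths(A, r):
    # brute force over all n^r closed paths; feasible only for tiny n and r
    n = A.shape[0]
    total = 0.0
    for path in product(range(n), repeat=r):
        closed = path + (path[0],)
        total += np.prod([A[closed[k], closed[k + 1]] for k in range(r)])
    return total / n

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2

print(moment_by_trace(A, 3), moment_by_paths(A, 3))   # the two values coincide
```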

Sampling the Matrix

Thus finding the moments of a matrix is the same as averaging over all closed paths in the graph of the matrix. We can now think of estimating this average by taking a sample of paths, a common strategy when averaging over large sets. We must sample paths at random such that the probability of picking a path depends only on its length. This is the discrete version of a well known idea in quantum mechanics, the Feynman path integral; also related is the Wiener integral over Brownian paths. The matrix elements A_{ij} are replaced there by the kernel of the Schr\"odinger or heat equation respectively. In addition to its conceptual elegance, this point of view has led to practically useful approximation methods: the Monte Carlo simulation methods of lattice gauge theory are sampling methods applied to path integrals.

Even for finite matrices, we might get useful approximation methods. Suppose we take the original matrix and replace each of its entries independently by a random variable a_{ij} which is zero with probability 1-p and equal to {1\over p}A_{ij} with probability p. Thus E[a_{ij}]=A_{ij}. More generally

E[a_{i_1i_2}\cdots a_{i_ri_1}]=A_{i_1i_2}\cdots A_{i_ri_1}

if each edge is distinct from the others. If r remains fixed as n\to \infty, most paths will be non-intersecting in this sense. If we average over many such randomizations, the answer for the moments should therefore be unchanged.
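
A rough numerical sketch of this randomization is below. The choice to keep or drop each unordered pair jointly (so the sample stays symmetric), and all parameter values, are my own assumptions; the averaged moment matches {1\over n}{\rm tr} A^r only approximately at finite n, because of the paths that reuse an edge.

```python
# Sparsify A entrywise: keep A_ij / p with probability p, zero otherwise, so that
# E[a_ij] = A_ij; then compare the averaged r-th moment with that of A itself.
import numpy as np

def sparsify(A, p, rng):
    keep = np.triu(rng.random(A.shape) < p)
    keep = keep | keep.T                     # decide each unordered pair once, keeping symmetry
    return np.where(keep, A / p, 0.0)

def moment(A, r):
    return np.trace(np.linalg.matrix_power(A, r)) / A.shape[0]

rng = np.random.default_rng(2)
n, p, r, N = 300, 0.2, 3, 200
B = rng.uniform(0.0, 1.0, (n, n))
A = (B + B.T) / 2                            # a symmetric test matrix with positive entries

mu_avg = np.mean([moment(sparsify(A, p, rng), r) for _ in range(N)])
print(moment(A, r), mu_avg)                  # close, up to the intersecting-path corrections
```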

The number of nonzero entries in any sample will be about pn^2. If np=\nu is held fixed as n\to \infty, this is a sparse matrix that can be diagonalized in about \nu n steps. This can be repeated a large number N of times and the average spectral density determined. Actually, it is often the largest (or smallest) eigenvalue that is of interest. In this case we can get a sample of values for these variables. Tracy-Widom theory gives us the a priori probability distribution of these random variables, and we can use statistical inference based on it to get a good estimate of the largest eigenvalue of the original matrix.
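
Here is one way such an experiment might look in code. It is only a sketch under my own parameter choices; the Tracy-Widom inference step is not implemented, and the snippet just collects the sample of largest eigenvalues that such an inference would start from. It assumes scipy's sparse symmetric eigensolver eigsh.

```python
# Diagonalize N independent sparse samples of A and record the largest eigenvalue
# of each; these values are the raw data for the statistical inference step.
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def sparsify(A, p, rng):
    keep = np.triu(rng.random(A.shape) < p)
    keep = keep | keep.T
    return sparse.csr_matrix(np.where(keep, A / p, 0.0))

rng = np.random.default_rng(3)
n, p, N = 400, 0.1, 50
B = rng.standard_normal((n, n))
A = (B + B.T) / 2

largest = [eigsh(sparsify(A, p, rng), k=1, which='LA',
                 return_eigenvectors=False)[0] for _ in range(N)]

print(np.mean(largest), np.std(largest))     # summary of the sampled largest eigenvalues
print(np.linalg.eigvalsh(A)[-1])             # largest eigenvalue of A itself, for reference
```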

In fact we could go further. We could choose an edge at random and replace the entry there with \pm 1 such that the average is A_{ij}; this can further reduce the memory required to store A. But this saving is a constant factor independent of the size of the matrix, so it may not be worth it.
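
As a hedged illustration of this last remark, the sketch below assumes the entries have been rescaled so that |A_{ij}|\le 1 (my assumption, not stated above) and draws each entry as \pm 1 with probabilities chosen so that its expectation is A_{ij}.

```python
# Replace each entry by +1 with probability (1 + A_ij)/2 and -1 otherwise,
# so that E[a_ij] = A_ij; averaging many such sign matrices recovers A.
import numpy as np

def sign_sample(A, rng):
    U = np.triu(rng.random(A.shape))
    U = U + U.T - np.diag(np.diag(U))        # one uniform draw per unordered pair
    return np.where(U < (1.0 + A) / 2.0, 1.0, -1.0)

rng = np.random.default_rng(4)
n, N = 100, 2000
B = rng.uniform(-1.0, 1.0, (n, n))
A = (B + B.T) / 2                            # symmetric, entries in [-1, 1]

S = np.zeros_like(A)
for _ in range(N):
    S += sign_sample(A, rng)

print(np.max(np.abs(S / N - A)))             # shrinks as N grows
```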
