It is hard to think of a problem more ubiquitous than the diagonalization of a matrix. I will discuss today a statistical approximation method for finding the eigenvalues of a symmetric matrix.

Matrix diagonalization is fundamental to quantum mechanics, as well as to most other branches of physics and mathematics. As a result, considerable effort has been expended on obtaining efficient numerical approximations to the eigenvalues of a matrix. In general, for a matrix of size $n$ this takes about $n^3$ operations, although in principle fewer should suffice. If the matrix is sparse (i.e., has very few non-zero elements) much more efficient methods are available.
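For reference, the standard dense computation looks like this in NumPy (my illustrative choice of tool and test matrix, not part of the post's method):

```python
import numpy as np

# Baseline: dense symmetric eigensolve, roughly n^3 operations.
n = 200
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # symmetrize

eigvals = np.linalg.eigvalsh(A)  # eigenvalues, in ascending order
```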

I have not seen today's method anywhere before, but it was suggested by a paper on the Principal Component Analysis of large rectangular matrices:

Dimitris Achlioptas and Frank McSherry. Fast Computation of Low Rank Matrix Approximations. *STOC 2001*, pages 611-618.

*If anyone knows of this method being used before, please leave a comment or let me know by email; I am new to this subject.* I only talk of the basic idea here. If the method works in practice numerically, I will post those results here as well. My undergraduate student Jason Robin is checking it out on some examples. Also, I have decided not to spend time giving rigorous proofs until it is clear that the method (i) is new and (ii) works in practice.

**Sum Over Paths**

Let $A$ be an $n \times n$ symmetric matrix; the spectrum of $A$ is (unordered, allowing for multiplicities) the multi-set of its eigenvalues $\lambda_1, \ldots, \lambda_n$; i.e., the roots of the characteristic polynomial

$$\det(\lambda - A) = 0.$$

The same information arises as the poles of the function (the Stieltjes transform of the spectrum)

$$G(\lambda) = \frac{1}{n}\,\mathrm{tr}\,\frac{1}{\lambda - A} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{\lambda - \lambda_i}.$$

Another convenient way of packaging this information is as the spectral density

$$\rho(\lambda) = \frac{1}{n}\sum_{i=1}^{n} \delta(\lambda - \lambda_i).$$

If the eigenvalues remain bounded as $n$ becomes large, this can approach a continuous probability density on the real line. Clearly

$$\int \rho(\lambda)\, d\lambda = 1.$$

Conversely, the spectral density is determined by the poles (or, more generally, the discontinuity across the branch cut) of $G(\lambda)$.

Yet another way to package the information about the eigenvalues of a matrix is through the moments of the spectral density:

$$m_k = \int \lambda^k \rho(\lambda)\, d\lambda = \frac{1}{n}\,\mathrm{tr}\, A^k = \frac{1}{n}\sum_{i_1, \ldots, i_k} A_{i_1 i_2} A_{i_2 i_3} \cdots A_{i_k i_1}.$$

This has a graphical interpretation. Think of a complete unoriented graph with vertices labelled by $\{1, \cdots, n\}$; i.e., every unordered pair is an edge. Then the last expression is the sum over all closed paths of length $k$, the contribution of each edge being a factor equal to the matrix element corresponding to it. Or, we can take a 'grand canonical' version: sum over all closed paths with a weight $\frac{1}{\lambda}$ for each edge. This gives the Stieltjes transform of the spectral density, which is also the generating function of the moments:

$$G(\lambda) = \sum_{k=0}^{\infty} \frac{m_k}{\lambda^{k+1}}.$$
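As a sanity check of this path interpretation, one can verify numerically that the trace formula and the brute-force sum over closed paths agree (my illustration; the enumeration is only feasible for tiny $n$ and $k$):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n, k = 5, 3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # small symmetric test matrix (illustrative)

# k-th moment of the spectral density: m_k = (1/n) tr A^k ...
m_k = np.trace(np.linalg.matrix_power(A, k)) / n

# ... which equals the average of lambda_i^k over the eigenvalues ...
eigs = np.linalg.eigvalsh(A)
m_k_from_eigs = np.mean(eigs ** k)

# ... and also the sum over all closed paths (i_1, ..., i_k) on the
# complete graph, each step i -> j contributing a factor A[i, j].
path_sum = sum(
    np.prod([A[p[j], p[(j + 1) % k]] for j in range(k)])
    for p in product(range(n), repeat=k)
) / n
```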

**Sampling the Matrix**

Thus finding the moments of a matrix is the same as averaging over all closed paths in the graph of the matrix. We can now think of estimating this average by taking a sample of paths, a common strategy in averaging over large sets. We must sample paths at random such that the probability of picking a path depends only on its length. This is the discrete version of a well-known idea in quantum mechanics, the Feynman path integral; also related is the Wiener integral over Brownian paths. The matrices are replaced there by the kernel of the Schrödinger or heat equation respectively. In addition to its conceptual elegance, this point of view has led to practically useful approximation methods. The Monte Carlo simulation methods of lattice gauge theory are sampling methods applied to path integrals.
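The path-sampling idea can be sketched numerically as follows (my own illustration, not the post's code; uniform sampling of closed paths is a crude choice, but it shows the principle):

```python
import numpy as np

# Sketch: estimate a moment by sampling closed paths at random
# instead of summing over all n^k of them.
rng = np.random.default_rng(4)
n, k = 30, 4
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # symmetric test matrix

# Exact moment for comparison: m_k = (1/n) tr A^k.
m_k = np.trace(np.linalg.matrix_power(A, k)) / n

# Draw closed paths (i_1, ..., i_k) uniformly; each contributes the
# product of the matrix elements along its edges. There are n^k
# closed paths of length k, so rescale the average by n^k / n.
samples = 400_000
idx = rng.integers(0, n, size=(samples, k))
contrib = np.ones(samples)
for j in range(k):
    contrib *= A[idx[:, j], idx[:, (j + 1) % k]]
estimate = contrib.mean() * n**k / n
```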

Even for finite matrices, we might get useful approximation methods. Suppose we take the original matrix $A$ and replace each of its entries $A_{ij}$ independently by a random variable $B_{ij}$ which is zero with probability $1 - p$ and equal to $\frac{A_{ij}}{p}$ with probability $p$. Thus $\langle B_{ij} \rangle = A_{ij}$. More generally,

$$\langle B_{i_1 i_2} B_{i_2 i_3} \cdots B_{i_k i_1} \rangle = A_{i_1 i_2} A_{i_2 i_3} \cdots A_{i_k i_1}$$

if each edge is distinct from the others. If $k$ remains fixed as $n \to \infty$, most paths will be non-intersecting in this sense. If we average over many such randomizations, the answer for the moments should therefore be unchanged.
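A minimal numerical check of this randomization (my sketch, not code from the post; the helper name `sparsify` is my own):

```python
import numpy as np

# Check that the randomized matrix B reproduces A on average:
# each entry is kept as A_ij / p with probability p, else zeroed.
rng = np.random.default_rng(2)
n, p = 6, 0.3
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # symmetric test matrix

def sparsify(A, p, rng):
    # Sample the upper triangle (the edges of the graph) independently
    # and mirror it, so B stays symmetric and <B_ij> = A_ij entrywise.
    n = A.shape[0]
    keep = rng.random((n, n)) < p
    upper = np.triu(np.where(keep, A / p, 0.0))
    return upper + upper.T - np.diag(np.diag(upper))

# Averaging over many randomizations recovers A entrywise.
trials = 20_000
B_mean = sum(sparsify(A, p, rng) for _ in range(trials)) / trials
```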

The number of nonzero entries in any sample will be about $p n^2$. If $pn$ is held fixed as $n \to \infty$, each sample is a sparse matrix that can be diagonalized in far fewer steps. This can be repeated a large number of times and the average spectral density determined. Actually, often it is the largest (or smallest) eigenvalue that is of interest. In this case we can get a sample of values for these random variables. Tracy-Widom theory gives us the a priori probability distribution of such random variables, and we can use statistical inference based on it to get a good estimate of the largest eigenvalue of the original matrix.
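Putting the pieces together, here is a hedged sketch of the sampling procedure for the largest eigenvalue (mine, not the post's implementation), using a plain sample mean in place of the Tracy-Widom-based inference the post proposes:

```python
import numpy as np

# Sketch: estimate the largest eigenvalue of A by diagonalizing
# many randomized sparse samples B and averaging their top eigenvalues.
rng = np.random.default_rng(3)
n, p, num_samples = 40, 0.5, 200
M = rng.standard_normal((n, n))
A = (M + M.T) / 2  # symmetric test matrix

def sparsify(A, p, rng):
    # Keep each upper-triangular entry as A_ij / p with probability p
    # (zero otherwise), then mirror so B is symmetric and <B> = A.
    n = A.shape[0]
    keep = rng.random((n, n)) < p
    upper = np.triu(np.where(keep, A / p, 0.0))
    return upper + upper.T - np.diag(np.diag(upper))

# Largest eigenvalue of each sample; a naive point estimate is the
# sample mean over the randomizations.
tops = [np.linalg.eigvalsh(sparsify(A, p, rng))[-1] for _ in range(num_samples)]
estimate = float(np.mean(tops))
exact = float(np.linalg.eigvalsh(A)[-1])
```

In quick experiments the plain mean tends to overshoot, since the sparsification noise inflates each sample's top eigenvalue; this is presumably where the statistical inference based on the Tracy-Widom distribution would come in.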

In fact we could go further. We choose an edge at random and replace the entry there with a value of fixed magnitude, with sign probabilities chosen such that the average is $A_{ij}$; this can further reduce the memory required to store $B$. But this saving is a constant factor independent of the size of the matrix, so it may not be worth it.
