
Math for Deep Learning IV – Eigenvectors and Eigenvalues

Calculating Eigenvectors and Eigenvalues

This powerful LLM concept is not hard to calculate.



Introduction

A previous article gave several reasons why we need to learn and practice eigenvalue and eigenvector calculations in order to master deep learning and neural networks.

Deeply understanding the meanings of eigenvalues and eigenvectors is critical.

This article explains what they are and how to use them. Here we’ll go through the following topics:

  • The mathematical foundation of eigenvalue calculations
  • Step-by-step examples with small matrices we can follow by hand
  • Python implementations for computing eigenvalues and eigenvectors
  • How to interpret eigenvalue spectra in real neural networks
  • Practical tools for analyzing our own models

The mathematical foundation of eigenvalue calculations

In the previous article on why we had to learn about eigenvalues and eigenvectors, we talked about how in a group of vectors represented as ribbons, the eigenvector is the one that blows out straight.

Now we are going to explain that with a mathematical foundation. And we will use small matrix examples to drive it home.

Imagine that we have a 4X4 matrix that we are going to use to transform vectors in our neural network. We are starting small, with just a 4X4 matrix. That matrix may or may not actually HAVE a real eigenvector and eigenvalue. Or it may have more than one.

Before we get into how to actually calculate these things, we’ll talk about what they are and what conditions our 4X4 matrix must fulfill in order for the eigenvector and eigenvalue to exist.

And that leads to a very important hint. Something to keep close to our hearts: When a transforming matrix (like our 4X4 matrix) acts on an eigenvector, it does NOT bend or rotate it in any way. It stretches it in a straight line. That’s all. That’s why the eigenvector is the one ribbon that flies straight in the wind when the whole tree is filled with yellow ribbons. And that’s why it’s an important measure of the transforming 4X4 matrix. It shows us the true, core function of the matrix. And that function is to take this specially placed vector, the eigenvector, and stretch it out by a specific scalar value.

Of course that value is the eigenvalue.

Let me say it again: When a transforming matrix acts on an eigenvector, it stretches that eigenvector by a scalar called the eigenvalue.

Let’s do it with symbols: When a transforming matrix that we will call A acts on an eigenvector that we will call v, it stretches the eigenvector by a scalar that we will call λ (lambda). That scalar λ is also called the eigenvalue.

And here’s the equation that shows all of that:

    Av = λv

In my opinion it’s a weird formula. It’s very counterintuitive — because at first glance, we want to divide by v and then we are apparently left with A = λ. But don’t let that bother us. Ignore it. Instead, think of the formula like this:

1. On either side of the formula we have a vector called v.

2. On the left side of the formula we are transforming that vector v with a matrix called A.

3. On the right side of the formula we see that the result of our transformation of v is to multiply it by a scalar called λ.

We are almost done. All that’s left is to understand how we do the calculation.


Part 1. Determining Whether the Transforming Matrix has an Eigenvalue.

Let’s say we have a 2X2 matrix called A:

    A = [ 3  1 ]
        [ 0  2 ]

That matrix A is said to ‘have an eigenvector v’ if we can find a nonzero vector v and a scalar λ such that

    Av = λv

is true. So let’s propose an eigenvector, and then test our matrix to see if that equation holds true. Let’s propose that our matrix has an eigenvector called v:

    v = (1, 0)

And from here, we will check whether A times v equals some scalar λ times v:

    Av = (3·1 + 1·0, 0·1 + 2·0)

And indeed:

A times v, the result, is the vector:

    Av = (3, 0)

And that vector can be expressed as a scalar times the original vector:

    (3, 0) = 3 · (1, 0)

So we can conclude that our original matrix A does indeed have an eigenvector and its eigenvalue is 3.

Does that mean our matrix, when transforming the unit vector (1,0), will stretch it by a factor of 3?

Yes. That’s exactly what this means.

Let’s walk through a second example where the eigenvector we propose doesn’t work:

Again, the transforming matrix is the same:

    A = [ 3  1 ]
        [ 0  2 ]

And the eigenvector we propose changes to this:

    v = (2, 1)

And the check is the same as before, find a value λ such that:

    Av = (7, 2) = λ · (2, 1)

For the value λ to exist, we need some value such that λ * 2 = 7 and also λ * 1 = 2.

· From the first component: 7 = 2λ → λ = 3.5

· From the second component: 2 = λ → λ = 2

Since λ can’t be both 3.5 and 2 simultaneously, (2, 1) is not an eigenvector of this matrix.

Determining whether a given vector is an eigenvector of a specified matrix is not hard.
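The check in Part 1 can also be sketched in a few lines of NumPy. This is only an illustration: the helper name `eigenvalue_if_eigenvector` is hypothetical, and the matrix is an assumption chosen to agree with the numbers quoted in the two examples above.

```python
import numpy as np

def eigenvalue_if_eigenvector(A, v, tol=1e-10):
    """Return λ if A @ v == λ * v for some scalar λ, otherwise None."""
    A = np.asarray(A, dtype=float)
    v = np.asarray(v, dtype=float)
    if np.allclose(v, 0.0):
        return None  # the zero vector is never an eigenvector
    Av = A @ v
    # Best-fit scalar: project Av onto v.
    lam = (v @ Av) / (v @ v)
    # v is an eigenvector only if Av is exactly that multiple of v.
    return lam if np.allclose(Av, lam * v, atol=tol) else None

# Illustrative matrix (an assumption, chosen to match the numbers quoted above).
A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

print(eigenvalue_if_eigenvector(A, [1, 0]))  # 3.0
print(eigenvalue_if_eigenvector(A, [2, 1]))  # None, since A @ v = (7, 2)
```

The function returns the eigenvalue when the proposed vector passes the test, and None when the components disagree about λ, which is the same reasoning we did by hand.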

But we have two more major calculations to learn: If we know the eigenvalue, how do we find the eigenvector? And later, in part 3, if we don’t know anything, how do we find the eigenvectors and eigenvalues? In the next part, we will assume we know the eigenvalue and from this calculate the eigenvector.


Part 2. Calculate the Eigenvector from the Eigenvalue.

In this part, we will find the eigenvector already knowing the eigenvalue. But before we do this, we need to rearrange our equation to a more convenient form.

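Here’s our original equation again:

    Av = λv

From here we will do some rearranging. We move everything to one side and insert the identity matrix I, which does not change v:

    Av - λv = 0
    Av - λIv = 0
    (A - λI)v = 0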

Now let’s start with a new transforming matrix A:

In this example, we stipulate the eigenvalue to be 2.

Let’s find the eigenvector:

And now we want to solve for v from the equation (A - λI)v = 0:

This gives us the system of equations:

Solving:

So from this solved system of equations, we know that v₁ = -v₂. If v₂ = 1, then v₁ = -1. Or if v₂ = -3, then v₁ = 3. Any nonzero scalar multiple works in this case. They are all valid eigenvectors.

That’s how we calculate the eigenvector from the eigenvalue. The trick to remember is to rearrange our original equation to (A - λI)v = 0. Then solve for the vector v through a system of equations.

Let’s go through a second example that also uses the eigenvalue to find the eigenvector. Our new transforming matrix is:

    A = [ 6  2 ]
        [ 3  7 ]

and we stipulate that our eigenvalue is 9.

Finding ‘A minus lambda identity’ is as follows:

    A - 9I = [ 6-9   2  ]  =  [ -3   2 ]
             [  3   7-9 ]     [  3  -2 ]

Solving for the eigenvector:

    (A - 9I)v = 0

gives the solution equations:

    -3v₁ + 2v₂ = 0
     3v₁ - 2v₂ = 0

From -3v₁ + 2v₂ = 0, we can say

    v₂ = (3/2)v₁

Choose v₁ = 2, then v₂ = 3. And one eigenvector is:

    v = (2, 3)

From the other equation 3v₁ - 2v₂ = 0, we can say:

    v₂ = (3/2)v₁

which is the same result.

That’s how we get an eigenvector from a stipulated eigenvalue.
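The same procedure can be sketched numerically. The helper below is a hypothetical name, not a library function: it forms (A - λI) and reads a null-space vector off the singular value decomposition. The matrix is an assumption, built so that A - 9I has the rows (-3, 2) and (3, -2) that appear in the equations above.

```python
import numpy as np

def eigenvector_for(A, lam):
    """Return a unit-length v that solves (A - lam*I) v = 0."""
    A = np.asarray(A, dtype=float)
    M = A - lam * np.eye(A.shape[0])
    # The right singular vector for the smallest singular value spans
    # the null space of M when lam really is an eigenvalue of A.
    _, singular_values, Vt = np.linalg.svd(M)
    if singular_values[-1] > 1e-8:
        raise ValueError(f"{lam} does not look like an eigenvalue of A")
    return Vt[-1]

# Illustrative matrix (an assumption), consistent with the equations above.
A = np.array([[6.0, 2.0],
              [3.0, 7.0]])

v = eigenvector_for(A, 9.0)
print(v / v[0] * 2)   # rescaled so the first component is 2
print(A @ v - 9 * v)  # approximately the zero vector
```

Because any nonzero multiple of an eigenvector is also an eigenvector, the code returns a unit-length version. Rescaling it recovers the hand-picked (2, 3).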

Again, the math is straightforward.

One more lesson. We move on to find the eigenvalue.

From that, of course, we can return here to find the eigenvector. And then we know everything.


Part 3. Calculate the Eigenvalue from the Characteristic Polynomial.

Here’s the general method we will use to find the eigenvalues of a matrix:

1. Form the matrix (A - λI)

2. Calculate the determinant det(A - λI)

3. Set that determinant equal to zero to get our characteristic equation

4. Solve for λ

First, here’s the matrix:

Second, let’s put it in the correct form. This time we don’t know λ.

Now, let’s remember that our rearranged equation multiplies this matrix by v, and the result must equal 0:

    (A - λI)v = 0

So our new equation is this:

Since an eigenvector v must be nonzero, this equation can only hold if the matrix (A - λI) is singular, which means its determinant must be zero:

This shows us that the eigenvalues don’t have to play nice and always be integers!

But let’s do another example, with integer eigenvalues, and a slightly different calculation method.

Here’s the matrix:

Here’s the re-write:

And then:

In this example, we factor the characteristic polynomial into two factors and solve them each by setting them equal to zero.

Now that we have the eigenvalues, we can go in and calculate the eigenvectors using the methods in the previous section.
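For a 2X2 matrix, the four steps of this part collapse into a quadratic: det(A - λI) = λ² - (trace of A)·λ + det(A). Here is a minimal sketch of that calculation. The function name is hypothetical, and both matrices are illustrative assumptions rather than the ones used in the examples above.

```python
import numpy as np

def eigenvalues_2x2(A):
    """Solve det(A - λI) = 0 for a 2x2 matrix via its characteristic polynomial."""
    A = np.asarray(A, dtype=float)
    trace = A[0, 0] + A[1, 1]
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    # det(A - λI) = λ² - trace·λ + det
    return np.roots([1.0, -trace, det])

# Illustrative matrices (assumptions, not taken from the worked examples).
print(eigenvalues_2x2([[6, 2], [3, 7]]))  # integer roots: 9 and 4
print(eigenvalues_2x2([[1, 2], [3, 4]]))  # irrational roots: (5 ± sqrt(33)) / 2
```

The first matrix factors cleanly into (λ - 9)(λ - 4) = 0. The second needs the quadratic formula, which is why its eigenvalues are not integers.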

That’s it. Three different but related calculations that give us the basis of all eigenvalue and eigenvector calculations. We should go practice, even though in truth we will normally get these values from code, which we come to next.


Part 4. Python Implementations for Eigenvectors and Eigenvalues.

Computing these values is easy. We should, of course, know how to do it by hand. But here are standard implementations for NumPy and PyTorch.

NumPy

With NumPy, the relevant call is eig(). It’s in the linear algebra package called linalg. We simply pass in the matrix and get back a tuple with the eigenvalues and eigenvectors. Each eigenvector is a column of the returned array:

import numpy as np

# Our matrix
A = np.array([[4, 2],
              [2, 5]])

# Get eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)

print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

# Verify: A @ v = λ * v
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lambda_val = eigenvalues[i]
    print(f"\nVerification for λ={lambda_val:.4f}:")
    print(f"Av = {A @ v}")
    print(f"λv = {lambda_val * v}")

Other NumPy functions:

  • np.linalg.eigvals(A) – This gives us the eigenvalues only.
  • np.linalg.eigh(A) – This is for symmetric/Hermitian matrices. It returns a tuple of eigenvalues and eigenvectors just like eig(), but it is more efficient, guarantees real eigenvalues, and returns them in ascending order. So it’s preferred for the symmetric matrices that come up in neural network analysis, such as covariance matrices, Gram matrices like W Wᵀ, and Hessians.
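A short sketch of both calls on the same symmetric matrix used above may help. The variable names are illustrative only.

```python
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 5.0]])  # symmetric, so eigh applies

values_only = np.linalg.eigvals(A)       # eigenvalues, no eigenvectors
values_h, vectors_h = np.linalg.eigh(A)  # ascending real eigenvalues

print(np.sort(values_only))
print(values_h)
# The columns returned by eigh are orthonormal for a symmetric matrix.
print(np.allclose(vectors_h.T @ vectors_h, np.eye(2)))  # True
```

Both calls agree on the eigenvalues. The difference is that eigh() exploits the symmetry and returns the values already sorted.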

PyTorch

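With PyTorch, the function calls are almost identical to NumPy. We again call eig(), and it’s in the ‘linalg’ package of torch. And again we get back a tuple of eigenvalues and eigenvectors. One difference is that torch.linalg.eig() always returns complex tensors, even when the eigenvalues are real (torch.linalg.eigh() returns real eigenvalues for symmetric matrices). Here is the call: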

import torch

A = torch.tensor([[4.0, 2.0],
                  [2.0, 5.0]])

eigenvalues, eigenvectors = torch.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:", eigenvectors)

That’s probably all the Python we’ll need to know in order to get these values. But once we have them, there is a world of calculations we can do. We will only scratch the surface here. Every true expert in neural networks will be applying these kinds of measurements on a daily basis. So here is our introduction.


Part 5. Eigenvalue Spectra

Eigenvalue spectra reveal crucial information about neural network behavior, training dynamics, and generalization.

Here is only a first exposure to this concept. There will be more detailed articles soon.

Nonetheless, a very simple bit of code can also give a rudimentary understanding of the nature and practical use of eigenvalues.

In the following code, we obtain the weight matrix W from a layer in our neural network. We want to evaluate that layer and understand how it contributes to the whole network, so we compute the eigenvalues of W Wᵀ (a square, symmetric matrix even when W itself is not square) and sort them to obtain the spectrum. The square roots of these eigenvalues are the singular values of W.

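import numpy as np

# Get weight matrix from a layer.
# `model` is assumed to be an existing torch.nn.Module with a `layers` list.
layer = model.layers[0]
W = layer.weight.detach().cpu().numpy()

# W @ W.T is symmetric, so eigvalsh applies (use W.T @ W for tall matrices).
# Its eigenvalues are the SQUARED singular values of W.
eigenvalues = np.linalg.eigvalsh(W @ W.T)
spectrum = np.sort(np.clip(eigenvalues, 0.0, None))[::-1]
singular_values = np.sqrt(spectrum)

print(f"Spectral norm (largest singular value): {singular_values[0]:.4f}")
# Note: this is infinite if the smallest singular value is exactly zero.
print(f"Condition number: {singular_values[0] / singular_values[-1]:.4f}")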

How do we interpret these values? Again, this is just an exposure to the topic. But here are some principles:

  • A large spectral norm (>1) suggests that this layer amplifies signals and can lead to potential exploding gradients.
  • A small spectral norm (<1) suggests that this layer dampens signals and can lead to potential vanishing gradients.
  • A high condition number suggests numerical instability, because it shows that the layer scales some directions far more strongly than others.
  • A large number of near-zero eigenvalues suggests rank collapse: the entire layer is effectively using only a few dimensions.
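To make the last point concrete, here is a small sketch on synthetic data. The matrix is a made-up stand-in for a collapsed layer, not a real trained weight, and the 1e-8 threshold is a judgment call rather than a standard value. It also checks the relationship between the eigenvalues of W Wᵀ and the singular values of W.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 6x8 "weight matrix" of rank 2, standing in for a collapsed layer.
W = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))

gram_eigenvalues = np.linalg.eigvalsh(W @ W.T)[::-1]  # descending order
singular_values = np.linalg.svd(W, compute_uv=False)  # descending order

# Eigenvalues of W @ W.T are the squared singular values of W.
squares_match = np.allclose(gram_eigenvalues, singular_values ** 2)
print("eigenvalues equal squared singular values:", squares_match)

# Count the directions the layer effectively uses.
threshold = 1e-8 * singular_values[0]
effective_rank = int(np.sum(singular_values > threshold))
print("effective rank:", effective_rank, "of", len(singular_values))
```

Only two of the six values are meaningfully above zero, which is the signature of a layer that uses only a few dimensions.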

Part 6. Analyze The Network

Even with small model networks, we can perform simple analyses that will help improve our outcomes.

Below is a simple Python snippet that we can use to look at our own experimental networks, followed by some explanation of how to interpret the results.

def diagnose_network_health(model, train_loader):
    """Comprehensive eigenvalue analysis.

    Parameters:
    -----------
    model : torch.nn.Module
        A PyTorch neural network model

    train_loader : torch.utils.data.DataLoader
        A PyTorch DataLoader that provides batches of (inputs, labels)

    Note: analyze_weight_spectrum() and analyze_gradient_covariance() are
    helpers that this article does not define; supply them before calling.
    """
    
    print("="*50)
    print("NEURAL NETWORK EIGENVALUE ANALYSIS")
    print("="*50)
    
    # 1. Weight matrix analysis
    print("\n1. WEIGHT MATRICES:")
    for name, param in model.named_parameters():
        if 'weight' in name and len(param.shape) == 2:
            W = param.detach().cpu().numpy()
            analyze_weight_spectrum(W, name)
    
    # 2. Gradient analysis
    print("\n2. GRADIENT COVARIANCE:")
    grad_spectrum = analyze_gradient_covariance(model, train_loader)
    
    # 3. Loss landscape (if we have PyHessian)
    print("\n3. LOSS LANDSCAPE (Hessian):")
    # hessian_spectrum = compute_hessian_top_eigenvalues(model, train_loader)
    
    # 4. Neural collapse check
    print("\n4. NEURAL COLLAPSE:")
    # check_neural_collapse(model, train_loader)
    
    print("\n" + "="*50)

To use the ‘diagnose_network_health()’ function, we need the following:

  • model must be an instance of torch.nn.Module or a subclass
  • train_loader must be a torch.utils.data.DataLoader that yields (inputs, labels) tuples

We can create a DataLoader from NumPy arrays (converted to tensors and wrapped in a TensorDataset), existing datasets (torchvision.datasets, torchtext), and other methods.
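The diagnostic function calls analyze_weight_spectrum(W, name), which the article does not define. One possible version is sketched below. It is a hypothetical helper, written under the assumption that W arrives as a 2-D NumPy array, and the tolerance for "near zero" is an arbitrary choice.

```python
import numpy as np

def analyze_weight_spectrum(W, name, tol=1e-6):
    """Print and return basic spectral statistics for one 2-D weight matrix."""
    # Singular values come back in descending order.
    s = np.linalg.svd(np.asarray(W, dtype=float), compute_uv=False)
    near_zero = int(np.sum(s < tol * s[0]))
    stats = {
        "spectral_norm": float(s[0]),
        "condition_number": float(s[0] / s[-1]) if s[-1] > 0 else float("inf"),
        "near_zero": near_zero,
    }
    print(f"{name}: norm={stats['spectral_norm']:.4f}, "
          f"cond={stats['condition_number']:.2e}, "
          f"near-zero={near_zero}/{len(s)}")
    return stats

# Small demonstration on a diagonal matrix whose singular values are obvious.
stats = analyze_weight_spectrum(np.diag([2.0, 1.0, 0.5]), "demo.weight")
```

The sketch uses singular values rather than eigenvalues because weight matrices are usually not square or symmetric. The three numbers it reports map directly onto the interpretation principles from Part 5.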

What Different Patterns Mean

When all weight eigenvalues are near 1, our network is well-normalized and stable — this is good!

If some weight eigenvalues are much greater than 1, we risk gradient explosion and should consider adding spectral normalization.

When many eigenvalues approach zero, we are experiencing dimensional collapse and may need wider layers or regularization.

If the Hessian has a few very large eigenvalues, the loss curves sharply along those directions. We are in a sharp minimum and should consider reducing our learning rate.

When most of the Hessian’s eigenvalues are small, we’re in a flat minimum, which is often associated with better generalization!

If the gradient covariance has a concentrated spectrum, we have low-rank learning — this is normal, but watch for training stagnation.
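Spectral normalization, mentioned above as a remedy for very large weight eigenvalues, amounts to dividing a weight matrix by its largest singular value. The sketch below estimates that value with power iteration in plain NumPy. The function name, iteration count, and example matrix are all assumptions for illustration, and a real training setup would normally rely on the framework’s own utilities instead.

```python
import numpy as np

def spectral_normalize(W, n_iter=50, seed=0):
    """Rescale W so its largest singular value is approximately 1."""
    W = np.asarray(W, dtype=float)
    rng = np.random.default_rng(seed)
    v = rng.normal(size=W.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Power iteration on W.T @ W converges to the top right singular vector.
        v = W.T @ (W @ v)
        v /= np.linalg.norm(v)
    sigma = np.linalg.norm(W @ v)  # estimate of the spectral norm
    return W / sigma, sigma

# Illustrative matrix whose largest singular value is 3.
W = np.array([[3.0, 0.0],
              [0.0, 0.5]])
W_sn, sigma = spectral_normalize(W)
print(sigma)                    # close to 3.0
print(np.linalg.norm(W_sn, 2))  # close to 1.0
```

After rescaling, the layer can no longer stretch any input direction by more than a factor of about 1, which is the property that guards against exploding signals.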

Conclusions

This is a lot of information. We’ve gone from calculating an eigenvector and an eigenvalue to using these concepts to evaluate our real networks. But the main point here is that we need to know how to calculate eigenvectors and eigenvalues. The tools that use this math will be learned more deeply in subsequent articles.

