Autograd for sparse matmul: getting either a CUDA memory leak or a 'buffers have already been freed' error

Hi,

I need to do a multiplication with a fixed sparse matrix. Since the matrix is fixed, I don’t need the gradient with respect to that matrix, only with respect to the other matrix. Since there’s no autograd for sparse matrices yet, I implemented it like this:

class LeftMatMulSparseFixedWeights(torch.autograd.Function):
    """
    Implementation of matrix multiplication of a Sparse Variable with a Dense Variable, returning a Dense one.
    This is added because there's no autograd for sparse yet. No gradient computed on the sparse weights.
    """

    def forward(self, sparse_weights, x):
        self.save_for_backward(sparse_weights)
        return torch.mm(sparse_weights, x)

    def backward(self, grad_output):
        sparse_weights, = self.saved_tensors
        return None, torch.mm(sparse_weights.t(), grad_output)

This “works”, but I get into trouble anyway:
If I create one function object in the enclosing Module, like so:

class FixedSparseLinMod(nn.Module):
    """
    A module that reads a sparse matrix from a file and does the left matrix multiplication.
    Typical usage is a terms-class matrix for zero-shot learning.
    """
    def __init__(self, sparse_mat_file):
        super(FixedSparseLinMod, self).__init__()
        dims, inds, vals = read_sparse_tensor(sparse_mat_file)
        i = torch.LongTensor([[x[0] for x in inds], [x[1] for x in inds]])
        v = torch.FloatTensor(vals)
        s = torch.Size([len(dims[0]), len(dims[1])])
        self.sparse_mat = nn.Parameter(torch.sparse.FloatTensor(i, v, s), requires_grad=False)
        self.matmul = LeftMatMulSparseFixedWeights()

and then use the following in the forward pass:

    def forward(self, x):
        return self.matmul(self.sparse_mat, x.t()).t()

Then I get this:

in backward
sparse_weights, = self.saved_tensors
RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.

I also confirmed that forward and backward work fine for the first batch, but the backward pass throws for the second batch. OK, so it looks like the shared self.matmul is cleaning up some state in a weird way after each pass. So my first question is: what’s getting freed here? Why isn’t the sparse_weights state saved correctly during the second forward?

I tried working around it by simply creating a new LeftMatMulSparseFixedWeights function object for each mini-batch, doing this in the enclosing Module:

    def forward(self, x):
        return LeftMatMulSparseFixedWeights()(self.sparse_mat, x.t()).t()

This works, but creates a memory leak, which causes the process to crash because I run out of CUDA memory after some time.

I’m thinking the first approach is the correct one, but I can’t find a way to avoid that error. Even for the second approach, though, I don’t really understand the memory leak: isn’t the function object automatically freed?

Help very much appreciated!


Hi,

When I try your torch.mm with the following inputs:

class SPMM(torch.autograd.Function):
    """
    Implementation of matrix multiplication of a Sparse Variable with a Dense Variable, returning a Dense one.
    This is added because there's no autograd for sparse yet. No gradient computed on the sparse weights.
    """
    def forward(self, sparse_weights, x):
        self.save_for_backward(sparse_weights)
        return torch.mm(sparse_weights, x)

    def backward(self, grad_output):
        sparse_weights, = self.saved_tensors
        return None, torch.mm(sparse_weights.t(), grad_output)

dim = 5

class NN(nn.Module):
    def __init__(self, dim=dim):
        super(NN, self).__init__()
        self.dim = dim
        self.A = torch.autograd.Variable(random_sparse(n=dim))
        self.w = torch.autograd.Variable(torch.Tensor(np.random.normal(0, 1, (dim, dim))))

        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.X = torch.autograd.Variable(torch.eye(dim))

        self.SPMM = SPMM()

    def f(self):
        return self.SPMM.forward(self.A, self.w)

I get the following error message:
TypeError: Type torch.sparse.FloatTensor doesn’t implement stateless method addmm

Any idea how this might be circumvented? Driving me insane…


Are you sure you’re on the latest PyTorch? I think a bunch of methods were added for sparse in 0.1.12. Anyway, it looks pretty much the same as what does work for me (with the memory leak), except that I don’t call the forward() method explicitly, but create a new function object and call it immediately. I don’t think that would explain the error you get, though.

Oh, strike that: if I replace the call on the Function object with an explicit call to forward() (which I think you shouldn’t do anyway…), I get your error. So try calling the object instead…
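
In other words, a minimal sketch of the difference, using the SPMM, self.A and self.w names from your snippet (old-style Function API, as in 0.1.x):

    # Calling the Function object itself goes through __call__, which unwraps the
    # Variables and does the autograd bookkeeping before dispatching to forward():
    out = SPMM()(self.A, self.w)  # works, though it leaks memory as discussed above

    # Calling forward() directly skips that step, so torch.mm sees Variables and,
    # at least on 0.1.x, presumably ends up on an addmm code path that sparse
    # tensors don't implement, which gives the "stateless method addmm" error:
    # out = self.SPMM.forward(self.A, self.w)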

Coming back to the original question: I hacked around it for now by not using the self.save_for_backward mechanism, but instead creating a self.sparse_weights attribute, setting it to None in the constructor, and doing this in forward:

    def forward(self, sparse_weights, x):
        if self.sparse_weights is None:
            self.sparse_weights = sparse_weights
        return torch.mm(self.sparse_weights, x)

Of course this is a terrible hack, and it completely rules out ever backpropagating into that fixed matrix at some point. But at least it doesn’t leak memory.
If you’re wondering why I didn’t pass the fixed matrix to the constructor in the first place: that was my original solution, but then I got stuck keeping the CPU/GPU versions flexible. I think this solution is at least better in that respect…
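
One way that might keep the CPU/GPU handling flexible while still passing the matrix in at construction time is registering it as a buffer instead of a Parameter, so that .cuda()/.cpu() on the module move it along automatically. A rough sketch, reusing read_sparse_tensor and LeftMatMulSparseFixedWeights from above, and assuming your PyTorch version handles sparse buffers:

import torch
import torch.nn as nn

class FixedSparseLinMod(nn.Module):
    def __init__(self, sparse_mat_file):
        super(FixedSparseLinMod, self).__init__()
        dims, inds, vals = read_sparse_tensor(sparse_mat_file)
        i = torch.LongTensor([[x[0] for x in inds], [x[1] for x in inds]])
        v = torch.FloatTensor(vals)
        s = torch.Size([len(dims[0]), len(dims[1])])
        # A buffer (unlike a Parameter) is moved by module.cuda()/.cpu() together
        # with the rest of the module, but is never returned by .parameters(),
        # so the optimizer and autograd leave it alone.
        self.register_buffer('sparse_mat', torch.sparse.FloatTensor(i, v, s))
        self.matmul = LeftMatMulSparseFixedWeights()

    def forward(self, x):
        # The registered buffer is accessible as a normal attribute.
        return self.matmul(self.sparse_mat, x.t()).t()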

But for any of the experts scanning the thread: this issue isn’t solved yet. save_for_backward still doesn’t work for me.

Yup, I was able to get to the point where I got the same error as you did (including the next step, with the great memory leaks).

I’ll see what the performance is like when implementing your current hack and report back. Would be fantastic if the experts could chime in on this one… :slight_smile:

It’s truly a shame that sparse matrix operations present such a hurdle, as I was looking forward to implementing this paper with PyTorch. However, it seems more and more like I will have to revert to TensorFlow instead.

For me it’s still not fitting into GPU memory, so I guess I’ll need some other approach.
Anyway, I just spent the last day on another gotcha that’s far from obvious: I was calling this sparse matrix multiply in two parts of the graph (it’s all part of a much larger model). On toy examples I wasn’t getting proper convergence with this model, even when I used dense tensors! When I then replaced the call to that self-implemented function with a normal torch.mm(), it worked fine.
After lots of swearing and staring, I finally realized in the car what was happening: somehow only one of the two paths was being used in the backward pass. After doing some extra things, it finally worked.

All of this is quite irrelevant, except that it shows me that this doesn’t seem to be the intended way to implement the sparse mm autograd. I’m not really finding enough documentation or simple examples, though.

Sorry for the late reply; I've been trying to wrestle this into submission without much success. The GPU memory is absolutely getting destroyed, and unfortunately there doesn’t seem to be a straightforward fix. Not sure how PyTorch handles the dynamic graph memory allocation, to be honest; I’ve read a good amount on the forum about the CUDA caching behavior not being ideal, however.

Will test the following flags mentioned in this thread today, and report back:

Hi!

Thank you for your post. I am experiencing exactly the same problem multiplying a sparse tensor inside an autograd function.
I tried implementing this hack. It works, but I found that it is actually slower than just using a dense tensor?

Could you please give me some hints about how you got everything working in the end?

Thank you very much for reading.

Sorry, no solution. In the end I ran some of it on the CPU, and it didn’t seem to work very well, so I stopped trying. I still hope the gurus have a look at this, though…

For what it’s worth: I ran into a similar problem. For some reason the objects stored using self.save_for_backward() were accumulating.

To fix it, I used static methods for forward and backward, as illustrated here.
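
Roughly, that means the new-style Function with @staticmethod forward/backward and a ctx object, called through .apply. A minimal sketch adapting the code from the top of this thread (untested against the sparse tensors used here, but this is the API the current docs describe):

import torch

class LeftMatMulSparseFixedWeights(torch.autograd.Function):
    """Sparse x dense matmul with a gradient only w.r.t. the dense input."""

    @staticmethod
    def forward(ctx, sparse_weights, x):
        ctx.save_for_backward(sparse_weights)
        return torch.mm(sparse_weights, x)

    @staticmethod
    def backward(ctx, grad_output):
        sparse_weights, = ctx.saved_tensors
        # The fixed sparse matrix gets no gradient; only x does.
        return None, torch.mm(sparse_weights.t(), grad_output)

# Static Functions are never instantiated; call .apply instead, so there is no
# per-call object to leak or to reuse across batches:
# y = LeftMatMulSparseFixedWeights.apply(sparse_mat, x)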
