I’m trying to calculate a gradient w.r.t a sparse matrix. It seems like pytorch’s autograd doesn’t support getting the gradient for sparse matrix so I want to calculate it manually if it’s possible.
the forward function is softmax(A*AXW). A is a sparse matrix and I want to calculate the gradient w.r.t A
I think it’d be easier to read if you wrote it as a PyTorch output, I cannot quite make out what A, X, and W are and what is the exact expression you have in the softmax.
In general the result of matrix multiplication Y = A @ B and some scalar-valued function f(Y) following it has gradients df/dA = (df/dY)@ B.t() , df/dB = A.t() @ (df/dY). Around PyTorch df/dY is sometimes called
grad_out for the matrix multiplication and you see backpropagation at work…
Sorry I think I was being unclear
the actual code goes like this
def __init__(self, nfeat, nhid1, nhid2, nhid3, nclass, dropout, with_relu=True):
self.gc1 = GraphConvolution(nfeat, nhid1)
self.gc2 = GraphConvolution(nhid1, nclass)
self.dropout = dropout
self.with_relu = with_relu
def forward(self, x, adj):
axw , w0 = self.gc1(x, adj)
aaxw0w1, w1 = self.gc2(x, adj)
return F.log_softmax(aaxw0w1, dim=1)
Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
def __init__(self, in_features, out_features, bias=False):
self.in_features = in_features
self.out_features = out_features
self.weight = Parameter(torch.FloatTensor(in_features, out_features))
self.bias = Parameter(torch.FloatTensor(out_features))
stdv = 1. / math.sqrt(self.weight.size(1))
if self.bias is not None:
def forward(self, input, adj):
support = torch.mm(input, self.weight)
output = torch.spmm(adj, support)
return output, self.weight
return self.__class__.__name__ + ' (' \
+ str(self.in_features) + ' -> ' \
+ str(self.out_features) + ')'
In my neural network X would be the input and adj is used in the hidden layer and output layer and I want to get the gradient w.r.t the adj.
Oh, it seems that torch.spmm is confusing.
With torch.sparse.mm, you can have gradients all right.
a = torch.randn(2, 3).to_sparse().requires_grad_(True)
b = torch.randn(2, 3, requires_grad=True)
Yes, this works!
However when I tried it with a large sparse matrix, it is taking so much time to get the gradient. Is there a way to go around this problem?
What’s large here?
The problem is likely (and ha, now the formula is useful after all) that the to computation df/dA = (df/dY)@ B.t() is in all dense matrices and you don’t, in general, have the sparseness same pattern in the gradient. What PyTorch does under the hood is to compute the dense derivative (large) and then apply the sparseness pattern.
You could try to implement your own sparse_mm with backwards using scatter_add from the 3rd party PyTorch scatter package or somesuch.
Note that the sparse derivative “enforces” the sparseness pattern (this is often desired even though the there may be gradients outside the mask).
By large, I ment a sparse matrix with shape(100k, 100k).
I’ll try to implement my own fuction as you explained.
Have you implemented your own function for this?