Require gradients on only part of a matrix?

Hello everyone,
I have a small question that I think is easy, but I did not find the answer.
I am training a neural network for classification.
I have a matrix A of shape (N, N) that I use for operations in the network.
I want it to be learned, except that only some fixed indices should be trainable and the others should remain zero.

So I have a support, a subset of [0, N) × [0, N), which is the list of indices where the coefficients of A should be learned, while the rest stays zero:

A[support] = trainable
A[not support] = zero

So how should I code this?
I was considering creating some scalar parameters and then assigning them into A, which would be a Variable.
Is this the way to go?
Thanks

EDIT: it just came to my mind that you could just initialize A[support] = random and A[not support] = 0, on the assumption that the gradients with respect to the zero indices would then always be zero. But that's not a clean way to do it.
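For completeness, here is a minimal sketch of the masking idea, with a made-up support list and input x; multiplying the full parameter by a fixed 0/1 mask in the forward pass guarantees that the non-support entries never receive a gradient, since they never influence the output:

import torch
import torch.nn as nn

N = 4
# hypothetical support: list of (row, col) indices that should be trainable
support = [(0, 1), (2, 3), (3, 0)]

# fixed 0/1 mask: 1 on the support, 0 elsewhere
mask = torch.zeros(N, N)
rows, cols = zip(*support)
mask[list(rows), list(cols)] = 1.0

# full parameter matrix; only the masked entries can ever get a nonzero grad
A = nn.Parameter(torch.randn(N, N))

x = torch.randn(2, N)
y = x @ (A * mask)        # use the masked matrix in the forward pass
loss = (y ** 2).mean()
loss.backward()

print(A.grad[0, 0])       # not in the support -> gradient is 0
print(A.grad[0, 1])       # in the support -> nonzero gradient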

Please, can anyone help me here?

The solution is very straightforward. Just fill zeros into the desired slots of the gradient before updating your weights.

import torch
from torch.autograd import Variable

a = Variable(torch.randn(3, 4), requires_grad=True)
optimizer = torch.optim.SGD([a], lr=0.1)

b = torch.mean(a ** 2)
b.backward()

# fill zeros into the first row of the gradient of a (the leaf parameter)
a.grad.data[0, :].fill_(0)

# the first row of a does not change, since its grad is zero
optimizer.step()
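If the support is an arbitrary set of indices rather than a whole row, the same trick works with a precomputed boolean mask. A minimal sketch (the support entries and SGD settings below are made up for illustration):

import torch

N = 3
support_mask = torch.zeros(N, N, dtype=torch.bool)
support_mask[0, 1] = True                                  # hypothetical support entry (0, 1)
support_mask[2, 0] = True                                  # hypothetical support entry (2, 0)

a = (torch.randn(N, N) * support_mask).requires_grad_()    # non-support entries start at zero
optimizer = torch.optim.SGD([a], lr=0.1)

x = torch.randn(2, N)
loss = (x @ a).pow(2).mean()
loss.backward()

a.grad.masked_fill_(~support_mask, 0)                      # zero the gradient outside the support
optimizer.step()                                           # only the support entries change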



This doesn’t seem to be a good idea, because you still have to operate on the whole weight tensor in optimizer.step().
Is there any way to reduce the number of operations?

Unless the loss has no dependency on the variable at all (e.g., the variable is an input, like a word embedding), we have to compute its grad, since the backpropagation algorithm requires it.
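That said, one way to keep the optimizer's work down to the support itself is to parameterize only those entries and scatter them into a zero matrix in the forward pass, so the optimizer never touches the rest. A sketch, not from this thread, with a made-up support list and SGD settings:

import torch
import torch.nn as nn

N = 4
support = [(0, 1), (2, 3), (3, 0)]            # hypothetical support indices
rows, cols = zip(*support)

# only len(support) scalars are parameters; the optimizer never sees the zeros
values = nn.Parameter(torch.randn(len(support)))
optimizer = torch.optim.SGD([values], lr=0.1)

def build_A():
    # rebuild the dense matrix from the trainable values each forward pass
    A = torch.zeros(N, N)
    A[list(rows), list(cols)] = values
    return A

x = torch.randn(2, N)
loss = (x @ build_A()).pow(2).mean()
loss.backward()                               # grad flows only to `values`
optimizer.step()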