I want to change the size of the weight and bias tensors in a torch.nn.Linear layer. Even after I set weight.grad = None and bias.grad = None, the backward pass raises an error. One possible fix would be to re-initialize the entire computational graph, but I don't know how to do that. Here is a minimal example:
import torch
l = torch.nn.Linear(5, 4)
x = torch.randn(1, 5)
y = l(x).pow(2).mean()
y.backward()
l.weight.data = torch.randn(3, 5)
l.weight.grad = None
l.bias.data = torch.randn(3)
l.bias.grad = None
z = l(x).pow(2).mean()
print(z)
z.backward()
But I get the error:
RuntimeError: Function ExpandBackward returned an invalid gradient at index 0 - expected shape [4] but got [3]
Somehow, it seems that autograd keeps intermediate state and uses it when computing the gradients, which is why the whole computational graph would need to be re-initialized. Since many objects in my code hold references to the layers I use, I don't want to re-create a layer each time I delete a row or column from it.
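For what it's worth, here is a sketch of a workaround I am considering: instead of mutating .data, replace the parameters with fresh torch.nn.Parameter objects. A new Parameter gets a new gradient-accumulation node, so the stale shape metadata from the first backward pass no longer applies, and the module object itself keeps its identity, so external references to it stay valid. I am not sure this is the intended approach, just that it avoids the error:

import torch

l = torch.nn.Linear(5, 4)
x = torch.randn(1, 5)

# First forward/backward with the original 4-output layer.
y = l(x).pow(2).mean()
y.backward()

# Replace the parameters outright rather than resizing .data in place.
# The module object `l` is unchanged, only its parameters are swapped.
l.weight = torch.nn.Parameter(torch.randn(3, 5))
l.bias = torch.nn.Parameter(torch.randn(3))
l.out_features = 3  # keep the module's bookkeeping consistent

# Second forward/backward now succeeds with the new shapes.
z = l(x).pow(2).mean()
z.backward()

With this version, z.backward() runs without the ExpandBackward shape error, and l.weight.grad comes out with the new [3, 5] shape.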