Failing to train nn.Parameter()

Inside my nn.Module class I am creating nn.Parameter()s such as:

self.B = self.reset_parameters(nn.Parameter(torch.Tensor(self.projection_dim, 10), requires_grad=True)).unsqueeze(0).repeat(num_heads, 1, 1)  # I am using the reset_parameters from the torch source code: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/linear.py#L81

When I run the training loop I get the error below.
My goal is to train two matrices A and B (like a feedforward neural net) in order to project a given x as xAB; for complexity reasons it has to compute AB first and then x(AB).

Traceback (most recent call last):
  File "main.py", line 304, in <module>
    main(args)
  File "main.py", line 291, in main
    training(model, criterion, optimizer, scheduler, train_loader, valid_loader, args)
  File "main.py", line 221, in training
    loss.backward()
  File "C:\Users\lucas\anaconda3\envs\tcc\lib\site-packages\torch\tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users\lucas\anaconda3\envs\tcc\lib\site-packages\torch\autograd\__init__.py", line 125, in backward
    Variable._execution_engine.run_backward(
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

When I use retain_graph=True I get a CUDA out-of-memory error.
The parameters self.A and self.B don't show up in model.parameters() either.

I've seen people train positional embeddings like this and they never had to use retain_graph, so I wonder what could be wrong. Thanks

Not sure what the problem is exactly, but try this:

self.B = nn.Parameter(torch.Tensor(self.projection_dim, 10))
(The way you use reset_parameters is weird; I don't even know why it works. You should just call self.reset_parameters() at the end of __init__.)

In the forward, use:
self.B.unsqueeze(0).repeat(num_heads, 1, 1)
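
Roughly like this minimal sketch (the class name, in_dim, the placeholder init, and the forward shapes are my assumptions, not your original code):

import math
import torch
import torch.nn as nn

class LowRankProjection(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_dim, projection_dim, num_heads):
        super().__init__()
        self.num_heads = num_heads
        # Plain nn.Parameter assignments: nn.Module registers both matrices,
        # so they show up in model.parameters() and get updated by the optimizer.
        self.A = nn.Parameter(torch.Tensor(in_dim, projection_dim))
        self.B = nn.Parameter(torch.Tensor(projection_dim, 10))
        self.reset_parameters()

    def reset_parameters(self):
        # Placeholder in-place init; substitute whatever scheme you prefer.
        nn.init.kaiming_uniform_(self.A, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.B, a=math.sqrt(5))

    def forward(self, x):
        # The unsqueeze/repeat happens here, inside this iteration's graph,
        # so nothing from a previous iteration has to be retained.
        B = self.B.unsqueeze(0).repeat(self.num_heads, 1, 1)  # (num_heads, projection_dim, 10)
        AB = self.A @ B                                       # (num_heads, in_dim, 10)
        return x @ AB                                         # x: (batch, in_dim) -> (num_heads, batch, 10)

model = LowRankProjection(in_dim=512, projection_dim=256, num_heads=8)
print(sum(p.numel() for p in model.parameters()))  # both A and B are counted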

The way I reset parameters is the same as in nn.Linear:

def reset_parameters(self, W, B=None):
    W = torch.nn.init.kaiming_uniform_(W, a=math.sqrt(3))
    if B is not None:
        fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(W)
        bound = 1 / math.sqrt(fan_in)
        B = torch.nn.init.uniform_(B, -bound, bound)
        return W, B
    return W

I will try it without the reset_parameters as soon as my GPUs are free, thank you.

What about not taking or returning anything, and doing exactly what nn.Linear does?

In a certain way I am trying to apply a projection just like nn.Linear and then project again with another matrix, but I would like to control the order of the computation for time-complexity reasons: x (W_1 W_2) instead of (x W_1) W_2.
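
For example (a small sanity check with made-up sizes, not the original code): both orderings give the same result, but (x W_1) W_2 costs roughly n·d·k + n·k·m multiply-adds while x (W_1 W_2) costs d·k·m + n·d·m, so which one is cheaper depends on the sizes involved.

import torch

n, d, k, m = 128, 512, 256, 10                        # arbitrary sizes
x = torch.randn(n, d, dtype=torch.double)
W1 = torch.randn(d, k, dtype=torch.double)
W2 = torch.randn(k, m, dtype=torch.double)

left_first = (x @ W1) @ W2                            # n*d*k + n*k*m multiply-adds
right_first = x @ (W1 @ W2)                           # d*k*m + n*d*m multiply-adds
assert torch.allclose(left_first, right_first)        # same result, different cost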

I understand that. But reset_parameters should just initialize the parameters in place, instead of taking some input and returning some output.
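
I.e. something along these lines (a sketch that keeps your initialization scheme, assuming self.A and self.B are already defined as plain nn.Parameters in __init__ and that self.reset_parameters() is called at the end of __init__):

def reset_parameters(self):
    # Same initialization as your version, but done in place on the module's
    # own parameters: nothing is passed in and nothing is returned.
    torch.nn.init.kaiming_uniform_(self.A, a=math.sqrt(3))
    fan_in, _ = torch.nn.init._calculate_fan_in_and_fan_out(self.A)
    bound = 1 / math.sqrt(fan_in)
    torch.nn.init.uniform_(self.B, -bound, bound)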

When you assign an attribute on an nn.Module, the module registers nn.Parameter attributes behind the scenes (that is how they end up in model.parameters()), so the best way to define a parameter is:
self.xxx = nn.Parameter(some tensor)
instead of
self.xxx = nn.Parameter(some tensor).unsqueeze(xxx).repeat(xxxx)
since the second expression is no longer an nn.Parameter but an ordinary tensor computed from one, so it never gets registered.
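
To make that concrete, a small standalone snippet (the names good/bad and the sizes are made up) showing that only the plain nn.Parameter assignment gets registered:

import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered: assigning an nn.Parameter is picked up by nn.Module.
        self.good = nn.Parameter(torch.zeros(4, 10))
        # Not registered: unsqueeze/repeat return an ordinary tensor,
        # so nn.Module never sees a Parameter here.
        self.bad = nn.Parameter(torch.zeros(4, 10)).unsqueeze(0).repeat(2, 1, 1)

print([name for name, _ in Demo().named_parameters()])  # ['good']
print(type(Demo().bad))                                  # <class 'torch.Tensor'>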
