How to train a parameter initialized outside of nn.Module

I have a parameter that lives outside of any nn.Module (a "global" parameter of sorts) that I'd like to train.

global_param = torch.tensor(...).requires_grad_()

This global_param is used to initialize parts of the network:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, global_param):
        super().__init__()
        self.param1 = global_param * 2   # derived from global_param once, in __init__
        self.param2 = global_param / 2
        # ... other nn.Parameters ...

    def forward(self, x):
        # the forward pass uses self.param1, self.param2 and the other nn.Parameters
        ...

If I write a normal training loop, I get the “Trying to backward through the graph a second time” error.

net = Model(global_param)
optimizer = torch.optim.Adam(list(net.parameters()) + [global_param])

for sample, target in data:
    pred = net(sample)
    loss = loss_fn(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Understandably so: the part of the computation graph created by self.param1 = global_param * 2 and self.param2 = global_param / 2 is built once in __init__, before the training loop, and is then reused on every iteration. The error goes away if I call loss.backward(retain_graph=True), but then I get this new error instead:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor []] is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
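
For what it's worth, here is a stripped-down toy example (not my actual model; the names and numbers are made up) that, as far as I can tell, reproduces the first error and seems to confirm that the problem is the mul/div nodes being created only once, outside the loop:

import torch

g = torch.tensor(1.0, requires_grad=True)   # stand-in for global_param
p1 = g * 2          # these graph nodes are created only once,
p2 = g / 2          # outside of the training loop
opt = torch.optim.Adam([g], lr=0.1)

for step in range(2):
    loss = (p1 + p2 - 3.0) ** 2   # reuses the mul/div nodes from above
    opt.zero_grad()
    loss.backward()   # second iteration: "Trying to backward through the graph a second time"
    opt.step()

The first backward() frees the tensors saved by those mul/div nodes, so the second iteration cannot backpropagate through them again, which matches what happens with self.param1/self.param2 from my __init__.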

I am having a hard time debugging this in-place error, which is probably related to retaining the computation graph. Is there a better way to write this training loop that avoids retain_graph=True altogether? One workaround I have been considering is sketched below.
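
The idea is to register global_param as an nn.Parameter on the module and recompute the derived tensors inside forward(), so that this part of the graph is rebuilt from scratch on every iteration (the nn.Linear layer and the forward expression are just placeholders, not my real model):

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, global_param):
        super().__init__()
        # register it so net.parameters() picks it up
        self.global_param = nn.Parameter(global_param)
        self.linear = nn.Linear(8, 1)   # placeholder for my other layers

    def forward(self, x):
        # recompute the derived values here instead of in __init__,
        # so the global_param -> param1/param2 graph is fresh every step
        param1 = self.global_param * 2
        param2 = self.global_param / 2
        return self.linear(x) * param1 + param2

net = Model(torch.tensor(0.5))
optimizer = torch.optim.Adam(net.parameters())   # global_param is already included
for sample, target in data:
    pred = net(sample)
    loss = loss_fn(pred, target)
    optimizer.zero_grad()
    loss.backward()   # no retain_graph needed
    optimizer.step()

With this, net.parameters() already contains global_param, so I don't need to append it to the optimizer's parameter list, and each backward() only touches a graph built during that iteration's forward pass. Would something along these lines be the recommended approach, or is there a cleaner pattern for training a parameter that is defined outside the module? Thank you!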