Model parameters are not updating

Hello,

I am trying to implement two NNs joined by one loss function, where the loss function involves performing an inverse PCA transform on the output of each neural network. The code is as follows:

for epoch in range(num_epochs):
    batch_idx = 0

    model_diff.train()
    model_conv.train()

    for (features_diff, _), (features_conv, _), (_, targets_base) in zip(train_loader_diff, train_loader_conv,
                                                                         train_loader):

        model_diff.train()
        model_conv.train()

        batch_idx += 1

        features_diff = features_diff.to(device)
        features_conv = features_conv.to(device)
        targets_base = targets_base.to(device)

        # FORWARD AND BACK PROP

        model_diff.requires_grad = True
        model_conv.requires_grad = True

        diff_part = model_diff(features_diff)

        conv_part = model_conv(features_conv)

        for (param_diff, param_conv) in zip(model_diff.parameters(), model_conv.parameters()):
            param_diff.retain_grad()
            param_conv.retain_grad()

        with torch.no_grad():

            predictions = pca_y_diff.inverse_transform(model_diff(features_diff).clone().cpu().numpy()) + pca_y_conv.inverse_transform(model_conv(features_conv).clone().cpu().numpy())
            predictions = torch.from_numpy(predictions)
            predictions = predictions.to(device)
            predictions = torch.reshape(predictions, (-1, 1, 256, 256))

        loss = loss_fn(predictions, targets_base)

        optimizer_diff.zero_grad()
        optimizer_conv.zero_grad()

        loss.requires_grad = True
        loss.backward()

        # UPDATE MODEL PARAMETERS
        optimizer_diff.step()
        optimizer_conv.step()

The problem is that the parameters do not update at all. If I don't include "loss.requires_grad = True", I get the error "RuntimeError: element 0 does not require grad and does not have a grad_fn".

Any help is much appreciated

Many thanks

Your predictions are being calculated within a torch.no_grad() context manager and therefore won't have a grad_fn. Can you share the forward and backward parts of the code too?
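For reference, here is a minimal standalone sketch (not your code) of what that context manager does: anything computed under torch.no_grad() carries no grad_fn, so a loss built from it has nothing to backpropagate through.

import torch

w = torch.randn(3, 2, requires_grad=True)
x = torch.randn(4, 3)

y = x @ w
print(y.requires_grad, y.grad_fn is None)   # True False  -> still part of the graph

with torch.no_grad():
    z = x @ w
print(z.requires_grad, z.grad_fn is None)   # False True  -> detached, gradients cannot reach w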

Here is the forward part:

##########################
# MODEL
##########################

class Reshape(nn.Module):
    def __init__(self, *args):
        super().__init__()
        self.shape = args

    def forward(self, x):
        return x.view(self.shape)


class Trim(nn.Module):
    def __init__(self, *args):
        super().__init__()

    def forward(self, x):
        return x[:, :, :256, :256]


class MLP(nn.Module):

    def __init__(self, n_components_x, n_components_y):
        super().__init__()

        self.ANN = nn.Sequential(  # 784
                nn.Linear(n_components_x, 100),
                nn.LeakyReLU(0.01),
                nn.Linear(100, 100),
                nn.LeakyReLU(0.01),
                nn.Linear(100, 100),
                nn.LeakyReLU(0.01),
                nn.Linear(100, 100),
                nn.LeakyReLU(0.01),
                nn.Linear(100, n_components_y)
                )

    def forward(self, x):
        x = self.ANN(x)
        return x

I tried removing the torch.no_grad() context manager, and now I get the following error:
“Can’t call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.”

When I then use .detach() in the first predictions line, my parameters once again do not update.

Yes, so if you're using a function outside of the torch environment (e.g. NumPy), PyTorch can't track operations that leave the PyTorch environment. When calling .detach() you break your computational graph and hence have no gradient. In this case you have two options:

  1. Find an equivalent function within the torch environment that will automatically propagate gradients (see the sketch below).
  2. Define a custom torch.autograd.Function object and manually define both the forward and backward methods in order to propagate gradients.
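For option 1, note that scikit-learn's PCA.inverse_transform is just a linear map built from the fitted pca.components_ and pca.mean_, so it can be reproduced with torch operations that keep the graph intact. Below is a minimal sketch, assuming pca_y_diff and pca_y_conv are fitted sklearn.decomposition.PCA objects with whiten=False; the helper name torch_pca_inverse_transform is just for illustration and the surrounding names follow your snippet:

import torch

def torch_pca_inverse_transform(codes, pca, device):
    # codes: (batch, n_components) tensor that is still attached to the graph
    components = torch.as_tensor(pca.components_, dtype=codes.dtype, device=device)
    mean = torch.as_tensor(pca.mean_, dtype=codes.dtype, device=device)
    # sklearn's inverse_transform (whiten=False) is: X = codes @ components_ + mean_
    return codes @ components + mean

# inside the training loop, replacing the torch.no_grad() block:
diff_part = model_diff(features_diff)
conv_part = model_conv(features_conv)

predictions = torch_pca_inverse_transform(diff_part, pca_y_diff, device) \
            + torch_pca_inverse_transform(conv_part, pca_y_conv, device)
predictions = predictions.reshape(-1, 1, 256, 256)

loss = loss_fn(predictions, targets_base)   # now has a grad_fn
optimizer_diff.zero_grad()
optimizer_conv.zero_grad()
loss.backward()                             # no loss.requires_grad workaround needed
optimizer_diff.step()
optimizer_conv.step()

Converting components_ and mean_ on every batch is wasteful; in practice you would convert them to tensors once, outside the loop. For option 2 you would subclass torch.autograd.Function and write the forward and backward passes by hand, but for a purely linear transform like PCA the first option is simpler.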