Getting RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

bing · August 8, 2022, 10:01pm

Hi,
I am doing binary image classification and using BCEWithLogitsLoss.
Initially, I was getting RuntimeError: result type Float can’t be cast to the desired output type Long
After searching, I converted pred and target to float, but now I am getting RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I really have no idea what I am doing wrong.

My training loop looks like the below -

model.train()
for batch_idx, (data, target) in enumerate(loader['train']):
    # move to GPU
    if torch.cuda.is_available():
        data, target = data.to('cuda', non_blocking=True), target.to('cuda', non_blocking=True)  # noqa
    optimizer.zero_grad()
    output = model(data)
    pred = torch.argmax(output, dim=1)
    loss = criterion(pred.float(), target.float())  # Conversion of pred and target to float
    loss.backward()
    optimizer.step()
    # running average of the training loss
    train_loss += ((1 / (batch_idx + 1)) * ((loss.data) - train_loss))

My transfer learning & loss function loading code snippet looks like below -

criterion_transfer = nn.BCEWithLogitsLoss()
model_transfer = timm.create_model('convnext_tiny_in22k', pretrained=True, num_classes=2)  # noqa
optimizer = torch.optim.SGD(model_transfer.parameters(), lr=config.lr)
optimizer.zero_grad()
model_transfer.to(device)
criterion_transfer.to(device)

AlphaBetaGamma96 · August 8, 2022, 10:15pm

Hi @bing, you can’t differentiate torch.argmax with respect to output: torch.argmax returns integer indices, so its result has no grad_fn. You need another way to convert your output tensor to a prediction, using an operation that does have a grad_fn. A minimal example below shows that the result of torch.argmax has no grad_fn.

import torch
x=torch.randn(10,4,requires_grad=True)
output = torch.argmax(x, dim=1)
print(output.grad_fn)  # prints None
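
To tie this back to the original error, here is a minimal sketch showing that a loss built on the argmax output cannot be backpropagated. It assumes random logits of shape [1, 2] and a target of shape [1], as in the question; the helper name reproduce_argmax_error is made up for illustration.

```python
import torch
import torch.nn as nn


def reproduce_argmax_error():
    """Hypothetical helper: build a loss on top of argmax and try to backprop."""
    logits = torch.randn(1, 2, requires_grad=True)  # stand-in for model(data)
    target = torch.tensor([0])                      # stand-in for one label

    # argmax yields int64 indices that are cut off from the autograd graph
    pred = torch.argmax(logits, dim=1)
    loss = nn.BCEWithLogitsLoss()(pred.float(), target.float())

    try:
        loss.backward()
        message = None
    except RuntimeError as err:
        message = str(err)
    return pred.requires_grad, loss.requires_grad, message


pred_requires_grad, loss_requires_grad, message = reproduce_argmax_error()
print(pred_requires_grad, loss_requires_grad)
print(message)
```

Neither pred nor loss requires grad here, which is why backward() fails.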

You might be able to simply remove the torch.argmax call, since your loss expects the raw logits, and compute the loss as

      loss = criterion(output.float(), target.float())  # pass the raw logits; convert target to float

More info in this post here (about logits with a different loss function you might find useful)

bing · August 8, 2022, 10:27pm

I did try calculating the loss as you suggested earlier, but I was getting
ValueError: Target size (torch.Size([1])) must be the same as input size (torch.Size([1, 2]))
So I pivoted to calculating the argmax.
Below are the shapes of my output and target:

Shape of output: torch.Size([1, 2])
Shape of target: torch.Size([1])

AlphaBetaGamma96 · August 8, 2022, 10:35pm

So, I had a quick read through the docs for BCEWithLogitsLoss (docs here). The input and target must have the same shape, which here would be [batch, num_classes].

So check your target Tensor has the right shape
Or perhaps you need to reduce output to match the shape of target (as that’s what torch.argmax was effectively doing)
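
As a rough sketch of the first option, one-hot encoding the target is one possible way to make the shapes agree. This assumes the target holds integer class indices and the model emits two logits per sample (both assumptions based on the shapes posted above); it is not necessarily the right choice for your setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.BCEWithLogitsLoss()

# Assumed stand-ins for the real tensors in the training loop
output = torch.randn(1, 2, requires_grad=True)  # [batch, num_classes] logits
target = torch.tensor([0])                      # [batch] integer class indices

# One-hot encode so the target has the same shape as the logits: [1] -> [1, 2]
target_one_hot = F.one_hot(target, num_classes=2).float()

loss = criterion(output, target_one_hot)
loss.backward()  # works: output is still attached to the autograd graph

print(target_one_hot.shape, output.grad.shape)
```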

bing · August 8, 2022, 10:39pm

I am initially trying to run the network on 1 sample only, so the target is supposed to be of shape 1. I also tried unsqueezing, but it didn’t work out.
Yes, the output is supposed to be a single value, but I really don’t know how to get that without using argmax.
Below are my Target and output values -

Target -  tensor([0], device='cuda:0')
Output -  tensor([[0.3863, 0.1197]], device='cuda:0', grad_fn=<GatherBackward>)

AlphaBetaGamma96 · August 8, 2022, 10:49pm

If your target is a scalar for a single sample, it should have a shape of [1,1] because the shape is defined as [num_samples, size_of_one_sample] which corresponds to [1,1].

If you’re trying to reduce output to its max value along the class dimension, you should be using torch.max instead of torch.argmax, as the values returned by torch.max have a grad_fn.

So,

pred = torch.max(output, dim=1, keepdim=True)[0] #need the [0] to return values, [1] is indices

and make sure to have keepdim=True (so your shape is correct!), and this approach has a grad_fn,

import torch
x=torch.randn(10,4,requires_grad=True)
output = torch.max(x, dim=1, keepdim=True)[0]
print(output.grad_fn)  # prints a MaxBackward0 object
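
Putting the pieces together, here is a rough sketch using the output and target values posted earlier in this thread. It runs on CPU for simplicity, and reshaping the target to [1, 1] with unsqueeze is my assumption about how the shapes get aligned.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

# Values copied from the thread; requires_grad=True stands in for model output
output = torch.tensor([[0.3863, 0.1197]], requires_grad=True)
target = torch.tensor([0])

# Keep the max value per row; keepdim=True keeps the shape as [1, 1]
pred = torch.max(output, dim=1, keepdim=True)[0]

# Assumed reshaping: [1] -> [1, 1] so target matches pred
loss = criterion(pred, target.float().unsqueeze(1))
loss.backward()

print(pred.shape)
print(output.grad)
```

Note that with torch.max, only the largest logit in each row receives a non-zero gradient.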

bing · August 8, 2022, 11:10pm

Thanks, it works now.