Error: grad can be implicitly created only for scalar outputs

So I am trying to train a variational autoencoder, and I have written a custom loss function to train the network. The network throws this error:

RuntimeError: grad can be implicitly created only for scalar outputs

Here's the loss function:

def loss_function(recon_x, x, mu, logvar):
    BCE = F.binary_cross_entropy(recon_x, x, reduction='none')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    the_loss = BCE + KLD
    return the_loss

And my training code:

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             weight_decay=1e-4)

num_epochs = 150

running_loss = 0
steps = 0
print_every = 1
for epoch in range(num_epochs):
    model.train()
    for data in trainloader:
        steps += 1
        img, _ = data
        img = img.cuda()
        
        decoded, mu, logvar = model(img)
        loss = loss_function(decoded, img, mu, logvar)
        
        optimizer.zero_grad()
        loss.backward()  # <------- Error On this Line
        optimizer.step()
        running_loss += loss.item()
        if steps % print_every == 0:
            model.eval()

            with torch.no_grad():
                valid_loss = validation(model, validloader)
            
            model.train()
            
            print("Epoch: {}/{}.. ".format(epoch+1, num_epochs),
                  "Training Loss: {:.4f}.. ".format(running_loss/print_every),
                  "valid Loss: {:.4f}.. ".format(valid_loss/len(validloader)))
            running_loss = 0
    if epoch % 10 == 0:
        save_im(output, 'epoch '+str(epoch))

The image batch shape is [64, 3, 96, 96].

Try printing the loss. It should be a tensor containing a single number.

Even if it prints, autograd still throws the error, so there's no point in printing it when I can't train the model.

As @bharat0to said, your loss is most likely a multi-dimensional tensor, which will thus throw this error.
You could add some reduction or pass a gradient with the same shape as loss.
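A minimal sketch of what is probably happening in the original loss function, assuming small random tensors in place of the real model outputs (the shapes and the latent size of 16 are made up for illustration). With reduction='none' the BCE term keeps the input's shape, and adding the scalar KLD broadcasts instead of reducing, so backward() has no scalar to start from. Summing the BCE term is one possible reduction:

```python
import torch
import torch.nn.functional as F

def loss_function(recon_x, x, mu, logvar):
    # Sum the per-element reconstruction error so this term is a scalar.
    bce = F.binary_cross_entropy(recon_x, x, reduction='sum')
    # The KL term is already reduced to a scalar by torch.sum.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

torch.manual_seed(0)
# Stand-ins for a batch of images and the model outputs (hypothetical shapes).
x = torch.rand(4, 3, 8, 8)
recon_x = torch.sigmoid(torch.randn(4, 3, 8, 8, requires_grad=True))
mu = torch.randn(4, 16, requires_grad=True)
logvar = torch.randn(4, 16, requires_grad=True)

# reduction='none' keeps one loss value per element.
unreduced = F.binary_cross_entropy(recon_x, x, reduction='none')
print(unreduced.shape)

raised = False
try:
    unreduced.backward()  # non-scalar output, no gradient argument
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

loss = loss_function(recon_x, x, mu, logvar)
print(loss.shape)  # zero-dimensional tensor
loss.backward()    # works, because loss is a scalar
```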


I tried printing the loss, and it was a tensor of many values, so I decided to reduce it by passing reduction='sum' to binary_cross_entropy.

It started training, though the loss is quite high.

You could try reduction='mean', which would lower the loss value, or just remove the reduction argument, as 'mean' is the default.
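To see why the summed loss looks so large, here is a small sketch comparing the two reductions on random data with the batch shape mentioned in the question. The predictions are random numbers, not real model outputs. The summed value is just the mean multiplied by the number of elements:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Random stand-ins with the batch shape from the question.
target = torch.rand(64, 3, 96, 96)
# Keep predictions strictly inside (0, 1) so the logs stay finite.
pred = torch.rand(64, 3, 96, 96).clamp(1e-4, 1 - 1e-4)

bce_sum = F.binary_cross_entropy(pred, target, reduction='sum')
bce_mean = F.binary_cross_entropy(pred, target, reduction='mean')

print("sum: ", bce_sum.item())
print("mean:", bce_mean.item())
print("elements:", target.numel())
```

One thing to keep in mind for a VAE: if the BCE term is averaged while the KLD term is still summed, the relative weight of the two terms changes, so the two settings are not equivalent up to a constant factor.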


It helped, Thanks for the suggestion!

Hi ptrblck, I am a big fan of your support for this community. I am trying to generate a depth map for a given image, so I used BCELoss(), where the model output passed to the loss function is of size [10, 1, 250, 250] and the target (the ground-truth depth) is also [10, 1, 250, 250].
Now I am thinking of using reduction='mean' and backpropagating it, but it gives me huge loss values. Please let me know your opinion, and which loss function would be better in this scenario.

If you are using nn.BCELoss, I assume you are using a sigmoid at the end of your model?
I would generally recommend to output raw logits and use nn.BCEWithLogitsLoss as it’ll give you more numerical stability.
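A rough sketch of the stability point, using made-up logits and targets. For moderate logits, sigmoid followed by nn.BCELoss and raw logits with nn.BCEWithLogitsLoss should agree closely. For a large logit, the sigmoid saturates to exactly 1.0 in float32, so the BCELoss path can no longer see how wrong the prediction is, while the logits version still returns the expected value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(10, 1, 25, 25)   # raw model outputs (hypothetical)
target = torch.rand(10, 1, 25, 25)    # targets in [0, 1]

# Two routes to the same quantity for well-behaved inputs.
loss_sigmoid_bce = nn.BCELoss()(torch.sigmoid(logits), target)
loss_with_logits = nn.BCEWithLogitsLoss()(logits, target)
print(loss_sigmoid_bce.item(), loss_with_logits.item())

# A confidently wrong prediction: large positive logit, target 0.
big_logit = torch.tensor([30.0])
zero_target = torch.tensor([0.0])
naive = nn.BCELoss()(torch.sigmoid(big_logit), zero_target)
stable = nn.BCEWithLogitsLoss()(big_logit, zero_target)
print("sigmoid + BCELoss:", naive.item())
print("BCEWithLogitsLoss:", stable.item())  # close to the logit itself
```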

Could you check the min and max values of your target, please?
How large is the loss at the moment?

Thank you so much for replying, ptrblck. I looked at the min and max values of my target, and since it is a depth image, the values mostly range from 2 to 245. So I divided it by 255, and now the loss has decreased and looks good. But I still have trouble understanding one thing: since we are creating an image from an input image, we could just use an L1 loss, so why are you suggesting BCEWithLogitsLoss? Can you explain the intuition behind using it? Thank you.
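The check and the fix described above could look roughly like this. The depth values here are random integers in the reported range, and the tensor names are placeholders rather than anything from the poster's code. Binary cross-entropy expects targets in [0, 1], so the raw 8-bit depth values are scaled first:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Fake 8-bit depth maps with values from 2 to 245 (randint's upper bound is exclusive).
depth_raw = torch.randint(2, 246, (10, 1, 250, 250)).float()
print("raw range:", depth_raw.min().item(), depth_raw.max().item())

# Scale into [0, 1] before using a BCE-style loss.
target = depth_raw / 255.0
print("scaled range:", target.min().item(), target.max().item())

logits = torch.randn(10, 1, 250, 250)  # stand-in for raw model outputs
loss = nn.BCEWithLogitsLoss()(logits, target)
print("loss:", loss.item())
```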

nn.BCEWithLogitsLoss was just the better alternative to nn.BCELoss.
For a depth estimation I would guess that nn.L1Loss or nn.MSELoss might work better, but you should try out different approaches. :wink:

But nn.MSELoss might not do a good job of penalizing small values. Is that right?

That might be correct, but it’s hard to estimate if it would be worse than e.g. L1Loss for depth maps.
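A tiny numeric illustration of the concern, reading "small values" as small errors (that reading is an assumption). Squaring shrinks errors below 1 and amplifies errors above 1, so MSELoss penalizes small deviations less than L1Loss and large ones more:

```python
import torch
import torch.nn as nn

target = torch.zeros(4)
small_error = torch.full((4,), 0.01)  # predictions off by 0.01
large_error = torch.full((4,), 2.0)   # predictions off by 2.0

l1 = nn.L1Loss()
mse = nn.MSELoss()

l1_small, mse_small = l1(small_error, target), mse(small_error, target)
l1_large, mse_large = l1(large_error, target), mse(large_error, target)

print("error 0.01 -> L1:", l1_small.item(), "MSE:", mse_small.item())
print("error 2.0  -> L1:", l1_large.item(), "MSE:", mse_large.item())
```

Which of the two works better for depth maps still has to be checked empirically, as said above.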

Yes, I will try that as well. Thank you so much for answering my questions.

Hi,
I am trying to compute the gradients of my network output (a batch where each output is a single number) with respect to the model's trainable parameters.

I assumed outputs.backward() would do the trick, but I am getting the same error as stated in this thread. Also, does backward() compute the gradients w.r.t. the model's trainable parameters? Additionally, how can I access the calculated gradients, as I need to perform some operations on them?

Please share your thoughts on how I can accomplish this.

The error is raised if you call .backward() on a tensor, which is not a scalar.
In that case you should either reduce the tensor before (e.g. via tensor.mean()) or pass the gradients to backward (e.g. via tensor.backward(torch.ones_like(tensor))).
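As a small sketch with a hand-made weight vector (not taken from any model in this thread), passing torch.ones_like(tensor) to backward gives the same gradients as calling backward on tensor.sum():

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
x = torch.tensor([[1.0, 0.0, 2.0],
                  [0.0, 1.0, 1.0]])

# Non-scalar output: one value per row of x.
out = x @ w
out.backward(torch.ones_like(out))  # explicit gradient argument
grad_ones = w.grad.clone()

# Same computation, reduced to a scalar first.
w.grad = None
(x @ w).sum().backward()
grad_sum = w.grad.clone()

print(grad_ones)
print(grad_sum)
```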

You can access the gradients after the backward() call by accessing them directly, e.g.:

print(model.layer.weight.grad)
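If you want all gradients in one object rather than attribute by attribute, one option is to collect them from named_parameters() after the backward call. This is only a sketch on a toy nn.Sequential model, so the layer names below ('0.weight' and so on) belong to this example:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

# Reduce to a scalar so backward() needs no gradient argument.
out = model(torch.randn(5, 4)).mean()
out.backward()

# Gather every parameter's gradient into a single dict.
grads = {name: p.grad for name, p in model.named_parameters()}
for name, g in grads.items():
    print(name, tuple(g.shape))
```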

Thank you for your reply.

So, I do not wish to reduce the vector to a scalar, as I need gradients for each output. I do not quite follow the other method, which is passing the gradients to backward. Can you elaborate?

Basically, assume my network output is y and the network parameters are x: I want dy/dx for each y in the batch. Then I'd like to aggregate them all together. So, when you say pass the gradients, how do I pass them when backward is actually what computes the gradients?

As for accessing gradients, in TF you can get the gradients for all network parameters in a single object, e.g. grads = tape.gradient(...). Is there something similar in PyTorch, other than accessing them separately like model.layer.weight.grad?

The gradient argument in backward can be seen as e.g. dLoss/dLoss, which for a scalar loss value would be 1 and is automatically set for you.
However, if your loss is a tensor in a specific shape, you would have to provide dLoss/dLoss manually, which is shown in my example.
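Another way to look at the gradient argument is as a set of weights, one per output element, that are applied before the gradients are accumulated. A sketch with a toy weight vector (all values are made up): a one-hot vector picks out the gradient of a single output, and other weights give a weighted combination.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
y = x @ w  # three outputs, one per row of x

# One-hot weights: only the gradient of y[1] reaches w.
y.backward(torch.tensor([0.0, 1.0, 0.0]), retain_graph=True)
grad_y1 = w.grad.clone()

# Weighted combination of the gradients of y[0] and y[1].
w.grad = None
y.backward(torch.tensor([0.5, 0.5, 0.0]))
grad_weighted = w.grad.clone()

print(grad_y1)        # the second row of x
print(grad_weighted)  # the average of the first two rows of x
```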

Depending on your use case, you might prefer to use torch.autograd.grad to compute gradients of specific parameters.
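For the per-output case asked about above, a simple (though not the fastest) sketch is to loop over the outputs and call torch.autograd.grad on each one. The single nn.Linear layer here is a placeholder for the real network. The call returns a tuple with one gradient per parameter and does not write into .grad:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # placeholder for the real network
params = list(model.parameters())

inputs = torch.randn(4, 3)
outputs = model(inputs).squeeze(1)  # shape (4,): one number per sample

per_sample = []
for i in range(outputs.shape[0]):
    # Tuple of gradients of outputs[i] w.r.t. every parameter.
    grads = torch.autograd.grad(outputs[i], params, retain_graph=True)
    per_sample.append(grads)

# Stack the weight gradients: one slice per sample.
weight_grads = torch.stack([g[0] for g in per_sample])
print(weight_grads.shape)
```

The stacked gradients can then be aggregated however you need, e.g. with weight_grads.mean(dim=0).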