Error: grad can be implicitly created only for scalar outputs

onimusha702 · February 24, 2019, 2:26pm

I am trying to train a variational autoencoder, and I have created a custom loss function to train the network. The network throws this error:

RuntimeError: grad can be implicitly created only for scalar outputs

Here's the loss function:

def loss_function(recon_x, x, mu, logvar):
    BCE = F.binary_cross_entropy(recon_x, x, reduction='none')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    the_loss = BCE + KLD
    return the_loss

and my training code:

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,
                             weight_decay=1e-4)

num_epochs = 150

running_loss = 0
steps = 0
print_every = 1
for epoch in range(num_epochs):
    model.train()
    for data in trainloader:
        steps += 1
        img, _ = data
        img = img.cuda()
        
        decoded, mu, logvar = model(img)
        loss = loss_function(decoded, img, mu, logvar)
        
        optimizer.zero_grad()
        loss.backward()  # <------- Error On this Line
        optimizer.step()
        running_loss += loss.item()
        if steps % print_every == 0:
            model.eval()

            with torch.no_grad():
                valid_loss = validation(model, validloader)
            
            model.train()
            
            print("Epoch: {}/{}.. ".format(epoch+1, num_epochs),
                  "Training Loss: {:.4f}.. ".format(running_loss/print_every),
                  "valid Loss: {:.4f}.. ".format(valid_loss/len(validloader)))
            running_loss = 0
    if epoch % 10 == 0:
        save_im(decoded, 'epoch '+str(epoch))

The image batch has shape [64, 3, 96, 96].

bharat0to · February 24, 2019, 3:06pm

Try printing the loss; it should be a tensor with a single number.

onimusha702 · February 24, 2019, 7:34pm

Even if it prints, autograd throws an error, so there is no point in printing it when I can’t train the model.

ptrblck · February 24, 2019, 9:02pm

As @bharat0to said, your loss is most likely a multi-dimensional tensor, which will thus throw this error.
You could add some reduction or pass a gradient with the same shape as loss.
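To illustrate the point about shapes, here is a minimal sketch using hypothetical stand-in tensors (the shapes and variable names are assumptions, not the original model). With reduction='none' the loss keeps the shape of the input, so backward() fails until the loss is reduced to a scalar:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the decoder output and the input batch.
logits = torch.randn(4, 3, 8, 8, requires_grad=True)
recon_x = torch.sigmoid(logits)
x = torch.rand(4, 3, 8, 8)

# reduction='none' keeps one loss value per element.
elementwise = F.binary_cross_entropy(recon_x, x, reduction='none')
print(elementwise.shape)

# Calling backward() on a non-scalar tensor without a gradient argument fails.
try:
    elementwise.backward()
    message = None
except RuntimeError as err:
    message = str(err)
print(message)

# Reducing to a 0-dim tensor lets autograd create the initial gradient itself.
scalar_loss = elementwise.mean()
scalar_loss.backward()
print(scalar_loss.dim(), logits.grad.shape)
```

Checking loss.shape (or loss.dim()) right before backward() is usually the quickest way to confirm this diagnosis.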

onimusha702 · February 25, 2019, 1:02pm

I tried printing the loss and it was a series of values, so I decided to reduce it by passing reduction='sum' to binary_cross_entropy.

It started training, though the loss is quite high.

ptrblck · February 25, 2019, 1:56pm

You could try reduction='mean', which would lower the loss value, or just remove the reduction argument, as mean is the default.
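A rough sketch of how the two reductions compare, assuming the loss function from the first post with an added `reduction` parameter (that parameter and the random stand-in tensors are assumptions for illustration). Note that the KLD term is still a sum in both cases, so switching the BCE term to 'mean' also changes the relative weight of the two terms:

```python
import torch
import torch.nn.functional as F

def loss_function(recon_x, x, mu, logvar, reduction='sum'):
    # The reconstruction term is reduced to a scalar here.
    bce = F.binary_cross_entropy(recon_x, x, reduction=reduction)
    # The KL term is already a scalar because of torch.sum.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

torch.manual_seed(0)

# Hypothetical stand-ins for the model outputs and the input batch.
x = torch.rand(4, 3, 8, 8)
recon_x = torch.sigmoid(torch.randn(4, 3, 8, 8))
mu = torch.randn(4, 16)
logvar = torch.randn(4, 16)

loss_sum = loss_function(recon_x, x, mu, logvar, reduction='sum')
loss_mean = loss_function(recon_x, x, mu, logvar, reduction='mean')

# Both are 0-dim tensors, so backward() would work on either one.
print(loss_sum.shape, loss_mean.shape)
print(loss_sum.item(), loss_mean.item())
```

The summed BCE is larger than the mean BCE by a factor of x.numel(), which is why the reported loss looks high with reduction='sum' even when training is working.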

onimusha702 · February 25, 2019, 2:48pm

It helped. Thanks for the suggestion!

Sanjayvarma11 · May 7, 2020, 10:47am

Hi ptrblck, I am a big fan of your support for this community. I am trying to generate a depth map for a given image, so I used BCELoss(), where the model output passed to the loss function has size [10, 1, 250, 250] and the target (the ground-truth depth) also has size [10, 1, 250, 250].
I am thinking of using reduction="mean" and backpropagating it, but it is giving me huge loss values. Please let me know your opinion, and tell me which loss function would be better in this scenario.

ptrblck · May 7, 2020, 7:38pm

If you are using nn.BCELoss, I assume you are using a sigmoid at the end of your model?
I would generally recommend outputting raw logits and using nn.BCEWithLogitsLoss, as it’ll give you more numerical stability.

Could you check the min and max values of your target, please?
How large is the loss at the moment?
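As a small sketch of the numerical stability point (random stand-in tensors, with shapes chosen only for illustration): for moderate logits the two formulations agree, but for an extreme logit the sigmoid saturates and nn.BCELoss falls back on its clamped log, while nn.BCEWithLogitsLoss still returns the exact value. The target range check mentioned above is included as well:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical raw model outputs and targets; BCE targets must lie in [0, 1].
logits = torch.randn(10, 1, 25, 25)
target = torch.rand(10, 1, 25, 25)
print(target.min().item(), target.max().item())

# For moderate logits both formulations give the same mean loss.
with_logits = nn.BCEWithLogitsLoss()(logits, target)
with_sigmoid = nn.BCELoss()(torch.sigmoid(logits), target)
print(with_logits.item(), with_sigmoid.item())

# For an extreme logit, sigmoid saturates to exactly 1.0 in float32.
big_logit = torch.tensor([200.0])
zero_target = torch.tensor([0.0])
stable = nn.BCEWithLogitsLoss()(big_logit, zero_target)
clamped = nn.BCELoss()(torch.sigmoid(big_logit), zero_target)
print(stable.item(), clamped.item())
```

If the target minimum or maximum falls outside [0, 1], for example raw 8-bit depth values, the BCE value is no longer meaningful and the targets need rescaling first.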

Sanjayvarma11 · May 8, 2020, 6:28am

The saviour ptrblck sir, thank you so much for replying to me. Yes, I looked at the min and max values of my target, and since it is a depth image its values mostly range from 2 to 245. So I divided it by 255, and now the loss has decreased and looks good. But I always have trouble understanding this: since we are creating an image from an input image, we could just use an L1 loss, so why are you suggesting BCEWithLogitsLoss? Can you explain the intuition behind using BCEWithLogitsLoss? Thank you, sir.

ptrblck · May 8, 2020, 6:30am

nn.BCEWithLogitsLoss was just the better alternative to nn.BCELoss.
For depth estimation, I would guess that nn.L1Loss or nn.MSELoss might work better, but you should try out different approaches.

Sanjayvarma11 · May 8, 2020, 6:33am

But nn.MSELoss might not give good results for punishing the small values. Is that right?

ptrblck · May 8, 2020, 6:35am

That might be correct, but it’s hard to estimate if it would be worse than e.g. L1Loss for depth maps.
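The concern about small values can be made concrete with a toy comparison (the constant error tensors are made up for illustration): squaring shrinks errors below 1 and amplifies errors above 1, whereas L1 scales linearly.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()
mse = nn.MSELoss()

# Hypothetical predictions that are off by a constant amount everywhere.
target = torch.zeros(4)
small_error = torch.full((4,), 0.1)
large_error = torch.full((4,), 10.0)

l1_small, mse_small = l1(small_error, target), mse(small_error, target)
l1_large, mse_large = l1(large_error, target), mse(large_error, target)

# An error of 0.1 costs 0.1 under L1 but only 0.01 under MSE.
print(l1_small.item(), mse_small.item())
# An error of 10 costs 10 under L1 but 100 under MSE.
print(l1_large.item(), mse_large.item())
```

Whether this matters in practice depends on the scale of the targets, so after normalizing depth to [0, 1] it is worth trying both losses.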

Sanjayvarma11 · May 8, 2020, 6:37am

Yes sir, I will try it as well. Thank you so much for answering my questions.

mhamdan · October 17, 2020, 5:21am

Hi,
I am trying to compute the gradients of my network output (a batch of single numbers) with respect to the model's trainable parameters.

I assumed this would do the trick: outputs.backward(), but I am getting the same error as stated in this thread. Also, does backward() compute the gradients w.r.t. the model's trainable parameters? Additionally, how can I access the calculated gradients, as I need to perform some operations on them?

Please share your thoughts on how I can accomplish the desired functionality.

ptrblck · October 17, 2020, 5:40am

The error is raised if you call .backward() on a tensor that is not a scalar.
In that case you should either reduce the tensor before (e.g. via tensor.mean()) or pass the gradients to backward (e.g. via tensor.backward(torch.ones_like(tensor))).

You can access the gradients after the backward() call by accessing them directly, e.g.:

print(model.layer.weight.grad)
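If all gradients are needed at once, one possible approach is to collect them from named_parameters() after the backward call. This is a sketch with a hypothetical one-layer model; the dictionary name `grads` is just an illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model producing one number per batch element.
model = nn.Linear(3, 1)
outputs = model(torch.randn(5, 3))  # shape [5, 1]

# Reduce to a scalar so backward() can be called without arguments.
outputs.mean().backward()

# Gather every parameter's gradient into a single dict keyed by name.
grads = {name: param.grad for name, param in model.named_parameters()}
for name, grad in grads.items():
    print(name, tuple(grad.shape))
```

Each entry is the same tensor object as the corresponding .grad attribute, so clone it before modifying it if the original should stay untouched.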

mhamdan · October 17, 2020, 5:53pm

Thank you for your reply.

So, I do not wish to reduce the vector to a scalar, as I need gradients for each output. I do not quite follow the other method, passing the gradients to backward. Can you elaborate?

Basically, assume my network output is y and the network parameters are x: I want dy/dx for each y in the batch. Then I’d like to aggregate them all together. So, when you say pass the gradients, how do I pass them when backward is actually what computes the gradients?

As for accessing gradients, in TF you can get the gradients for all network parameters in a single object, e.g. grads = tape.gradient(...). Isn’t there something similar in PyTorch, other than accessing them separately like model.layer.weight.grad?

ptrblck · October 17, 2020, 8:05pm

The gradient argument in backward can be seen as e.g. dLoss/dLoss, which for a scalar loss value would be 1 and is automatically set for you.
However, if your loss is a tensor in a specific shape, you would have to provide dLoss/dLoss manually, which is shown in my example.

Depending on your use case, you might prefer to use torch.autograd.grad to compute gradients of specific parameters.
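A minimal sketch of the torch.autograd.grad route, assuming a hypothetical linear model with one output per sample. It computes dy/dparams separately for each y in the batch, aggregates by summing, and checks that the result matches a single call with grad_outputs set to ones:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model: one scalar output per sample.
model = nn.Linear(3, 1)
params = list(model.parameters())
inputs = torch.randn(4, 3)
outputs = model(inputs).squeeze(1)  # shape [4]

# One gradient tuple (weight grad, bias grad) per output element.
per_sample = []
for y in outputs:
    # retain_graph=True keeps the graph alive for the next iteration.
    per_sample.append(torch.autograd.grad(y, params, retain_graph=True))

# Aggregate by summing the per-sample gradients for each parameter.
summed = [torch.stack(g).sum(dim=0) for g in zip(*per_sample)]

# Passing ones as grad_outputs yields the same sum in a single call.
reference = torch.autograd.grad(
    outputs, params, grad_outputs=torch.ones_like(outputs)
)
print([tuple(g.shape) for g in summed])
```

Unlike backward(), torch.autograd.grad returns the gradients as a tuple and does not accumulate into .grad. The loop is simple but slow for large batches, so it is meant only to show what the gradient argument does.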

Hassan_Imani · December 19, 2020, 1:08pm

How can I do this: pass a gradient with the same shape as the loss?

My loss has 128 elements and I don’t want to take the sum or mean of it.

ptrblck · December 20, 2020, 6:52am

You could use loss.backward(torch.ones_like(loss)).
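For a 128-element loss, this can be sketched as follows (the parameter `w`, the data, and the squared-error loss are hypothetical stand-ins). Passing ones gives the same parameter gradients as summing the loss first, and any other tensor of the same shape acts as per-element weights:

```python
import torch

torch.manual_seed(0)

# Hypothetical parameter and data giving a 128-element loss.
w = torch.randn(128, requires_grad=True)
data = torch.randn(128)

def per_element_loss():
    return (w - data) ** 2  # shape [128], no reduction

# Passing ones as the gradient argument.
loss = per_element_loss()
loss.backward(torch.ones_like(loss))
grad_ones = w.grad.clone()

# Summing first produces the same gradients.
w.grad = None
per_element_loss().sum().backward()
grad_sum = w.grad.clone()

# A non-uniform gradient argument weights each loss element.
w.grad = None
weights = torch.linspace(0, 1, 128)
per_element_loss().backward(weights)
grad_weighted = w.grad.clone()

print(torch.allclose(grad_ones, grad_sum))
```

So the loss tensor itself is never summed or averaged, but the gradients that reach the parameters are the same as if it had been summed.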