Hello, I am trying to train on my own dataset and have encountered the following error.
Ignoring class 0 in IoU evaluation
[IOU EVAL] IGNORE: tensor([0])
[IOU EVAL] INCLUDE: tensor([1, 2])
Lr: 3.106e-05 | Update: 2.258e-01 mean,4.181e-01 std | Epoch: [0][0/322] | Time 3.170 (3.170) | Data 0.154 (0.154) | Loss 1.9250 (1.9250) | acc 0.533 (0.533) | IoU 0.363 (0.363) | [1 day, 20:35:54]
Traceback (most recent call last):
File "/content/LiDAR-MOS/mos_SalsaNext/train/tasks/semantic/train.py", line 178, in <module>
trainer.train()
File "../../tasks/semantic/modules/trainer.py", line 274, in train
show_scans=self.ARCH["train"]["show_scans"])
File "../../tasks/semantic/modules/trainer.py", line 391, in train_epoch
loss_m.backward()
File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 396, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 166, in backward
grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 67, in _make_grads
raise RuntimeError("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
optimizer.zero_grad()
if self.n_gpus > 1:
    idx = torch.ones(self.n_gpus).cuda()
    loss_m.backward(idx)
else:
    loss_m.backward()  # here I got the error
optimizer.step()
I have searched for this error on Google, and it usually occurs when training with two or more GPUs. However, I am using only one GPU and still get this error. Also, when I train on my dataset 1, which has around 400 frames, it works; but when I train on dataset 2, which has around 2400 frames, this error is thrown.
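From what I found, the error means loss_m has more than one element (PyTorch can only call backward() implicitly on a scalar), and the common suggestion is to reduce the loss with .mean() before backpropagating. A minimal sketch of that idea, with a stand-in loss tensor rather than the actual SalsaNext loss:

import torch

# Stand-in for a non-scalar loss (e.g. one value per GPU replica under
# DataParallel); this is NOT the actual loss_m from trainer.py.
loss_m = torch.rand(2, requires_grad=True)

if loss_m.dim() > 0:
    # Non-scalar: reduce to a scalar first, then backpropagate.
    loss_m.mean().backward()
else:
    # Already a scalar: implicit backward() works as-is.
    loss_m.backward()

I am not sure this is the right fix in my case, though, since I am only using one GPU and the error depends on which dataset I use.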
Could you please help me solve this error?