Grad can be implicitly created only for scalar outputs

I am building an MLP with 2 outputs (mean and variance) because I am working on quantifying the uncertainty of the model. As the metric I am using a proper scoring rule, the negative log-likelihood (NLL) for regression. My training function works with the MSE loss function, but when I apply my proper scoring function I get the following error:

```
RuntimeError: grad can be implicitly created only for scalar outputs
```

Here is a piece of my function:
```python
def train1(model, epoch, trainloader, criterion, sp, optimizer):
    model.train()
    for batch_idx, (data, target) in enumerate(trainloader):
        if torch.cuda.is_available():
            data, target = data.cuda(), target.cuda()
        optimizer.zero_grad()
        output = model(data)

        print(output)

        # split the two network outputs into mean and (positive) variance
        mu, sig = output[0][0], sp(output[0][1]) + 10**-6
        loss = criterion(mu, sig, target)  # criterion is the NLL scoring rule passed from main()

        loss.backward()
        optimizer.step()
```

One more thing: my code works when I do not use a DataLoader or any PyTorch tensors, but as soon as I wrap my data in tensors and a DataLoader I run into this problem. I would like to use PyTorch for this project.
```python
def main():
    n_hidden = 100
    batch_size = 20

    print("Setting up data")

    dict_ = split_data(df, data_ratio=0.30)
    X_train, Y_train = dict_["train_x"].values.astype(np.float32), dict_["train_y"].values.astype(np.float32)
    X_test, Y_test = dict_["test_x"].values.astype(np.float32), dict_["test_y"].values.astype(np.float32)

    # dataloaders
    trainset = torch.utils.data.TensorDataset(torch.from_numpy(X_train), torch.from_numpy(Y_train))
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
    testset = torch.utils.data.TensorDataset(torch.from_numpy(X_test), torch.from_numpy(Y_test))
    testloader = torch.utils.data.DataLoader(testset, batch_size=1, shuffle=False, num_workers=2)

    model = make_model(X_train, n_hidden)
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.MSELoss()

    # proper scoring rule: negative log-likelihood for regression
    nll_criterion = lambda mu, sigma, y: torch.log(sigma) / 2 + ((y - mu) ** 2) / (2 * sigma)
    sp = torch.nn.Softplus()

    n_epochs = 1000
    predict_every = 15

    running_loss = []
    for epoch in range(n_epochs):
        epoch_loss = 0
        print("(Start) Epoch ", epoch, " of ", n_epochs, ":")

        epoch_loss = train1(model, epoch, trainloader, criterion=nll_criterion, sp=sp, optimizer=optimizer)
```

The error is raised if you call `.backward()` on a non-scalar tensor (i.e. a tensor with more than a single element). In your case `nll_criterion` returns one loss value per sample (it broadcasts against `target`), so `loss` has the shape of the batch. If that's the desired use case, you would have to provide the gradient manually, e.g. via `loss.backward(gradient=torch.ones_like(loss))`, or reduce the loss first (e.g. via `loss.mean().backward()`).
@albanD explains it in this post in more detail.
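
Here is a minimal sketch of the second approach applied to your `nll_criterion` (the shapes and dummy values are just assumptions standing in for your model outputs and targets):

```python
import torch

# per-sample NLL scoring rule, as defined in your main()
nll_criterion = lambda mu, sigma, y: torch.log(sigma) / 2 + ((y - mu) ** 2) / (2 * sigma)

# dummy batch standing in for the network outputs and targets (shapes assumed)
mu = torch.randn(20, requires_grad=True)
sigma = torch.nn.Softplus()(torch.randn(20, requires_grad=True)) + 1e-6
target = torch.randn(20)

loss = nll_criterion(mu, sigma, target)   # shape [20] -> one loss value per sample
# loss.backward()                         # would raise: grad can be implicitly created only for scalar outputs
loss.mean().backward()                    # reduce to a scalar first, then backpropagate
# alternative: loss.backward(gradient=torch.ones_like(loss))
```

Averaging (or summing) the per-sample scores is the usual choice here, since the scoring rule is defined per sample anyway.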

PS: you can post code snippets by wrapping them into three backticks ```, which makes debugging easier :wink: