I tried constructing some similar inputs and couldn't reproduce the issue, so we might need more context; alternatively, this could be a bug that was already fixed in a more recent PyTorch release:
In [31]: output
Out[31]: tensor(1.2804, device='cuda:0', grad_fn=<NllLossBackward>)
In [32]: penalty
Out[32]: tensor(3.3333, device='cuda:0', grad_fn=<DivBackward0>)
In [33]: output += penalty
In [34]: output.backward()
In [35]: output
Out[35]: tensor(4.6137, device='cuda:0', grad_fn=<AddBackward0>)
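For reference, here is a minimal CPU sketch of the kind of setup I tested. The shapes, the penalty term, and the seed are all made up, since the original model and inputs weren't shared; the point is just that an in-place `+=` on the (non-leaf) loss is recorded as `AddBackward0` and `backward()` runs without error:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the original setup: a small batch of logits
# and integer class targets.
logits = torch.randn(4, 10, requires_grad=True)
targets = torch.randint(0, 10, (4,))

# Loss with grad_fn=<NllLossBackward0> (cross_entropy = log_softmax + nll_loss).
output = F.cross_entropy(logits, targets)

# Any differentiable scalar penalty; this one ends in a division,
# so its grad_fn is <DivBackward0>, matching the transcript.
penalty = logits.abs().sum() / logits.numel()

# In-place add on the non-leaf loss tensor, as in the transcript.
output += penalty

# Backpropagates through both the loss and the penalty without error.
output.backward()

print(output.grad_fn)       # an AddBackward0 node
print(logits.grad.shape)    # gradients reached the leaf: torch.Size([4, 10])
```

On my end this runs cleanly, which is why I suspect the problem lies in a part of the setup not shown in the snippet above.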