Your approach seems to work and I get valid gradients:
import torch
import torch.nn as nn

output = torch.randn(1, 1, requires_grad=True)
target = torch.randn(1, 1)

# create 10 independent criteria, apply each one, and sum the stacked losses
loss = {i: nn.MSELoss() for i in range(10)}
err = torch.sum(torch.stack([l(output, target) for l in loss.values()]))
err.backward()
print(output.grad)
> tensor([[-6.1927]])
This issue seems to be specific to the JIT, but let me know if you see any issues with the eager approach (e.g. None gradients where you would expect valid ones).
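In case it helps while the JIT issue is open, here is a minimal sketch of a scriptable variant, assuming the problem comes from holding the criteria in a plain Python dict (which TorchScript cannot script); the SummedLoss wrapper and its names are purely illustrative, not from the original code:

import torch
import torch.nn as nn

class SummedLoss(nn.Module):
    # hypothetical wrapper: holds the criteria in an nn.ModuleList,
    # which TorchScript can iterate over in forward
    def __init__(self, num_losses: int = 10):
        super().__init__()
        self.criteria = nn.ModuleList([nn.MSELoss() for _ in range(num_losses)])

    def forward(self, output: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # TorchScript infers List[Tensor] for the empty list
        losses = []
        for criterion in self.criteria:
            losses.append(criterion(output, target))
        return torch.sum(torch.stack(losses))

module = torch.jit.script(SummedLoss())
output = torch.randn(1, 1, requires_grad=True)
target = torch.randn(1, 1)
module(output, target).backward()
print(output.grad)  # should print a valid gradient, not None

nn.ModuleList is used here because a plain Python dict of modules would not register them as submodules, while ModuleList iteration is supported inside scripted methods.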