I assume you are not summing up or reducing the loss value in any other (manual) way? You should be able to do:
loss = criterion(pred, label)            # loss is a tensor, not a scalar
loss.backward(torch.ones_like(loss))     # pass the gradient tensor explicitly
By default a scalar value of 1 is backpropagated to calculate the gradients, but for a loss tensor of arbitrary size you have to manually specify the gradient tensor that should be backpropagated.
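For context, here is a minimal, self-contained sketch of this behaviour, assuming an unreduced elementwise loss via reduction='none'; the model, shapes, and variable names are just placeholders:

import torch
import torch.nn as nn

# Hypothetical setup: a tiny linear model and an elementwise (unreduced) loss.
model = nn.Linear(4, 1)
criterion = nn.MSELoss(reduction="none")   # keeps the loss per element instead of reducing it

pred = model(torch.randn(8, 4))            # shape [8, 1]
label = torch.randn(8, 1)

loss = criterion(pred, label)              # shape [8, 1], not a scalar
# loss.backward()                          # would raise an error, since the loss is not a scalar
loss.backward(torch.ones_like(loss))       # equivalent to backpropagating loss.sum()

print(model.weight.grad.shape)             # gradients are populated as usual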