MSELoss and torch.max

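As proposed here, it should be possible to use softmax instead, because the hard indexing (the indices returned by `torch.max` / `argmax`) is not differentiable, so no gradient can flow back through it into the MSELoss.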
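A minimal sketch of that idea, assuming the goal is to regress the *position* of the maximum with `nn.MSELoss`. It uses a "soft-argmax": each index is weighted by its softmax probability and the weights are summed. The helper `soft_argmax`, the `temperature` parameter and the example tensors are hypothetical and only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_argmax(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Differentiable approximation of argmax over the last dimension.

    Hypothetical helper: softmax turns the logits into weights, and the
    weighted sum of the index positions gives a real-valued "index".
    """
    probs = F.softmax(logits / temperature, dim=-1)
    positions = torch.arange(logits.size(-1), dtype=logits.dtype, device=logits.device)
    return (probs * positions).sum(dim=-1)


torch.manual_seed(0)
logits = torch.randn(4, 10, requires_grad=True)  # hypothetical model output
target = torch.tensor([2.0, 7.0, 0.0, 9.0])      # hypothetical target indices

# Hard version: the indices from torch.max are an integer tensor without grad.
_, hard_idx = torch.max(logits, dim=-1)
print(hard_idx.requires_grad)  # False

# Soft version: the loss is differentiable w.r.t. the logits.
criterion = nn.MSELoss()
loss = criterion(soft_argmax(logits, temperature=0.1), target)
loss.backward()
print(logits.grad is not None)  # True
```

A lower `temperature` makes the result closer to the hard argmax, but the softmax also saturates more, which tends to give smaller or noisier gradients; the value is a tuning choice.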