As proposed here, one option is to use a softmax, because hard indexing is not differentiable.
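To make the idea concrete, here is a minimal sketch in plain Python, assuming the "index" is chosen from a vector of scores. The names `hard_index`, `soft_index`, and `temperature` are hypothetical and not taken from the linked proposal. Instead of picking a single element with an argmax, the soft version returns a softmax-weighted average, which varies smoothly with the scores.

```python
import math

def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def hard_index(values, scores):
    """Pick values[argmax(scores)]; piecewise constant in scores,
    so the gradient w.r.t. scores is zero almost everywhere."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return values[best]

def soft_index(values, scores, temperature=1.0):
    """Softmax-weighted average of values; smooth in scores.
    (Hypothetical helper, for illustration only.)"""
    weights = softmax(scores, temperature)
    return sum(w * v for w, v in zip(weights, values))

values = [10.0, 20.0, 30.0]
scores = [0.1, 2.0, 0.5]

hard = hard_index(values, scores)                        # selects one element
soft = soft_index(values, scores, temperature=1.0)       # blend of all elements
sharp = soft_index(values, scores, temperature=0.01)     # close to the hard pick
```

Lowering `temperature` pushes the soft result toward the hard selection, at the cost of smaller gradients for the non-selected entries. In an autodiff framework the same weighted sum would be written with that framework's own softmax so that gradients flow through the scores.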