I am working on a reinforcement learning project. I started writing a technical specification, but I am stuck on converting softmax output into a target suitable for MSELoss(). Softmax gives me probabilities, and I want to feed them to MSELoss with shape [batch_size, *].
How can I do this?
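To make the question concrete, here is a minimal sketch of what I am trying to do. The sizes, the variable names, and the random "target" logits are placeholders I made up for illustration; in a real setup the target would come from the RL algorithm. As far as I understand, MSELoss only needs the input and target to have the same shape, so softmax probabilities of shape [batch_size, num_actions] can be passed directly as long as the target is detached from the graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this example
batch_size, num_actions = 4, 3

# Raw network outputs (logits); gradients should flow through these
logits = torch.randn(batch_size, num_actions, requires_grad=True)

# Placeholder target logits; in practice these would come from elsewhere
target_logits = torch.randn(batch_size, num_actions)

# Softmax over the action dimension gives one distribution per sample.
# detach() marks the target as a constant so no gradient flows into it.
target_probs = torch.softmax(target_logits, dim=1).detach()

# Predictions as probabilities, same shape as the target
pred_probs = torch.softmax(logits, dim=1)

# MSELoss compares tensors elementwise, so the shapes must match
criterion = nn.MSELoss()
loss = criterion(pred_probs, target_probs)
loss.backward()

print(pred_probs.shape, target_probs.shape, loss.item())
```

Whether MSE is a good loss for comparing two probability distributions is a separate question; this only shows that the shapes line up.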
I found an answer in the thread "How should I implement cross-entropy loss with continuous target outputs?". Sorry about that, but if you would like to add anything, I will be happy to listen.
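For anyone landing here later, this is a rough sketch of cross-entropy with continuous (soft) targets as I understand the idea. It is not copied from that thread; `soft_cross_entropy` is a helper name I chose, and it assumes the targets are probabilities that sum to 1 along dim=1.

```python
import torch
import torch.nn.functional as F


def soft_cross_entropy(logits, target_probs):
    """Cross-entropy between logits and a target probability distribution.

    logits:       [batch_size, num_classes] raw scores
    target_probs: [batch_size, num_classes] rows assumed to sum to 1
    """
    # log_softmax is more numerically stable than log(softmax(x))
    log_probs = F.log_softmax(logits, dim=1)
    # Sum over classes, then average over the batch
    return -(target_probs * log_probs).sum(dim=1).mean()


torch.manual_seed(0)
logits = torch.randn(4, 3, requires_grad=True)

# Soft targets built with softmax, detached so they act as constants
target_probs = torch.softmax(torch.randn(4, 3), dim=1).detach()

loss = soft_cross_entropy(logits, target_probs)
loss.backward()
print(loss.item())
```

With one-hot targets this reduces to the usual cross-entropy on class indices. Recent PyTorch releases also appear to accept class probabilities as the target of nn.CrossEntropyLoss, so it is worth checking the documentation for your version before writing a custom helper.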