Same softmax values no matter what the input is

If this example is run as-is, I think it should work. Are you using a slightly different variant in practice? For example, are you moving your model to the GPU with .cuda() after constructing the optimizer, rather than using the correct ordering of model.cuda(); optimizer = ...?
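To make the ordering concrete, here is a minimal sketch of what I mean. The toy nn.Linear model, the SGD optimizer, the learning rate, and the random data are all placeholders of mine, not taken from your code, and it falls back to the CPU if no GPU is available. It moves the model first, builds the optimizer second, and then checks that one training step actually changes the weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Fall back to CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical toy classifier standing in for the real model.
model = nn.Linear(4, 3)

# 1) Move the model to its final device first ...
model.to(device)

# 2) ... and only then construct the optimizer from its parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# The optimizer should hold the very same Parameter objects as the model.
same_params = all(
    p is q for p, q in zip(optimizer.param_groups[0]["params"], model.parameters())
)

# Snapshot the weights so we can verify that a step updates them.
before = model.weight.detach().clone()

# Random placeholder batch: 8 samples, 4 features, 3 classes.
x = torch.randn(8, 4, device=device)
y = torch.randint(0, 3, (8,), device=device)

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()

weights_changed = not torch.equal(before, model.weight.detach())
print("optimizer tracks model parameters:", same_params)
print("weights changed after one step:", weights_changed)
```

If the second check prints False in your real setup, the optimizer is probably not updating the parameters the forward pass uses, which would be consistent with the outputs never changing.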