I am trying to train an MNIST classifier with one-hot-encoded labels, but the cross-entropy loss won't accept them as targets. What loss function should I be using?

What if you take the argmax of each one-hot vector and pass that to the cross-entropy loss?
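A minimal sketch of that suggestion, assuming `nn.CrossEntropyLoss` and a made-up batch of 4 samples with 10 classes (the logits and labels below are placeholders, not real model output):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical batch: 4 samples, 10 classes (as in MNIST)
logits = torch.randn(4, 10)                 # raw model outputs, shape (N, C)
labels = torch.tensor([3, 0, 7, 1])         # true class of each sample
one_hot = F.one_hot(labels, num_classes=10).float()  # one-hot targets, shape (N, C)

# Convert each one-hot row back to its class index: shape (N,), dtype int64
targets = one_hot.argmax(dim=1)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)           # accepts integer class indices
print(targets)                              # tensor([3, 0, 7, 1])
print(loss.item())
```

The argmax along `dim=1` is what matters here: it reduces each row separately, giving one index per sample.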

But wouldn’t the argmax of every one-hot-encoded vector be 1?

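No. argmax returns the index of the maximum element, not the maximum value itself. For a one-hot vector that index is the position of the 1, which is exactly the class label.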
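A quick illustration of the difference between `max` (the value) and `argmax` (the index), using a hand-made one-hot vector for class 2 and a small one-hot batch built from `torch.eye`:

```python
import torch

one_hot = torch.tensor([0., 0., 1., 0., 0.])  # one-hot encoding of class 2
print(one_hot.max())     # tensor(1.)  -> the largest value
print(one_hot.argmax())  # tensor(2)   -> the index where it occurs

# For a batch of one-hot rows, reduce along dim=1 to get one index per row
batch = torch.eye(10)[[5, 2]]   # rows are one-hot vectors for classes 5 and 2
print(batch.argmax(dim=1))      # tensor([5, 2])
```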

Taking the argmax worked, thanks!

But I was looking at the PyTorch CIFAR-10 tutorial, and its output layer has width 10 while each target is just a single integer class index. How does that work?

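nn.CrossEntropyLoss takes care of this internally. It expects raw scores (logits) of shape (N, C) and targets of class indices of shape (N,). It applies log-softmax to the scores and then uses each target index to select the log-probability of the correct class, so it is equivalent to LogSoftmax followed by NLLLoss.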
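For one sample with logits z (length C) and target index y, the loss is -log(softmax(z)[y]) = -z[y] + log(sum_j exp(z[j])), averaged over the batch with the default `reduction='mean'`. With a one-hot vector t, the "full" formula -sum_j t[j] * log(softmax(z)[j]) gives the same number, because every term except j = y is multiplied by 0. So the integer index carries all the information the one-hot vector does. A sketch that checks this numerically on random placeholder logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # (N, C) raw scores, e.g. CIFAR-10 outputs
targets = torch.tensor([3, 0, 7, 1])  # (N,) class indices, the "scalar" targets

# 1. Built-in loss (default reduction='mean' averages over the batch)
builtin = nn.CrossEntropyLoss()(logits, targets)

# 2. By hand: log-softmax, then pick the log-prob of the true class per row
log_probs = F.log_softmax(logits, dim=1)                       # (N, C)
picked = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # (N,)
manual = -picked.mean()

# 3. LogSoftmax followed by NLLLoss
via_nll = F.nll_loss(log_probs, targets)

# 4. One-hot form: only the true-class term survives the sum over classes
one_hot = F.one_hot(targets, num_classes=10).float()
via_one_hot = -(one_hot * log_probs).sum(dim=1).mean()

print(builtin.item(), manual.item(), via_nll.item(), via_one_hot.item())
```

All four values should agree up to floating-point rounding. As far as I know, newer PyTorch releases (1.10 and later) also let `nn.CrossEntropyLoss` take a float tensor of class probabilities of shape (N, C) as the target, so check the docs for your version before relying on that.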

Can you please explain how that happens, or point me to a resource?