Hi,
I was just experimenting with PyTorch. I implemented a cross-entropy loss function and a softmax function as below:
def xent(z, y):
    # to_one_hot converts a NumPy 1D array of labels to a one-hot encoded 2D array
    y = torch.Tensor(to_one_hot(y, 3))
    y_hat = pt_softmax(z)
    loss = -y * torch.log(y_hat)
    loss = loss.mean()
    return loss
def pt_softmax(x):
    # subtract the per-row max for numerical stability
    exps = torch.exp(x - torch.max(x, dim=1)[0].unsqueeze(1))
    return exps / torch.sum(exps, dim=1).unsqueeze(1)
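To compare the two losses directly, here is a small sanity check I put together (the random logits, labels, and batch/class sizes are made up for illustration; the custom loss is inlined rather than calling to_one_hot):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z = torch.randn(4, 3)           # batch of 4 samples, 3 classes (assumed shapes)
y = torch.tensor([0, 2, 1, 0])  # integer class labels

# Custom loss, inlined: one-hot targets, stabilized softmax, mean over ALL elements
y_onehot = torch.zeros(4, 3)
y_onehot[torch.arange(4), y] = 1.0
exps = torch.exp(z - z.max(dim=1, keepdim=True)[0])
y_hat = exps / exps.sum(dim=1, keepdim=True)
custom = (-y_onehot * torch.log(y_hat)).mean()

builtin = nn.CrossEntropyLoss()(z, y)

# The custom loss averages over batch*classes elements, while the built-in
# averages only over the batch, so custom * num_classes matches the built-in.
print(custom.item() * 3, builtin.item())
```

On this toy input the two values agree up to the factor of num_classes, since loss.mean() divides by batch_size * num_classes while nn.CrossEntropyLoss divides by batch_size only.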
I was comparing this loss with nn.CrossEntropyLoss and found that nn.CrossEntropyLoss converges faster on the Wine dataset from the UCI repository. Also, the weights and gradients obtained after each epoch were different for the two losses. I am using batch gradient descent and am not getting any NaN values.
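For reference, this is how I checked the gradients (again with made-up logits and labels; log-softmax is computed via logsumexp, which is algebraically the same as the stabilized softmax plus log):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
z1 = torch.randn(4, 3, requires_grad=True)
z2 = z1.detach().clone().requires_grad_(True)
y = torch.tensor([0, 2, 1, 0])

# Custom loss: mean over all batch*classes elements
y_onehot = torch.zeros(4, 3)
y_onehot[torch.arange(4), y] = 1.0
log_softmax = z1 - z1.logsumexp(dim=1, keepdim=True)
(-y_onehot * log_softmax).mean().backward()

# Built-in loss: mean over the batch only
nn.CrossEntropyLoss()(z2, y).backward()

# If the losses differ only by a constant factor of num_classes, the
# gradients w.r.t. the logits should differ by the same factor.
print((z1.grad * 3 - z2.grad).abs().max().item())
```

The printed difference is essentially zero, i.e. the custom gradients are a constant factor smaller, which would act like a smaller effective learning rate.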
Can anyone please let me know why this is happening? Is it because of an unstable implementation of xent, or due to some other reason?