Negative log likelihood loss

Hey, I have a very simple NN with only one layer. I was going through someone's tutorial and they did not take the log in the negative log likelihood loss. This is on the flattened MNIST dataset. When I take the log and train, my model does not learn anything, while without the log it reaches 90% accuracy. How can the log be messing up the learning? Shouldn't it be the same?

Here are his loss and mine:

def nll(input, target): return -input[range(target.shape[0]), target].mean()

def nll2(input,target):  # mine
    x= -input[:,target].mean()
    return torch.log(x)

def log_softmax(x): return x - x.exp().sum(-1).log().unsqueeze(-1) # ???

def model(xb):      return log_softmax(torch.mm(xb, weights) + bias)

with gzip.open('data/mnist/mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f)

bs=64                  # batch size

xb = x_train[0:bs]     # a mini-batch from x
preds = model(xb)      # predictions
preds[0], preds.shape

weights = torch.randn(784,10)/math.sqrt(784) # scale first: don't want the division by sqrt(784) tracked in grad
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

def acc(input,target):
    indices = input.argmax(dim=1)  # predicted class for each row
    return (indices==target).sum().float()/len(indices)

Training:

for i in range(0,5000):


    loss=nll(outs,tars) # ***** changing nll to nll2 does not make the model learn
    if i%10==0:
        print('loss: ',loss)
        print('accuracy: ',acc(outs,tars))
    with torch.no_grad():
            weights -= weights.grad * lr
            bias -= bias.grad * lr

Your model already returns the log_softmax output, so taking another log of that tensor is not a good idea.
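To make this concrete, here is a minimal sketch on random stand-in data (the tensors `logits`, `log_probs` and `target` are made up for illustration and are not from the MNIST run). It shows that the model output is already a log-probability, that `nll` picks one entry per row, and that the `input[:, target]` indexing in `nll2` selects a whole `bs x bs` block instead, so the outer `log` is applied to something that is not a probability.

```python
import torch

torch.manual_seed(0)

bs, n_classes = 4, 10
logits = torch.randn(bs, n_classes)             # stand-in for the linear layer output
log_probs = torch.log_softmax(logits, dim=-1)   # what model(xb) already returns
target = torch.tensor([3, 1, 4, 1])             # made-up labels

# Log-probabilities are <= 0, and exponentiating gives rows that sum to 1.
row_sums = log_probs.exp().sum(dim=-1)

# nll indexing: one entry per row, the log-probability of the true class.
picked = log_probs[range(bs), target]           # shape (bs,)
loss_nll = -picked.mean()

# nll2 indexing: every row gets all bs target columns, shape (bs, bs).
block = log_probs[:, target]
x = -block.mean()                               # positive, but not a probability
loss_nll2 = torch.log(x)                        # log of an averaged negative log-prob

print(picked.shape, block.shape)
print(loss_nll.item(), loss_nll2.item())

# nll matches PyTorch's built-in NLL loss on log-probabilities.
print(torch.allclose(loss_nll, torch.nn.functional.nll_loss(log_probs, target)))
```

In other words, `nll` already is the negative log likelihood, because the log was taken inside `log_softmax`; `nll2` is not its "with log" counterpart.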
For debugging, you could switch the model to plain softmax and then call your nll2 method to see whether the model learns.
However, taking the log of a softmax output can be numerically unstable and is generally not recommended.
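A small sketch of both points, using made-up logits rather than anything from the post. For the debugging experiment the log has to be applied to the picked probabilities before averaging and negating: `nll2` as written would take the log of a negative mean, which gives `nan`. The helper name `nll_from_probs` is hypothetical. The second half shows one case where `log(softmax(x))` breaks down while `log_softmax` does not.

```python
import torch

def nll_from_probs(probs, target):
    # Hypothetical debugging helper: expects softmax probabilities, not log-probs.
    picked = probs[range(target.shape[0]), target]
    return -torch.log(picked).mean()

logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
target = torch.tensor([0, 2])

# On well-behaved inputs both routes give the same loss.
via_probs = nll_from_probs(torch.softmax(logits, dim=-1), target)
via_log_probs = torch.nn.functional.nll_loss(
    torch.log_softmax(logits, dim=-1), target
)

# With a large gap between logits, the small probability underflows to 0 in
# float32, so log(softmax) returns -inf while log_softmax stays finite.
extreme = torch.tensor([[0.0, 200.0]])
naive = torch.log(torch.softmax(extreme, dim=-1))
stable = torch.log_softmax(extreme, dim=-1)

print(via_probs.item(), via_log_probs.item())
print(naive, stable)
```

Once a `-inf` enters the loss, the gradients are no longer usable, which is why keeping the log inside `log_softmax` is the safer arrangement for real training.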