Hey, I have a very simple NN with only one layer. I was going through someone's tutorial and they did not take the log in the negative log likelihood loss. This is on the flattened MNIST dataset. When I take the log and train, my model does not learn anything, while without the log it reaches 90% accuracy. How can the log be messing up the learning? Shouldn't it be the same?
Here is his loss function and my version:
```python
def nll(input, target):  # the tutorial's version
    return -input[range(target.shape[0]), target].mean()

def nll2(input, target):  # my version, with the extra log
    x = -input[:, target].mean()
    return torch.log(x)
```
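To make the difference concrete, here is a tiny toy comparison I put together (the numbers and variable names are made up, not from the tutorial). It suggests the two functions don't even compute the same quantity: `input[:, target]` grabs whole columns instead of one entry per row, and the final `torch.log` is applied to something already built from log-probabilities:

```python
import torch

# Toy batch (made-up numbers): 2 samples, 3 classes, already log-probabilities
logprobs = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                   [0.1, 0.8, 0.1]]))
target = torch.tensor([0, 1])

# nll-style: pick each row's target log-prob, negate, average
print(-logprobs[range(target.shape[0]), target].mean())  # ~0.2899

# nll2-style: input[:, target] selects columns 0 and 1 for *every* row,
# giving a 2x2 matrix, and torch.log is then applied on top of values
# that already came out of log_softmax
x = -logprobs[:, target].mean()
print(torch.log(x))  # ~0.1160, a completely different quantity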
```python
import gzip
import math
import pickle

import numpy as np
import torch

with gzip.open('data/mnist/mnist.pkl.gz', 'rb') as f:
    ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding='latin-1')

# the pickle holds numpy arrays; convert to tensors for torch.mm
x_train, y_train, x_valid, y_valid = map(torch.tensor, (x_train, y_train, x_valid, y_valid))

weights = torch.randn(784, 10) / math.sqrt(784)  # scale before requires_grad_ so the division isn't in the graph
weights.requires_grad_()
bias = torch.zeros(10, requires_grad=True)

def log_softmax(x):
    return x - x.exp().sum(-1).log().unsqueeze(-1)  # ???

def model(xb):
    return log_softmax(torch.mm(xb, weights) + bias)

bs = 64  # batch size
xb = x_train[0:bs]  # a mini-batch from x
preds = model(xb)  # predictions
preds, preds.shape

def acc(input, target):
    indices = torch.argmax(input, 1)
    return (indices == target).sum().float() / len(indices)
```
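As a sanity check on the `log_softmax` line (the one I marked `???`), I compared it against PyTorch's built-in on random data; as far as I understand `F.log_softmax`, the two should agree:

```python
import torch
import torch.nn.functional as F

# My own sanity check, not from the tutorial: the handwritten log_softmax
# should match PyTorch's built-in on random inputs
x = torch.randn(5, 10)
mine = x - x.exp().sum(-1).log().unsqueeze(-1)
print(torch.allclose(mine, F.log_softmax(x, dim=-1)))  # True
```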
```python
lr = 0.015
for i in range(5000):
    r = np.random.randint(0, 50000)
    ins = x_train[r:r+64]
    tars = y_train[r:r+64]
    outs = model(ins)
    loss = nll(outs, tars)  # ***** switching nll to nll2 here is what stops the model from learning
    if i % 10 == 0:
        print('loss: ', loss)
        print('accuracy: ', acc(outs, tars))
    loss.backward()
    with torch.no_grad():
        weights -= weights.grad * lr
        bias -= bias.grad * lr
        weights.grad.zero_()
        bias.grad.zero_()
```
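For reference, here is a quick cross-check sketch (my own, assuming `F.nll_loss` with its default mean reduction) showing the tutorial's `nll` matches PyTorch's built-in, which also expects log-probabilities as input:

```python
import torch
import torch.nn.functional as F

def nll(input, target):  # the tutorial's version from above
    return -input[range(target.shape[0]), target].mean()

# Random log-probabilities and targets, just for the comparison
outs = torch.randn(4, 10).log_softmax(-1)
tars = torch.randint(0, 10, (4,))
print(nll(outs, tars).item(), F.nll_loss(outs, tars).item())  # same value
```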