The input given through a forward call is expected to contain
log-probabilities of each class.

If you calculate log(y_hat) yourself, you get the expected
result. Here is a PyTorch 0.3.0 script and its output:

import torch
torch.__version__
loss = torch.nn.NLLLoss()
y_hat = torch.autograd.Variable(torch.FloatTensor([[0.7, 0.1, 0.2]]))
y = torch.autograd.Variable(torch.LongTensor([0]))
loss(y_hat, y)
y_hat_log = y_hat.log()
loss(y_hat_log, y)

>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>> loss = torch.nn.NLLLoss()
>>> y_hat = torch.autograd.Variable(torch.FloatTensor([[0.7, 0.1, 0.2]]))
>>> y = torch.autograd.Variable(torch.LongTensor([0]))
>>> loss(y_hat, y)
Variable containing:
-0.7000
[torch.FloatTensor of size 1]
>>> y_hat_log = y_hat.log()
>>> loss(y_hat_log, y)
Variable containing:
0.3567
[torch.FloatTensor of size 1]
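As a cross-check, both values in the transcript follow directly from the definition of NLLLoss for a single sample, loss = -input[target], which you can verify with plain Python:

```python
import math

y_hat = [0.7, 0.1, 0.2]
target = 0

# NLLLoss simply negates the input at the target index.
loss_raw = -y_hat[target]             # transcript's first result
loss_log = -math.log(y_hat[target])   # transcript's second result

print(round(loss_raw, 4))  # -0.7
print(round(loss_log, 4))  # 0.3567
```

So the first call just returns -0.7 (the raw "probability" negated), while only the second, applied to log-probabilities, gives the actual negative log-likelihood of 0.3567.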

(NLLLoss takes log-probabilities so that you don't have to compute
softmax() to get the probabilities and then log() to get the
log-probabilities as two separate steps, which can be numerically
unstable. Instead, NLLLoss is designed so that you can feed it the
output of the numerically stable log_softmax().)
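To see why the two-step version is unstable, here is a minimal plain-Python sketch (no PyTorch; the `log_softmax` helper below is an illustrative reimplementation of the standard max-shift trick, not the library function). With a large logit, softmax-then-log overflows, while the shifted log-softmax is fine:

```python
import math

def log_softmax(xs):
    # Shift by the max so exp() never sees a huge argument.
    m = max(xs)
    shifted = [x - m for x in xs]
    log_sum = math.log(sum(math.exp(s) for s in shifted))
    return [s - log_sum for s in shifted]

logits = [1000.0, 0.0]

# Naive softmax-then-log fails: exp(1000.0) is not representable.
try:
    probs = [math.exp(x) for x in logits]
except OverflowError:
    print("naive softmax overflowed")

print(log_softmax(logits))  # ~[0.0, -1000.0]
```

This is the same reason PyTorch provides log_softmax() as a single fused operation rather than expecting you to compose softmax() and log() yourself.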