If I’m not missing something, they should be the same. However, I tried the following snippet, and they are not equal.
# -*- coding: utf-8 -*-
import numpy as np
import torch
import torch.nn.functional as F
from torch import nn
from torch.autograd import Variable


class Net(nn.Module):
    def __init__(self, n_features, n_hiddens, n_classes):
        super(Net, self).__init__()
        self.gru = torch.nn.GRU(n_features, n_hiddens)
        self.linear = torch.nn.Linear(n_hiddens, n_classes)

    def forward(self, x, flag=True):
        o, h = self.gru(x)
        o = self.linear(o)
        if flag:
            # note: log_softmax is called without an explicit dim
            o = F.log_softmax(o)
        return o


n_steps = 10
n_classes = 100
mb_size = 32
n_features = 50
n_hiddens = 60

net = Net(n_features, n_hiddens, n_classes)
loss1 = torch.nn.NLLLoss(size_average=False)
loss2 = torch.nn.CrossEntropyLoss(size_average=False)

x = Variable(torch.rand(n_steps, mb_size, n_features))
y = Variable(
    torch.LongTensor(np.random.randint(0, n_classes, (n_steps, mb_size))))

logits1 = net(x, flag=True).view(-1, n_classes)   # log-probabilities for NLLLoss
logits2 = net(x, flag=False).view(-1, n_classes)  # raw scores for CrossEntropyLoss
loss_val1 = loss1(logits1, y.view(-1))
loss_val2 = loss2(logits2, y.view(-1))
They are the same (see the implementation). I think the reason it isn’t working out for you is that log_softmax gives different results depending on the shape of its input. The tensor passed into log_softmax inside forward is 3-D, (n_steps, mb_size, n_classes), while CrossEntropyLoss applies log_softmax to the 2-D logits2.
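To make that concrete, here is a minimal sketch of the mismatch. It assumes the implicit-dim rule log_softmax used when no dim is given (dim=0 for a 3-D input, dim=1 for a 2-D input); newer releases emit a deprecation warning for calls without an explicit dim:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.rand(10, 32, 100)  # (n_steps, mb_size, n_classes), as in the snippet

# Without an explicit dim, log_softmax infers the dimension from the input's
# dimensionality: dim=0 for a 3-D tensor, dim=1 for a 2-D tensor.
a = F.log_softmax(x).view(-1, 100)   # normalize over dim 0, then flatten
b = F.log_softmax(x.view(-1, 100))   # flatten, then normalize over dim 1

print(torch.allclose(a, b))  # False: the two calls normalize different axes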
Thank you so much. I can’t believe the signature IPython gives me doesn’t show it. I didn’t even know there’s a dim param in log_softmax.
The dim parameter is new and will be in the next release. The docs are fixed too. Here’s what it says in master, if you build from source:
In [1]: ?torch.nn.functional.log_softmax
Signature: torch.nn.functional.log_softmax(input, dim=None, _stacklevel=3)
Docstring:
Applies a softmax followed by a logarithm.

While mathematically equivalent to log(softmax(x)), doing these two
operations separately is slower, and numerically unstable. This function
uses an alternative formulation to compute the output and gradient correctly.

See :class:`~torch.nn.LogSoftmax` for more details.

Arguments:
    input (Variable): input
    dim (int): A dimension along which log_softmax will be computed.
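As a quick sanity check of what the docstring claims, a small sketch with an explicit dim:

import torch
import torch.nn.functional as F

x = torch.randn(4, 5)
# Same values as log(softmax(x)) along dim 1, but computed in one fused,
# numerically stabler step.
print(torch.allclose(F.log_softmax(x, dim=1),
                     torch.log(F.softmax(x, dim=1))))  # True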
You can always use docs.pytorch.org
Thank you for the information.
Yes, according to the official website, they are equivalent.
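For completeness, a minimal check of that equivalence on 2-D logits (written against the current tensor API, e.g. torch.randint, rather than the Variable-era API in the snippet above):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.rand(32, 100)           # (mb_size, n_classes)
target = torch.randint(0, 100, (32,))

nll = torch.nn.NLLLoss()
ce = torch.nn.CrossEntropyLoss()

# CrossEntropyLoss is NLLLoss applied to log_softmax over the class dim.
print(torch.allclose(nll(F.log_softmax(logits, dim=1), target),
                     ce(logits, target)))  # True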
Could you elaborate on “log_softmax gives different results depending on shape”? I’ve printed the shapes and they look the same.