Edited:
Now I've found what's going on. There was one unsqueeze_() left somewhere, which is an in-place op. For some reason unsqueeze_() doesn't affect torch.sum(), but it does affect nn.CrossEntropyLoss().
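Here is a minimal sketch of what I believe is happening (the tensor names and the use of add_ are just for illustration; an in-place shape op like unsqueeze_() can trip the same check): the in-place error is only raised when the modified tensor was saved for backward and is actually needed to compute a gradient. torch.sum()'s backward doesn't need its input's values, while a softmax / cross-entropy backward does.

import torch

# Case A: nothing that autograd saved is touched -> backward runs fine
a = torch.randn(3, requires_grad=True)
s = a.clone()
out = s.sum()    # sum's gradient is all ones, independent of s's values
s.add_(1)        # in-place change after the forward pass
out.backward()   # works: no saved tensor was invalidated

# Case B: an in-place op modifies a tensor saved for backward -> RuntimeError
b = torch.randn(1, 3, requires_grad=True)
t = torch.softmax(b, dim=1)   # softmax saves its output for its backward
loss = t.sum()
t.add_(1)                     # bumps t's version counter
loss.backward()               # "modified by an inplace operation" error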
…
tl;dr
original post:
I didn't use in-place ops anywhere, obviously (except things like unsqueeze_(), which I tried to remove, but that had no effect).
To understand what's going on, I picked some tensors at random during the calculation, and here is the ridiculous thing:
I chose a variable, called x, and my code has:
loss = nn.CrossEntropyLoss()(x, torch.LongTensor([1]))
I ran loss.backward(). It fails and claims that one of the variables needed for gradient computation has been modified by an in-place operation.
I changed it to
loss = torch.sum(x)
and now loss.backward() works fine.
How is this even possible? If the gradient computation for x somehow contains a hidden problem, shouldn't the second version break as well?
So to resolve this, I wrote the nn.CrossEntropyLoss() part manually… (It's not hard; per https://pytorch.org/docs/master/nn.html?highlight=nllloss#crossentropyloss, it's just one line of code.) Now it's not complaining.
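For reference, the manual one-liner I mean is along these lines (a sketch, not my exact code; per the linked docs, CrossEntropyLoss combines LogSoftmax and NLLLoss, and I'm assuming x has shape (N, C) with an integer class target):

import torch.nn.functional as F

# hand-rolled equivalent of nn.CrossEntropyLoss() with the default mean reduction
loss = F.nll_loss(F.log_softmax(x, dim=1), torch.LongTensor([1]))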
Any suggestions / comments? Thanks a lot.