I ran into a problem with the reduction mode of nll_loss: with the same seed, it gives me extremely different results in a GCN.
I am using a GCN for binary text classification. First, I used class weights in nll_loss, as shown below. This gives good results. Since I use the same seed, I get the same result every time I rerun the GCN, and the log shows that the loss at each iteration is identical.
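import numpy as np
import torch
import torch.nn.functional as F

# Seed every RNG so that reruns are reproducible.
np.random.seed(args.seed)
torch.manual_seed(args.seed)
if args.cuda:
    torch.cuda.manual_seed(args.seed)

# Body of the loss function; y is assumed to be a one-hot label matrix.
x_class = y.sum(dim=0).float()  # number of samples per class
z_weight = (1.0 / x_class) * x_class.sum(0) / 2.0  # inverse-frequency class weights
y = y.max(1)[1].type_as(labels)  # one-hot -> class indices
return F.nll_loss(preds, y, z_weight)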
But when I changed nll_loss to reduction='none', the loss returned at the first iteration is the same as above (0.7351), yet after many iterations the final result is different.
Are both approaches deterministic in isolation? I.e. are you getting the same results if you rerun the reduction='none' code?
If so, then I think the difference between both approaches could come from a different order of the reductions and thus accumulated small absolute errors due to the limited floating point precision.
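To illustrate the point about reduction order, here is a minimal sketch in plain Python (float64 rather than the float32 tensors a GCN would normally use). It only shows that floating point addition is not associative, so summing the same values in a different order can change the last bits of the result. Whether this effect explains the gap reported in this thread is not established.

```python
import math

a, b, c = 0.1, 0.2, 0.3

# Same three numbers, two different summation orders.
left_to_right = (a + b) + c
right_to_left = a + (b + c)

# The results are not bit-identical...
print(left_to_right == right_to_left)

# ...but they agree to within normal floating point tolerance.
print(math.isclose(left_to_right, right_to_left))
```

A difference of this size is harmless in a single step, but it can be amplified over many optimizer updates.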
Thanks for your reply, @ptrblck.
Yes, the loss returned by the two reduction modes is the same, 0.7351, at the first iteration.
When I rerun with reduction='none', it always shows 0.7351 at the first iteration.
I confirmed this by inspecting it with ipdb.
But after several iterations, the losses begin to differ.
So, how can I fix the order in which the reduction is computed?
The reason I want to switch to reduction='none' is that I want to combine per-sample weights with the existing class-weighted loss.
Yes, I have edited my first reply. When I rerun the code with reduction='none' and the same seed, it outputs the same loss from the first iteration to the last.
In fact, judging from the results with reduction='none', it seems that it doesn't fully incorporate the class weight information.
Try running this segment on the CPU. The issue is that some operators used during the backward pass are non-deterministic, so gradients differ between reruns; in particular, indexing/gathering with a many-to-one element correspondence.
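One way to check this on the CPU is to run the same short training loop twice with the same seed and compare the logged losses. The sketch below uses a tiny stand-in model and random data (the `run` helper, the `Linear` layer, and the tensor shapes are all hypothetical, not the GCN from this thread). It also calls `torch.use_deterministic_algorithms(True)`, which asks PyTorch to raise an error if an operation without a deterministic implementation is used.

```python
import torch
import torch.nn.functional as F

# Raise an error if an op without a deterministic implementation is hit.
torch.use_deterministic_algorithms(True)

def run(seed, steps=5):
    """Train a tiny stand-in model on CPU and return the loss at each step."""
    torch.manual_seed(seed)
    x = torch.randn(32, 8)              # hypothetical features
    y = torch.randint(0, 2, (32,))      # hypothetical binary labels
    model = torch.nn.Linear(8, 2)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    losses = []
    for _ in range(steps):
        opt.zero_grad()
        loss = F.nll_loss(F.log_softmax(model(x), dim=1), y)
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses

first = run(seed=0)
second = run(seed=0)

# If every step is deterministic, the two loss histories match exactly.
print(first == second)
```

If the two histories match for each reduction mode on its own, the difference between the modes would have to come from the loss computation itself rather than from run-to-run randomness.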
Thanks for your reply. @googlebot
In fact, I am already running this code on the CPU, and it still shows the result above.
Besides that, I reran with reduction='none' several times with the same seed, and it gives the same result and the same loss each time.
From what I observe in the result returned with reduction='none', it seems that it doesn't incorporate the class weight information.
So I wonder: is it a bug, or some other problem?
But I still think it's not related to floating point precision, because from the result with reduction='none' it seems that the class weight information isn't incorporated.
I still don't know how to compute the loss correctly with reduction='none'…
Is there a way to overcome this problem?
In my situation, reduction='mean' gives a good result, whereas reduction='none' gives a totally bad one.
If I want to use reduction='none' to incorporate sample weights, I must solve the problem above.
I appreciate your help.
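One thing worth checking when moving from reduction='mean' to reduction='none': when a class weight is passed, the documented 'mean' reduction divides the summed loss by the sum of the weights of the target classes, not by the number of samples. The per-sample losses from reduction='none' already include the class weight factor, so calling `.mean()` on them only reproduces the 'mean' result when those weights happen to sum to the batch size. The sketch below shows this with made-up data; the `weighted_nll` helper, the tensor values, and the way the sample weights enter the denominator are assumptions for illustration, not something confirmed in this thread.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(6, 2), dim=1)  # stand-in for `preds`
target = torch.tensor([0, 0, 0, 0, 1, 1])            # class indices
class_weight = torch.tensor([1.0, 3.0])              # hypothetical class weights

# Built-in weighted mean.
mean_loss = F.nll_loss(log_probs, target, class_weight)

# Per-sample losses; each entry is already scaled by class_weight[target].
per_sample = F.nll_loss(log_probs, target, class_weight, reduction='none')

naive_mean = per_sample.mean()                                 # divides by N
weighted_mean = per_sample.sum() / class_weight[target].sum()  # divides by sum of weights

def weighted_nll(log_probs, target, class_weight, sample_weight):
    """Hypothetical helper: class-weighted NLL with extra per-sample weights."""
    losses = F.nll_loss(log_probs, target, class_weight, reduction='none')
    norm = (class_weight[target] * sample_weight).sum()
    return (losses * sample_weight).sum() / norm

sample_weight = torch.tensor([1.0, 0.5, 2.0, 1.0, 1.0, 0.5])  # hypothetical
combined = weighted_nll(log_probs, target, class_weight, sample_weight)

print(torch.allclose(mean_loss, weighted_mean))  # matches the built-in mean
print(torch.allclose(mean_loss, naive_mean))     # differs for these weights
```

With all sample weights set to 1, `weighted_nll` reduces to the built-in 'mean' result, which makes it easy to verify against the original class-weighted loss before introducing real sample weights.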