I am using the
nn.MultiLabelMarginLoss loss function. The targets passed to the loss function are the correct class indices, padded with -1.
For example, if the correct label indices are
3 and 4, the target tensor looks something like this -
target = torch.tensor([3, 4, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]).to(device)
Do you think this can cause the issue?
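For reference, a minimal sketch of how that target layout is consumed by nn.MultiLabelMarginLoss (the input size of 13 classes is taken from the example above; the random input is hypothetical):

```python
import torch
import torch.nn as nn

loss_fn = nn.MultiLabelMarginLoss()

# Scores for 13 classes (hypothetical values).
scores = torch.randn(13)

# Valid class indices (3 and 4) come first; the rest is padded with -1.
# Only the indices before the first -1 are treated as positive labels.
target = torch.tensor([3, 4, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1])

loss = loss_fn(scores, target)
print(loss.item())
```

The target must be a LongTensor of the same length as the score vector, so the -1 padding itself is the expected format and should not by itself cause an error.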
Update: The main error was due to the difference in floating-point precision between CPU and GPU.
I converted all my operations to
float64, including the neural network layers. The loss is now decreasing on CPU and GPU at almost the same rate. The reason I say “almost” is that I am using
nn.Dropout on two of my tensors, and the RNGs for CUDA and CPU are different.
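To illustrate the conversion, here is a minimal sketch (with hypothetical layer sizes) of moving a model and its inputs to float64 via .double():

```python
import torch
import torch.nn as nn

# Seeding makes runs reproducible per device, but CPU and CUDA
# maintain separate RNG streams, so dropout masks still differ
# between the two even with the same seed.
torch.manual_seed(0)

# .double() converts all parameters and buffers to float64.
model = nn.Sequential(
    nn.Linear(8, 13),   # hypothetical sizes
    nn.Dropout(p=0.5),
).double()

# Inputs must match the parameter dtype.
x = torch.randn(2, 8, dtype=torch.float64)
out = model(x)
print(out.dtype)
```

Calling model.eval() disables dropout entirely, which is one way to check whether the remaining CPU/GPU discrepancy really comes from the dropout masks.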
Please correct my understanding if I’m wrong.