Different loss on CPU and GPU

I am using the multi-label margin loss (nn.MultiLabelMarginLoss in PyTorch). The target passed to the loss function holds the correct label indices first, followed by -1 padding.
For example, if the correct label indices are 3 and 4, the target tensor looks like this:

target = torch.tensor([3, 4, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]).to(device)
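
To make the target format concrete, here is a minimal sketch, assuming the loss is PyTorch's nn.MultiLabelMarginLoss and using random scores in place of real model outputs. The `scores` tensor and the `manual_loss` helper are made up for illustration. It compares the built-in loss with a direct loop over the formula given in the PyTorch documentation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 13

# Hypothetical model output: one score per class (stand-in for real logits).
scores = torch.randn(num_classes)

# Same layout as in the post: label indices first, then -1 padding.
target = torch.tensor([3, 4] + [-1] * (num_classes - 2))

builtin = nn.MultiLabelMarginLoss()(scores, target)


def manual_loss(scores, target):
    """Loop version of the documented formula, for a single 1-D sample."""
    # Labels are read up to the first negative entry; the rest is padding.
    labels = []
    for t in target.tolist():
        if t < 0:
            break
        labels.append(t)
    non_labels = [i for i in range(scores.numel()) if i not in labels]

    total = scores.new_zeros(())
    for j in labels:
        for i in non_labels:
            # Hinge on the margin between a correct and an incorrect class.
            total = total + torch.clamp(1 - (scores[j] - scores[i]), min=0)
    return total / scores.numel()


manual = manual_loss(scores, target)
print(builtin.item(), manual.item())
```

If the two printed values agree, the -1 padding is being read as intended, which suggests the target format itself is not the source of a CPU/GPU mismatch.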

Do you think this can cause the issue?

Update: The main cause was the difference in float32 floating-point results between CPU and GPU.
I converted all my operations, including the neural network layers, to float64, and the loss now decreases at almost the same rate on CPU and GPU. I say “almost” because I apply nn.Dropout to two of my tensors, and the CUDA and CPU random number generators produce different random streams, so the dropout masks differ.
Please correct my understanding if I’m wrong.
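
A rough way to check both points is sketched below. It is not my actual model: the `loss_on` helper, the tiny nn.Linear layer, and the sizes are assumptions chosen for illustration. The parameters are created on the CPU and then moved, so every device and dtype starts from the same values, and the GPU comparison only runs if CUDA is available.

```python
import torch
import torch.nn as nn


def loss_on(device, dtype, seed=0):
    """Compute the loss for a small hypothetical layer on a device/dtype."""
    torch.manual_seed(seed)
    # Build on CPU in float32 first so all runs share identical start values.
    layer = nn.Linear(8, 13)
    x = torch.randn(8)
    layer = layer.to(device=device, dtype=dtype)
    x = x.to(device=device, dtype=dtype)
    target = torch.tensor([3, 4] + [-1] * 11, device=device)
    return nn.MultiLabelMarginLoss()(layer(x), target).item()


cpu32 = loss_on("cpu", torch.float32)
cpu64 = loss_on("cpu", torch.float64)
print("cpu float32 vs float64 difference:", abs(cpu32 - cpu64))

if torch.cuda.is_available():
    gpu32 = loss_on("cuda", torch.float32)
    gpu64 = loss_on("cuda", torch.float64)
    print("cpu vs gpu difference, float32:", abs(cpu32 - gpu32))
    print("cpu vs gpu difference, float64:", abs(cpu64 - gpu64))

# Dropout: the same seed reproduces the mask on the same device.
drop = nn.Dropout(p=0.5)  # modules start in training mode, so dropout is active
torch.manual_seed(0)
mask_a = drop(torch.ones(10))
torch.manual_seed(0)
mask_b = drop(torch.ones(10))
print("same CPU mask with same seed:", torch.equal(mask_a, mask_b))
```

Seeding only makes the mask repeatable on one device. Because the CPU and CUDA generators are separate, a mask drawn on the GPU should not be expected to match the CPU one even with the same seed. One way to rule dropout out entirely while comparing devices is to call `model.eval()` or set `p=0`.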