Different Losses on 2 different machines

(Sebastian Raschka) #21

hm, I think I remember seeing RTX 2080-specific issues mentioned elsewhere (not sure if it was in this discussion forum or somewhere else), and I just saw this post from November: https://www.pcbuildersclub.com/en/2018/11/broken-gpus-nvidia-apparently-no-longer-sells-the-rtx-2080-ti/

In any case, yours may unfortunately be affected. I’d probably go with the replacement now, as this appears not to be an unusual case.

(Stepan Ulyanin) #22

Yeah, I read they had problems with dying cores in the very beginning, I am going to post on NVIDIA dev forums tomorrow in search of some kind of utility to check the CUDA cores.

(Sebastian Raschka) #23

Just FYI, I see that there were similar issues reported here as well.

(Stepan Ulyanin) #24

@ptrblck, @rasbt I have a follow up for the issue:

I am running the model on the non-faulty GPU and have noticed that kaiming_normal_ initialization makes my losses go up substantially. I saw the same behavior with kaiming_normal_ on my home rig with a GTX 1080 Ti. Is there a reason why Kaiming normal initialization substantially increases the training loss?
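For reference, a minimal sketch of how kaiming_normal_ is typically applied to a model (the model and layer sizes here are made up for illustration, not the poster's actual setup):

```python
import torch.nn as nn


def init_kaiming(m):
    # Re-initialize the weights of Linear/Conv layers with He (Kaiming)
    # normal initialization; biases are simply zeroed here.
    if isinstance(m, (nn.Linear, nn.Conv2d)):
        nn.init.kaiming_normal_(m.weight, a=0)
        if m.bias is not None:
            nn.init.zeros_(m.bias)


# hypothetical toy model, just to show model.apply()
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.apply(init_kaiming)
```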

Without kaiming_normal_ initialization:

With kaiming_normal_ initialization:

(Sebastian Raschka) #25

Is there a reason why Kaiming normal initialization substantially increases the training loss?

No, there shouldn’t be a specific reason for Kaiming/He initialization increasing the training loss – personally, I only noticed minor differences. Actually, the default Kaiming He (normal) initialization scheme and PyTorch’s default initialization scheme look relatively similar.

Kaiming (normal) draws from a normal distribution N(0, std^2) with

    std = sqrt(2 / ((1 + a^2) * fan_in))

with a=0 by default.

The PyTorch default draws from a uniform distribution U(-stdv, stdv) with

    stdv = 1 / sqrt(fan_in)

if I see that correctly from

# torch.nn.Linear.reset_parameters in the PyTorch source:
def reset_parameters(self):
    stdv = 1. / math.sqrt(self.weight.size(1))
    self.weight.data.uniform_(-stdv, stdv)
    if self.bias is not None:
        self.bias.data.uniform_(-stdv, stdv)
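To make the difference concrete, here is a quick back-of-the-envelope comparison (the fan_in value is arbitrary); note that a uniform distribution U(-b, b) has standard deviation b / sqrt(3):

```python
import math

fan_in = 512  # arbitrary example layer width

# PyTorch's default for nn.Linear: U(-stdv, stdv) with stdv = 1/sqrt(fan_in);
# the std of U(-b, b) is b / sqrt(3).
default_std = (1.0 / math.sqrt(fan_in)) / math.sqrt(3)

# kaiming_normal_ with a=0 draws from N(0, std^2) with std = sqrt(2 / fan_in)
kaiming_std = math.sqrt(2.0 / fan_in)

print(kaiming_std / default_std)  # sqrt(6) ≈ 2.45, independent of fan_in
```

So the Kaiming normal weights start out with roughly 2.45× the spread of the default-initialized ones, whatever the layer width.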

But since the sqrt is in the denominator, the Kaiming std can be noticeably larger. So you probably want to lower your learning rate when using kaiming_normal_. I’d be curious to hear what happens if you do that. Maybe choose the learning rate as follows:

learning_rate_before * default_std(fan_in) = new_learning_rate * kaiming_std(fan_in)

=> new_learning_rate = learning_rate_before * default_std(fan_in) / kaiming_std(fan_in)
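In code, that rescaling rule would look something like this (the fan_in and starting learning rate are made-up example values):

```python
import math

fan_in = 512       # hypothetical layer width
lr_before = 0.01   # learning rate that worked with the default init

default_std = (1.0 / math.sqrt(fan_in)) / math.sqrt(3)  # std of U(-stdv, stdv)
kaiming_std = math.sqrt(2.0 / fan_in)                   # kaiming_normal_, a=0

# lr_before * default_std = new_lr * kaiming_std  =>  solve for new_lr
new_lr = lr_before * default_std / kaiming_std
print(new_lr)  # lr_before / sqrt(6), i.e. roughly a 2.45x reduction
```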

Would be curious to hear what you find…

(Stepan Ulyanin) #26

Thank you for your suggestion. I will try this on a simple toy model I built yesterday, which has exactly the same problem for some reason; I have seen the training loss increase by up to 100×.

(Stepan Ulyanin) #28

The issue was resolved by itself, not sure what happened :confused: