I am following this tutorial and have only changed the number of classes; mine is 13. I have also added a transformation to resize the images because they were too large. I am training on a single GPU with a batch size of 1 and a learning rate of 0.005, but lowering the learning rate still results in a NaN loss. I haven't tried gradient clipping or normalisation because I am not certain how to apply them to the pre-implemented architecture. Additionally, my dataset consists of a single object per image. Could the sparsity of the target tensor be causing the loss to behave this way?
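For reference, this is how I understand gradient clipping would be inserted into a plain PyTorch training step (a minimal sketch with a stand-in linear model, not the actual detection architecture from the tutorial; the clipping call itself should transfer unchanged):

```python
import torch
import torch.nn as nn

# Stand-in model; with a pre-implemented architecture the same clipping
# line goes between loss.backward() and optimizer.step().
model = nn.Linear(10, 13)  # 13 outputs, mirroring my 13 classes (illustrative)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)

x = torch.randn(4, 10)
target = torch.randn(4, 13)
loss = nn.functional.mse_loss(model(x), target)

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm to 1.0 before the optimizer step,
# so one bad batch cannot blow the weights up to NaN.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

My uncertainty is whether this is safe to drop into the tutorial's training loop as-is, or whether the built-in loss computation needs something different.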
Is the loss growing until it eventually becomes NaN, or does the NaN appear in a single step?