Mask R-CNN loss is NaN

I am following this tutorial and have only changed the number of classes (mine is 13). I have also added another transformation to resize the images because they were too large. I am training on a single GPU with a batch size of 1 and a learning rate of 0.005, but lowering the learning rate still results in "Loss is NaN". I haven't tried gradient clipping or normalisation because I am not sure how to apply them to the pre-implemented architecture. Additionally, my dataset consists of a single object per image. Could the resulting sparse mask tensor be causing the loss to behave this way?
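On the gradient-clipping point: it does not require touching the architecture at all; you only add one call between `backward()` and `optimizer.step()`. Here is a minimal sketch with a toy `nn.Linear` standing in for the detection model (with torchvision's `maskrcnn_resnet50_fpn` you would sum the loss dict it returns in train mode and clip in exactly the same place):

```python
import torch
from torch import nn

# Toy stand-in for the detection model; the pattern is identical for
# Mask R-CNN, where `loss = sum(loss_dict.values())`.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005)

x = torch.randn(8, 4)
target = torch.randn(8, 2)
loss = nn.functional.mse_loss(model(x), target)

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm before the optimizer step; this is the
# only change needed relative to the tutorial's training loop.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

The `max_norm=1.0` value is just a common starting point, not something from the tutorial; you may need to tune it.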

Is the loss growing until it eventually yields a NaN value, or do you encounter the NaN in just a single step?

In just a single step. What could possibly be wrong?

Hey I also experienced the same thing. Did you solve this already?

Hi, I am having the same issue using 15 classes.
Has anyone found a solution?


Same issue here; I just get this after one step:

Loss is nan, stopping training
{'loss_classifier': tensor(0.9567, device='cuda:0', grad_fn=<NllLossBackward>), 'loss_box_reg': tensor(nan, device='cuda:0', grad_fn=<DivBackward0>), 'loss_mask': tensor(2.6179, device='cuda:0', grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_objectness': tensor(13.4737, device='cuda:0',
       grad_fn=<BinaryCrossEntropyWithLogitsBackward>), 'loss_rpn_box_reg': tensor(18.5134, device='cuda:0', grad_fn=<DivBackward0>)}

What does this mean?

Reducing the learning rate helps it get further, but it still stops after 3 steps.
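Since lowering the learning rate only delays the NaN, it may help to find out which op produces it first. One way (a debugging sketch, not a fix) is PyTorch's autograd anomaly detection, which makes `backward()` raise at the offending op instead of letting the NaN silently propagate into the summed loss. It is slow, so turn it off again for real training:

```python
import torch

# Enable anomaly detection for a few debug steps only.
torch.autograd.set_detect_anomaly(True)

# Tiny reproduction: 0 * log(0) produces a NaN in the backward pass.
x = torch.zeros(1, requires_grad=True)
loss = (x * torch.log(x)).sum()

caught = None
try:
    loss.backward()
except RuntimeError as err:
    caught = err  # the error message names the backward op that returned NaN

torch.autograd.set_detect_anomaly(False)
```

Running your own training step under this mode should point at whichever of the five loss branches goes bad first.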

I was able to fix it for my use case; sharing for anyone out there who might have the same problem. The cause was that I had quite a lot of very small pixel regions/boxes, which messed things up. The script I use creates a box for every isolated pixel region, even if it is only 3 pixels in size. So if your masks contain many very small fragments, you probably have the same problem I did. I removed all isolated pieces of the mask whose area is smaller than 2500 pixels. Now training works well, without having to reduce the learning rate.
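In case it helps, here is a rough sketch of the kind of filtering I mean. The function name and the plain BFS connected-components pass are my own (you could equally use `scipy.ndimage.label` or OpenCV); the 2500-pixel threshold is the one that worked for my data:

```python
import numpy as np

def filter_small_regions(mask, min_area=2500):
    """Zero out isolated regions of a binary mask smaller than min_area pixels."""
    mask = mask.copy()
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # Collect the 4-connected region via BFS.
                stack, region = [(sy, sx)], []
                seen[sy, sx] = True
                while stack:
                    y, x = stack.pop()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                # Drop the whole region if it is below the area threshold.
                if len(region) < min_area:
                    for y, x in region:
                        mask[y, x] = 0
    return mask
```

Apply this to each instance mask before computing boxes, so no degenerate (near-zero-area) box ever reaches the loss.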