CNN model outputs NaN only for the final batch

Hi all,

I’ve been training a CNN in PyTorch and I’ve come across an interesting issue. During training, the model outputs NaN, but only for the final batch of each epoch (i.e. when the number of remaining samples does not match the batch size). After looking into my model, I found that it is the CNN sequential block below that produces the NaN values.

This happened when I was using a batch size of 64. When I made sure the total number of training samples was divisible by 64, I did not get any NaN values. Changing the batch size to 32 or 128 also made the problem go away. After trying a few more batch sizes, I found that only batch sizes of 63 and 64 trigger it.
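
To make the setup concrete, here is a minimal sketch of the kind of partial final batch I mean (the sample count and the 64x64 spatial size are made up; only the 19 input channels come from the model below):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 1000 samples of 19-channel 64x64 inputs.
# The sample count and spatial size are made up; only the 19 channels
# match the first Conv2d of my model.
data = torch.randn(1000, 19, 64, 64)
labels = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(data, labels), batch_size=64)

# 1000 = 15 * 64 + 40, so every batch has 64 samples except the last,
# which only has 1000 % 64 = 40.
print([x.shape[0] for x, _ in loader][-2:])  # [64, 40]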

I am using the following CNN model:

self.CNN = nn.Sequential(
    nn.Conv2d(19, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),

    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Flatten()
)

So far, I’ve just decided not to use a batch size of 64, and my model is working fine. However, I’m still curious why this is happening. Does anyone have any ideas?
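
In case anyone else runs into this, a related workaround (just a sketch, not an explanation of the NaN) is to drop the incomplete final batch with the DataLoader’s drop_last option instead of changing the batch size:

# Workaround sketch, reusing the placeholder dataset from above:
# drop_last=True discards the final incomplete batch, so every batch
# the model sees has exactly 64 samples.
loader = DataLoader(TensorDataset(data, labels), batch_size=64, drop_last=True)
print([x.shape[0] for x, _ in loader][-2:])  # [64, 64]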

Are you able to directly reproduce it using any random input?
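
For example, something along these lines would be a quick check (the nn.Sequential block is copied from your post; the 40-sample partial batch and the 64x64 spatial size are just guesses):

import torch
import torch.nn as nn

# Same layers as in the post above.
cnn = nn.Sequential(
    nn.Conv2d(19, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Flatten(),
)

# Random "final batch" of 40 samples (smaller than the batch size of 64);
# the 64x64 spatial size is a guess, only the 19 channels come from the model.
x = torch.randn(40, 19, 64, 64)
with torch.no_grad():
    out = cnn(x)
print(torch.isnan(out).any())  # tensor(True) if the NaN reproduces with random input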