The weights of the convolution kernel become NaN after training for several batches

There is a similar thread: "Gradient value is nan" (post #3 by saumya0303).

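As @ptrblck suggested, you could use torch.autograd.set_detect_anomaly(True) to find the point where the gradients first become NaN and debug from there. With anomaly detection enabled, the backward pass raises an error as soon as a backward function returns NaN, and reports the traceback of the forward operation that created it.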
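Here is a minimal sketch of how that might look. The layer, the input, and the `toy_loss` function are made up for illustration (they are not from your model); the loss is deliberately built so that the backward pass computes 0/0 and produces NaN gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)  # toy layer, stands in for your model
x = torch.randn(1, 1, 5, 5)                        # toy input


def toy_loss(out):
    # Hypothetical loss for illustration only: sqrt(0) has an unbounded
    # derivative, and the incoming gradient is 0, so backward computes 0 / 0.
    return (torch.sqrt(out * 0.0) * 0.0).sum()


# Without anomaly detection, backward finishes silently and the NaN lands
# in the kernel gradient (and in the weights after the next optimizer step).
toy_loss(conv(x)).backward()
grad_has_nan = torch.isnan(conv.weight.grad).any().item()
print("NaN in conv.weight.grad:", grad_has_nan)

# With anomaly detection, backward raises at the first function that
# returns NaN and also reports where that op was called in the forward pass.
conv.zero_grad()
anomaly_caught = False
with torch.autograd.set_detect_anomaly(True):
    try:
        toy_loss(conv(x)).backward()
    except RuntimeError as err:
        anomaly_caught = True
        print("Anomaly detected:", err)
```

Anomaly detection slows training down noticeably, so it is probably best to enable it only while debugging and turn it off again once you have found the offending operation.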

Hope this helps.