The weights of the convolution kernel become NaN after training for several batches

Weights going to NaN are typically caused by numerical overflow. The most common cause I know of is a learning rate that is too high combined with no gradient clipping, which lets the parameters of your network diverge towards +/- infinity.

Have you tried lowering the learning rate? You could also log the norm of the gradients on each step to check whether they are exploding before the NaNs appear.
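As a rough sketch (assuming PyTorch; the model, shapes, and `max_norm=1.0` are just illustrative), here is one training step that combines a lowered learning rate, gradient clipping, and logging of the global gradient norm:

```python
import torch
import torch.nn as nn

# Toy conv net and a single training step; the architecture and data
# are placeholders, only the clipping/logging pattern matters.
model = nn.Sequential(
    nn.Conv2d(1, 4, 3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(4 * 26 * 26, 10),
)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)  # try lowering this first
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))

opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Clip the global gradient norm to a maximum (1.0 here is arbitrary);
# the call returns the pre-clipping norm, which is worth logging so you
# can see it blowing up before the weights turn NaN.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
print(f"grad norm before clipping: {grad_norm:.4f}")
opt.step()
```

If the logged norm grows by orders of magnitude over successive batches, that is usually the divergence setting in, and lowering the learning rate (or tightening `max_norm`) is the first thing to try.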