FP16 gives NaN loss when using pre-trained model

You could register forward hooks for each module and check the outputs for invalid values. Once you have isolated the first layer that produces them, you can inspect its inputs and parameters to narrow down the cause further.
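
A minimal sketch of such a hook setup, assuming a generic model (the small `nn.Sequential` used below is just a placeholder for your pre-trained network):

```python
import torch
import torch.nn as nn

def register_nan_hooks(model):
    """Attach a forward hook to every submodule that raises as soon as
    a module's output contains NaN or Inf values."""
    def make_hook(name):
        def hook(module, inputs, output):
            # Some modules return tuples; check every tensor output.
            outs = output if isinstance(output, tuple) else (output,)
            for out in outs:
                if isinstance(out, torch.Tensor) and not torch.isfinite(out).all():
                    raise RuntimeError(
                        f"Non-finite output in '{name}' ({module.__class__.__name__})"
                    )
        return hook

    handles = []
    for name, module in model.named_modules():
        handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Placeholder model in FP16 to illustrate the workflow
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1)).half()
handles = register_nan_hooks(model)

x = torch.randn(4, 8).half()
out = model(x)  # raises inside the first layer that produces NaN/Inf

# Remove the hooks once you are done debugging
for h in handles:
    h.remove()
```

Once the error points at a specific module, you can check `torch.isfinite(x).all()` on its inputs and on `module.weight` / `module.bias` to see whether the invalid values come from the activations or from the parameters themselves.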