I noticed a large drop in GPU utilization when I set this option at the beginning of my training script. Is that expected? If so, what is the reason?
I would expect the code to run slower if the anomaly detection is enabled, as some additional checks are performed, e.g. here.
I personally just use it for debugging, so I wouldn’t expect it to match the performance of code running with anomaly detection disabled.
Once you’ve found the reason for the NaNs, you could disable it.
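A minimal sketch of how I would scope it, assuming the option in question is `torch.autograd.set_detect_anomaly`: wrap only the step you are debugging in the context manager instead of enabling it globally at the top of the script. The function name `run_step` and the toy `sqrt` computation are made up for illustration and are not from your code.

```python
import torch


def run_step(debug: bool) -> torch.Tensor:
    """Run one toy forward/backward pass, optionally with anomaly detection."""
    x = torch.tensor([0.0], requires_grad=True)
    # set_detect_anomaly works as a context manager, so the extra checks
    # only apply to the code inside this block.
    with torch.autograd.set_detect_anomaly(debug):
        # d/dx sqrt(x) at x = 0 is inf; multiplied by a zero upstream
        # gradient this yields NaN in the backward pass.
        loss = (torch.sqrt(x) * 0.0).sum()
        loss.backward()
    return x.grad


if __name__ == "__main__":
    # Without anomaly detection the NaN silently ends up in the gradient.
    print(run_step(debug=False))

    # With anomaly detection the backward pass raises and points at the
    # forward operation that produced the NaN.
    try:
        run_step(debug=True)
    except RuntimeError as err:
        print("anomaly detected:", err)
```

Once the offending operation is fixed, pass `debug=False` (or remove the context manager) to get the normal speed back.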