We noticed that when we set autograd.set_detect_anomaly(True), autograd.grad is no longer permitted inside CUDA Graphs. We could not find any documentation on this in either the CUDA Graphs or the PyTorch docs. Could anyone please explain why this is the case, what enabling detect_anomaly changes, and whether it is safe for us to turn off detect_anomaly and accept the results computed with CUDA Graphs?
detect_anomaly should only be enabled while you're debugging (it's pretty slow!), so if your code is working properly, yes, it is fine to disable it before running with cudagraph.
If you do run into NaN issues during the backward pass, I would suggest disabling cudagraph while you debug them (with anomaly mode enabled) and re-enabling cudagraph once they are fixed.
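A minimal sketch of that workflow, under the assumption that the NaN comes from a backward function (the specific `sqrt`-at-zero example below is illustrative, not from the original thread):

```python
import torch

# Step 1: debug with anomaly mode enabled and CUDA graphs disabled.
# Anomaly mode makes autograd check every backward function's output for
# NaN and raise a RuntimeError pointing at the forward op that created it.
x = torch.tensor([0.0], requires_grad=True)

try:
    with torch.autograd.detect_anomaly():
        y = torch.sqrt(x)       # sqrt(0) = 0, but d/dx sqrt(x) = inf at 0
        z = (y * y).sum()       # dz/dy = 0, so dz/dx = 0 * inf = nan
        z.backward()
except RuntimeError as e:
    print("anomaly mode flagged:", e)

# Step 2: once the NaN source is fixed, turn anomaly mode off and
# (on a CUDA machine) re-enable graph capture.
torch.autograd.set_detect_anomaly(False)

if torch.cuda.is_available():
    static_x = torch.ones(8, device="cuda")
    g = torch.cuda.CUDAGraph()

    # Warm up on a side stream before capture, as the PyTorch docs recommend.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        static_y = static_x * 2
    torch.cuda.current_stream().wait_stream(s)

    with torch.cuda.graph(g):
        static_y = static_x * 2
    g.replay()                  # replays the captured kernel launches
```

Without anomaly mode, the same backward pass would silently leave `x.grad` as nan, which is exactly why it's worth one slow debugging run before capturing the graph.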