I’m not sure why you need to use detect_anomaly, as it’s a debugging tool and is expected to slow down the code. Could you explain the reason a bit more?
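For reference, here is a minimal sketch of how anomaly detection could be scoped to a single suspect forward/backward pass while debugging, instead of being left on for the whole run. The toy `sqrt` computation is only an assumption to trigger a NaN gradient and isn't taken from your code.

```python
import torch

# Toy input (hypothetical): sqrt at 0 has an undefined gradient once scaled by 0.
x = torch.tensor([0.0], requires_grad=True)
caught = None

# Enable anomaly detection only around the code being debugged.
# The forward pass runs inside the context so its traceback is recorded.
with torch.autograd.detect_anomaly():
    y = (torch.sqrt(x) * 0).sum()
    try:
        y.backward()  # the backward of sqrt produces 0 / 0 = NaN here
    except RuntimeError as err:
        caught = err  # anomaly mode reports the backward function that returned NaN

print(type(caught).__name__, caught)
```

Outside the `with` block the checks are off again, so the rest of the training loop doesn't pay the slowdown.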
Using retain_graph would only be necessary if you need to keep the intermediate tensors alive after a backward operation, e.g. to call backward through the same graph a second time. I’m still unsure why it’s needed in your case, so I would need more information about the actual use case.
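To illustrate what I mean by retain_graph, here is a small sketch with a made-up scalar loss (the tensors and names are assumptions for the example, not your model). It shows the one situation where the flag matters: backpropagating through the same graph more than once.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)

# Case 1: keep the graph so a second backward call is possible.
y = (x ** 2).sum()
y.backward(retain_graph=True)   # saved intermediates are kept alive
first_grad = x.grad.clone()     # dy/dx = 2 * x
y.backward()                    # works; gradients accumulate into x.grad
accumulated_grad = x.grad.clone()

# Case 2: without retain_graph the saved tensors are freed after backward.
x.grad = None
z = (x ** 2).sum()
z.backward()
second_call_failed = False
try:
    z.backward()                # graph buffers were already released
except RuntimeError:
    second_call_failed = True

print(first_grad, accumulated_grad, second_call_failed)
```

If your training loop only calls backward once per forward pass, you shouldn't need the flag, and dropping it lets the intermediate tensors be freed earlier.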