Do we need to do torch.cuda.amp.autocast(enabled=False) before a custom function?

If the forward pass of your op consists only of the custom kernel itself, you don’t need to disable autocast. Autocast doesn’t know about your kernel (unless you register it as in the dispatch tutorial) and won’t touch the inputs.

If your op consists of your custom kernel + a few torch.* ops, and you don’t locally wrap them in autocast(enabled=False), the torch.* ops may still be affected by autocast, which you may or may not want (see the sketch below).
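A minimal sketch of the local-disable pattern, assuming a hypothetical `my_custom_kernel` that expects float32 inputs (the surrounding torch.* op stands in for whatever ops you mix in):

```python
import torch

def my_op(x):
    # Disable autocast locally so the torch.* ops around the kernel run in
    # the dtype we expect, instead of being cast by autocast.
    with torch.cuda.amp.autocast(enabled=False):
        x = x.float()                    # ensure the dtype the kernel expects
        y = my_custom_kernel(x)          # hypothetical custom CUDA kernel
        y = torch.nn.functional.relu(y)  # torch.* op that would otherwise see autocast
    return y
```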

For torch.autograd.Functions, decorating the forward with @custom_fwd(cast_inputs=torch.float32) (or torch.float16) takes care of casting the inputs and disabling autocast for the forward body.
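A small sketch of that pattern with torch.cuda.amp.custom_fwd / custom_bwd; the actual op body here is a placeholder:

```python
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class MyOp(torch.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=torch.float32)
    def forward(ctx, x, weight):
        # Inputs arrive cast to float32 and autocast is disabled in this body.
        ctx.save_for_backward(x, weight)
        return x * weight  # stand-in for the real custom kernel

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        return grad_out * weight, grad_out * x
```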
