Hi, after reading the docs about mixed precision (amp_example),
I'm still confused about a few things.
Say I have two networks: a standard resnet50 and a sparse conv layer.
Input images are first passed through resnet50 and then through the sparse convs.
If I only want to use half precision for resnet50 and keep float32 for the sparse conv layer (so I don't have to modify its code),
do I just need to wrap the model in the autocast context manager and disable it before the sparse conv layers?
```python
with autocast():
    out = resnet50(x)
with autocast(enabled=False):
    out = sparseconv(out.float())
```
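To make sure I understand the pattern, here is a minimal runnable sketch of what I mean, using CPU autocast with bfloat16 and plain `nn.Linear` layers as stand-ins for resnet50 and the sparse conv (both are placeholders, not my real models):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(8, 8)     # stands in for resnet50
sparse_head = nn.Linear(8, 4)  # stands in for the sparse conv layer

x = torch.randn(2, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = backbone(x)  # autocast runs this in bfloat16
    with torch.autocast(device_type="cpu", enabled=False):
        # autocast disabled: cast the input back up and stay in float32
        out = sparse_head(out.float())
```

Is this the intended way to keep one sub-network in float32?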
Also, as I understand it, gradients are scaled during mixed-precision training.
If I have to write my own backward function for the sparse conv layers (wrapped in autocast(enabled=False)),
do I still need to account for the scale factor in that backward?
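To make the question concrete, this is the kind of custom backward I mean. `MySparseConvFn` is a toy placeholder (a dense matmul, not a real sparse conv), and I run it with a GradScaler disabled here just so it works on CPU:

```python
import torch

class MySparseConvFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return x @ weight  # dense matmul as a stand-in for sparse conv

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # grad_out arrives here during loss.backward(); my question is
        # whether it is already multiplied by the GradScaler's scale
        # factor at this point, and whether I must divide it out myself.
        return grad_out @ weight.t(), x.t() @ grad_out

x = torch.randn(2, 8, requires_grad=True)
w = torch.randn(8, 4, requires_grad=True)

scaler = torch.cuda.amp.GradScaler(enabled=False)  # disabled for this CPU demo
loss = MySparseConvFn.apply(x, w).sum()
scaler.scale(loss).backward()
```

So in the real (enabled) case, does `backward` need to know about the scale, or does `scaler.unscale_` / `scaler.step` take care of it on the parameter grads afterwards?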