Hi, after reading the docs about mixed precision (amp_example),
I'm still confused about a few things.
Say I have two networks: a standard resnet50 and a sparse conv layer.
Input images are first passed through resnet50 and then through the sparse convs.
If I only want to use half precision for resnet50 and keep float32 for the sparse conv layer (so I don't have to modify its code),
do I just need to wrap the model in the autocast context manager and disable it before the sparse conv layers?
```python
with autocast():
    out = resnet50(x)
with autocast(enabled=False):
    out = sparseconv(out.float())
```
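To make sure I understand the pattern, here is a minimal runnable sketch of what I mean, using CPU autocast with bfloat16 and plain `nn.Linear` layers as stand-ins for resnet50 and the sparse conv (both are placeholders, not my real models):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(8, 8)     # stands in for resnet50
sparse_head = nn.Linear(8, 4)  # stands in for the sparse conv layer

x = torch.randn(2, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = backbone(x)  # autocast runs this in bfloat16
    with torch.autocast(device_type="cpu", enabled=False):
        # autocast disabled: cast the input back up and stay in float32
        out = sparse_head(out.float())
```

Is this the intended way to keep one sub-network in float32?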
Also, as I understand it, gradients are scaled during mixed-precision training.
If I have to write my own backward function for the sparse conv layers (wrapped in autocast(enabled=False)),
do I still need to account for the scale factor in that backward?
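To make the question concrete, this is the kind of custom backward I mean. `MySparseConvFn` is a toy placeholder (a dense matmul, not a real sparse conv), and I run it with a GradScaler disabled here just so it works on CPU:

```python
import torch

class MySparseConvFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        ctx.save_for_backward(x, weight)
        return x @ weight  # dense matmul as a stand-in for sparse conv

    @staticmethod
    def backward(ctx, grad_out):
        x, weight = ctx.saved_tensors
        # grad_out arrives here during loss.backward(); my question is
        # whether it is already multiplied by the GradScaler's scale
        # factor at this point, and whether I must divide it out myself.
        return grad_out @ weight.t(), x.t() @ grad_out

x = torch.randn(2, 8, requires_grad=True)
w = torch.randn(8, 4, requires_grad=True)

scaler = torch.cuda.amp.GradScaler(enabled=False)  # disabled for this CPU demo
loss = MySparseConvFn.apply(x, w).sum()
scaler.scale(loss).backward()
```

So in the real (enabled) case, does `backward` need to know about the scale, or does `scaler.unscale_` / `scaler.step` take care of it on the parameter grads afterwards?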