Mixed precision and Spectral norm

I started playing around with the new AMP interface. The thing is, I am training GANs and my models use spectral norm.

I would like to know how things work with mixed precision when using spectral norm. Are my spectrally normalized weights eligible for FP16 precision? Do I need to do something extra to get things working (such as increasing the spectral norm eps)?

Thanks

AMP will autocast operators as described in this list. Since spectral_norm uses some of them (e.g. matmul), this operation would be performed in FP16. If you want to keep the calculation in FP32, you can disable autocast for this call.
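A minimal sketch of disabling autocast locally for a spectrally normalized layer (the layers and shapes are just placeholders):

import torch
import torch.nn as nn

# placeholder layers just to illustrate the pattern
conv = nn.Conv2d(3, 64, 3, padding=1).cuda()
sn_conv = nn.utils.spectral_norm(nn.Conv2d(64, 64, 3, padding=1)).cuda()
x = torch.randn(1, 3, 32, 32, device='cuda')

with torch.cuda.amp.autocast():
    h = conv(x)  # eligible ops run in FP16 here
    with torch.cuda.amp.autocast(enabled=False):
        # h may be FP16 under autocast, so cast it back to FP32
        out = sn_conv(h.float())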


@ptrblck, thanks for your answer. I have some questions about AMP; maybe you can clarify them for me.

1 - About weight initialization: let’s say my initializer produces very small weights, will they flush to zero when run inside autocast?
2 - What about model inputs: if I am working with a vector of very small floats as input to my model, what happens to them?

Thanks

  1. No, that shouldn’t be the case, as the compute and accumulation would still be performed in FP32 even if FP16 inputs are passed (see the short check after this list).
  2. It depends on the range of your values and whether they are representable. Theoretically it would be possible, e.g. via:
import torch

# smallest positive subnormal FP32 value
x = torch.tensor(2**(-149)).cuda()
print(x)
> tensor(1.4013e-45, device='cuda:0')
# the same value underflows to zero in FP16
print(x.half())
> tensor(0., device='cuda:0', dtype=torch.float16)

but then I would doubt that the small FP32 values would contribute to your model in any way.
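Regarding point 1, a quick check (with a placeholder linear layer) shows that the parameters stay in FP32 under autocast, while the op output is cast to FP16:

import torch
import torch.nn as nn

lin = nn.Linear(4, 4).cuda()
x = torch.randn(2, 4, device='cuda')

with torch.cuda.amp.autocast():
    out = lin(x)

print(lin.weight.dtype)  # the parameters themselves stay in FP32
> torch.float32
print(out.dtype)         # the linear op was autocast to FP16
> torch.float16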
Note that a GradScaler should be used for mixed-precision training together with autocast to avoid gradient underflow.
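Roughly, the autocast + GradScaler pattern looks like this (the toy model, optimizer, and data are just placeholders):

import torch
import torch.nn as nn

# toy model, optimizer, and data just to illustrate the pattern
model = nn.Linear(16, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 16, device='cuda'),
           torch.randint(0, 2, (8,), device='cuda'))]

scaler = torch.cuda.amp.GradScaler()

for data, target in loader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        output = model(data)
        loss = criterion(output, target)
    # scale the loss so small gradients do not underflow in FP16
    scaler.scale(loss).backward()
    # unscales the gradients and skips the step if infs/NaNs are found
    scaler.step(optimizer)
    scaler.update()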

It would be interesting to know more about your use case, i.e. what kind of model “needs” values close to zero to perform properly.

@ptrblck, there is no requirement to have inputs close to zero. I just wanted to know how AMP handles these cases.

Thanks for your help!