loss_function = MSELoss()
loss_function(torch.tensor([0.0329]).to(torch.float16), torch.tensor([60000]).to(torch.float16))
--> tensor(inf, dtype=torch.float16)
why is the result inf?
float16 has a max range of +- 65504 and will overflow to +- Inf outside of this range. It’s thus expected that nn.MSELoss will overflow via (0.03 - 60000)**2 ~= 3.6e9.
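This can be reproduced without a loss function at all; a minimal sketch using the values from the snippet above:

```python
import torch

# float16 can only represent magnitudes up to ~65504
print(torch.finfo(torch.float16).max)  # 65504.0

# the squared error (0.0329 - 60000)**2 ~= 3.6e9 is far beyond that range,
# so casting the result to float16 overflows to inf
squared_error = (0.0329 - 60000.0) ** 2
overflowed = torch.tensor(squared_error, dtype=torch.float16)
print(overflowed)  # tensor(inf, dtype=torch.float16)
```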
Thank you for the input. So it’s not possible to train an fp16 model with MSE, since the loss will be inf in most cases where it is higher than 65k?
Training any model in pure float16 is tricky: not only can large activation values overflow, but your training would also suffer from small gradients underflowing. This is why we’ve developed the mixed-precision training utilities in torch.amp, which not only use an autocast context to transform tensors to float16 when it’s safe, but also use a loss scaler to avoid underflows. Take a look at the AMP recipe and the examples to see how to use it.
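A minimal sketch of such a mixed-precision training step; the model, data, and hyperparameters here are placeholders, and the explicit `.float()` on the outputs just keeps the sketch portable to CPU (on CUDA, autocast already runs losses in float32):

```python
import torch

# placeholder model and random data; only the amp plumbing matters here
device = 'cuda' if torch.cuda.is_available() else 'cpu'
amp_dtype = torch.float16 if device == 'cuda' else torch.bfloat16  # cpu autocast uses bfloat16
model = torch.nn.Linear(10, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == 'cuda'))

inputs = torch.randn(4, 10, device=device)
targets = torch.randn(4, 1, device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, dtype=amp_dtype):
    outputs = model(inputs)          # autocast-eligible ops run in low precision
    loss = loss_fn(outputs.float(), targets)

scaler.scale(loss).backward()  # scale the loss so small fp16 gradients don't underflow
scaler.step(optimizer)         # unscales gradients; skips the step if they contain inf/nan
scaler.update()                # adjusts the scale factor for the next iteration
```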
I tried that but I get the following error:
RuntimeError: Found dtype Float but expected Half
Could you post a minimal, executable code snippet which would reproduce the error, please?
Can you tell me what part of the code would be helpful? It is very modular, so I’m not sure I can paste the whole code.
The first link already does that, as it describes a simple network first with a standard training loop in default precision. In the next section autocast is added, and afterwards the GradScaler, both with code changes and with explanations of why these utilities are used. Then the same initial code is posted again under “All together: Automatic Mixed Precision”. Did you walk through this doc and get stuck somewhere?
Yes, I went through the doc. I think the code works for me now, but I get an error while updating the learning rate scheduler. Is this the correct way to update the learning rate scheduler:
scaler.step(self.lr_scheduler)
or should I call it in the conventional way, self.lr_scheduler.step()? The latter works for me, whereas the former gives the following error:
'LambdaLR' object has no attribute 'param_groups'
scaler.step expects an optimizer, so use lr_scheduler.step() instead.
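A minimal sketch of the intended call order, with a placeholder model and schedule (the GradScaler is disabled here only so the sketch also runs on CPU):

```python
import torch

# placeholder model and schedule; only the call order matters
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda epoch: 0.5 ** epoch)
scaler = torch.cuda.amp.GradScaler(enabled=False)  # disabled so this runs without a GPU

loss = model(torch.randn(2, 4)).sum()
scaler.scale(loss).backward()
scaler.step(optimizer)  # the scaler steps the optimizer, never the scheduler
scaler.update()
lr_scheduler.step()     # the scheduler is stepped directly, in the usual way
print(optimizer.param_groups[0]['lr'])  # 0.5 after one scheduler step
```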
With amp, my model is predicting inf. I get inf only during amp training and not during full-precision training. Because of the inf values I get an error in sklearn during accuracy computation while training.
Any thoughts on why this is happening?
The forward method should not overflow, so I don’t know what might be causing it and would need more information about the model etc.
If you look at this part of the code from the doc:
with torch.autocast(device_type='cuda', dtype=torch.float16, enabled=use_amp):
output = net(input)
aren't we telling the model the output should always be float16, which will cause the overflow?
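One way to check this empirically is to inspect the dtypes directly. A small sketch with a hypothetical toy layer, run on CPU with bfloat16 since a GPU may not be available (on CUDA the low-precision dtype would be torch.float16):

```python
import torch

# toy layer and input, just to inspect dtypes under autocast
net = torch.nn.Linear(8, 2)
x = torch.randn(3, 8)

with torch.autocast(device_type='cpu', dtype=torch.bfloat16):
    out = net(x)  # matmul-based ops run in the lower precision
    loss = torch.nn.functional.mse_loss(out.float(), torch.zeros(3, 2))

print(out.dtype)   # only the autocast-eligible ops are downcast
print(loss.dtype)  # the loss itself is computed in full precision
```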