I’m trying to reduce memory usage by applying autocast inside the `forward` method of only some modules in my model.
However, when the model is wrapped in `DataParallel`, is there a performance penalty if I don’t tie autocast to the device the replica is actually running on, and instead just use `torch.autocast("cuda", torch.float16)`?
I want to know whether I can get better performance (memory or speed) by using something like `torch.autocast(input.device, torch.float16)` instead.
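For context, here is a minimal sketch of the pattern I mean (the module names are made up). Note that `torch.autocast` takes a device *type* string such as `"cuda"` or `"cpu"`, not a specific index like `"cuda:1"`, so the per-input variant would use `input.device.type`. Since autocast state is thread-local, entering the context inside `forward` is what makes it apply in each of `DataParallel`’s per-device threads:

```python
import torch
import torch.nn as nn

class PartiallyAutocastModel(nn.Module):
    """Hypothetical model: only `self.big` runs under autocast."""

    def __init__(self):
        super().__init__()
        self.big = nn.Linear(64, 64)   # memory-heavy part, autocast here
        self.head = nn.Linear(64, 8)   # stays in full precision

    def forward(self, x):
        # Autocast state is thread-local, so entering it inside forward()
        # also takes effect in nn.DataParallel's per-replica threads.
        # device_type must be "cuda"/"cpu", not "cuda:1", hence x.device.type;
        # pick a dtype the backend supports (CPU autocast uses bfloat16).
        amp_dtype = torch.float16 if x.device.type == "cuda" else torch.bfloat16
        with torch.autocast(device_type=x.device.type, dtype=amp_dtype):
            h = self.big(x)
        # Outside the context, cast back to float32 for the fp32 head.
        return self.head(h.float())

model = PartiallyAutocastModel()
out = model(torch.randn(4, 64))
```

This is just to illustrate the setup; my question is whether deriving `device_type` from the input buys anything over hard-coding `"cuda"`.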