GradScaler for CPU with AMP

Hi,
Here, in the PyTorch AMP documentation, it is stated that we can use torch.autocast and torch.cpu.amp.GradScaler or torch.cuda.amp.GradScaler together. But when I try to import torch.amp.GradScaler or torch.cpu.amp.GradScaler, it says that there is no GradScaler in the module.

The PyTorch version is 2.2.2+cpu; I have also tried 2.2.1+cpu. How can I resolve this issue?

amp on CPU should use bfloat16 only, which does not need gradient scaling.
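E.g. this is roughly what bfloat16 autocast on the CPU could look like (model, data, and optimizer here are just placeholders); a plain backward() and step() are sufficient, since no gradient scaling is needed:

import torch
import torch.nn as nn

# Minimal bfloat16 AMP sketch on the CPU: autocast alone is enough,
# since bfloat16 keeps the float32 exponent range and its gradients
# are not prone to the underflow that float16 gradients can hit.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 16)
target = torch.randn(8, 4)

optimizer.zero_grad()
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
    loss = nn.functional.mse_loss(out, target)
loss.backward()   # no GradScaler needed for bfloat16
optimizer.step()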

But the documentation specifically states that you can use gradient scaling with CPU AMP. It even shows two different ways to construct the gradient scaler.

Could you point me to the section in the docs showing gradient scaling with bfloat16 on the CPU, please?

In the second paragraph of this pytorch.org page about AMP on the CPU, it is stated that:

Ordinarily, “automatic mixed precision training” with datatype of torch.float16 uses torch.autocast and torch.cpu.amp.GradScaler or torch.cuda.amp.GradScaler together…

Moreover, the same document goes on to say that:

  • torch.GradScaler("cpu", args...) is equivalent to torch.cpu.amp.GradScaler(args...).
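Based on that, I would expect something like the following minimal sketch to work, with both spellings constructing a scaler for the CPU device:

import torch

# The two equivalent spellings from the quoted docs.
scaler_new = torch.GradScaler("cpu")      # device-generic spelling
scaler_old = torch.cpu.amp.GradScaler()   # CPU-namespaced spelling
print(type(scaler_new), type(scaler_old))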

Thanks for pointing to this section. It seems float16 was implemented for CPU ops (although I have no idea if any performance benefits are expected) and indeed the GradScaler is also available:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, 1, 1)
x = torch.randn(1, 3, 24, 24)
with torch.autocast(device_type="cpu", dtype=torch.float16):
    x = conv(x)  # conv runs in float16 under CPU autocast

scaler = torch.GradScaler("cpu")
scaler
# <torch.amp.grad_scaler.GradScaler at 0x7fdc4e202200>
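And for completeness, a rough sketch of how the scaler would then be used in a training step, following the usual AMP pattern (model, data, and optimizer are placeholders):

import torch
import torch.nn as nn

# CPU float16 AMP training step with gradient scaling.
model = nn.Conv2d(3, 3, 1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.GradScaler("cpu")

x = torch.randn(1, 3, 24, 24)
target = torch.randn(1, 3, 24, 24)

optimizer.zero_grad()
with torch.autocast(device_type="cpu", dtype=torch.float16):
    out = model(x)
    loss = nn.functional.mse_loss(out, target)

scaler.scale(loss).backward()  # scale the loss to reduce float16 gradient underflow
scaler.step(optimizer)         # unscales gradients and calls optimizer.step()
scaler.update()                # updates the scale factor for the next iteration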

Thank you very much for your answer. But your code gives me the following error:

scaler = torch.GradScaler("cpu")
AttributeError: module 'torch' has no attribute 'GradScaler'

My PyTorch version is 2.2.2. What is yours?

I am using a current nightly binary, but could you try updating to the recent stable release (2.3)?

Sorry for the late reply; I couldn’t upgrade torch until now. I have upgraded to 2.3.0, and your code works fine.

Thank you very much.