GradScaler for CPU with AMP

Hi,
Here, in the PyTorch AMP documentation, it is stated that we can use torch.autocast and torch.cpu.amp.GradScaler or torch.cuda.amp.GradScaler together. But when I try to import torch.amp.GradScaler or torch.cpu.amp.GradScaler, it says that there is no GradScaler in the module.

The PyTorch version is 2.2.2+cpu; I have also tried 2.2.1+cpu. How can I resolve this issue?

amp on CPU should use bfloat16 only, which does not need gradient scaling.
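E.g. this is roughly what bfloat16 autocast on the CPU could look like (model, data, and optimizer here are just placeholders); a plain backward() and step() are sufficient, since no gradient scaling is needed:

import torch
import torch.nn as nn

# Minimal bfloat16 AMP sketch on the CPU: autocast alone is enough,
# since bfloat16 keeps the float32 exponent range and its gradients
# are not prone to the underflow that float16 gradients can hit.
model = nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 16)
target = torch.randn(8, 4)

optimizer.zero_grad()
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
    loss = nn.functional.mse_loss(out, target)
loss.backward()   # no GradScaler needed for bfloat16
optimizer.step()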

But the documentation specifically states that you can use gradient scaling with CPU AMP. It even shows two different ways to construct the gradient scaler.

Could you point me to the section in the docs showing gradient scaling with bfloat16 on the CPU, please?

In the second paragraph of this pytorch.org page about AMP on the CPU, it is stated that:

Ordinarily, “automatic mixed precision training” with datatype of torch.float16 uses torch.autocast and torch.cpu.amp.GradScaler or torch.cuda.amp.GradScaler together…

Moreover, the same document goes on to say that:

  • torch.GradScaler("cpu", args...) is equivalent to torch.cpu.amp.GradScaler(args...).
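Based on that, I would expect something like the following minimal sketch to work, with both spellings constructing a scaler for the CPU device:

import torch

# The two equivalent spellings from the quoted docs.
scaler_new = torch.GradScaler("cpu")      # device-generic spelling
scaler_old = torch.cpu.amp.GradScaler()   # CPU-namespaced spelling
print(type(scaler_new), type(scaler_old))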

Thanks for pointing to this section. It seems float16 was implemented for CPU ops (although I have no idea if any performance benefits are expected) and indeed the GradScaler is also available:

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 3, 1, 1)
x = torch.randn(1, 3, 24, 24)
with torch.autocast(device_type="cpu", dtype=torch.float16):
    x = conv(x)  # conv runs in float16 under CPU autocast

scaler = torch.GradScaler("cpu")
scaler
# <torch.amp.grad_scaler.GradScaler at 0x7fdc4e202200>
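And for completeness, a rough sketch of how the scaler would then be used in a training step, following the usual AMP pattern (model, data, and optimizer are placeholders):

import torch
import torch.nn as nn

# CPU float16 AMP training step with gradient scaling.
model = nn.Conv2d(3, 3, 1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.GradScaler("cpu")

x = torch.randn(1, 3, 24, 24)
target = torch.randn(1, 3, 24, 24)

optimizer.zero_grad()
with torch.autocast(device_type="cpu", dtype=torch.float16):
    out = model(x)
    loss = nn.functional.mse_loss(out, target)

scaler.scale(loss).backward()  # scale the loss to reduce float16 gradient underflow
scaler.step(optimizer)         # unscales gradients and calls optimizer.step()
scaler.update()                # updates the scale factor for the next iteration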

Thank you very much for your answer. But your code gives me the following error:

scaler = torch.GradScaler("cpu")
AttributeError: module 'torch' has no attribute 'GradScaler'

My PyTorch version is 2.2.2. What is yours?

I am using a current nightly binary, but could you try updating to the recent stable release (2.3)?

Sorry for the late reply; I couldn’t upgrade torch until now. I have upgraded to 2.3.0, and your code works fine.

Thank you very much.