I want to add automatic mixed precision to my training framework. According to
https://pytorch.org/docs/stable/amp.html#torch.autocast, autocasting is available for both CUDA and CPU.
What puzzles me is that in their CPU training example
```python
# Creates model and optimizer in default precision
model = Net()
optimizer = optim.SGD(model.parameters(), ...)

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()

        # Runs the forward pass with autocasting.
        with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
            output = model(input)
            loss = loss_fn(output, target)

        loss.backward()
        optimizer.step()
```
they do not use a GradScaler, which turned out to be crucial for my training on GPU. It seems that GradScaler is only available for CUDA (`torch.cuda.amp.GradScaler`), and it throws errors when I try to use it with tensors on the CPU.
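For comparison, here is a minimal sketch of the CUDA recipe I am referring to, following the GradScaler pattern from the PyTorch AMP docs. The model, data, and loss are placeholder stand-ins; `enabled=...` turns the scaler into a no-op when no GPU is present, so the same loop also runs on CPU:

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(4, 2).to(device)            # stand-in for Net()
optimizer = optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# GradScaler rescales the loss so small float16 gradients do not
# underflow to zero; with enabled=False every call is a pass-through.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

# Toy data in place of a real DataLoader.
data = [(torch.randn(8, 4, device=device),
         torch.randn(8, 2, device=device))]

for input, target in data:
    optimizer.zero_grad()
    amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
    with torch.autocast(device_type=device, dtype=amp_dtype):
        output = model(input)
        loss = loss_fn(output, target)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads, skips step on inf/NaN
    scaler.update()                 # adjusts the scale factor for next step
```

This is the loop where dropping the scaler hurt my GPU training, which is why its absence from the CPU example surprised me.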
Hence my questions:
- Is there a reason why, in contrast to CUDA, a GradScaler is not needed on the CPU?
- If it is in fact needed, is there a GradScaler implementation for running AMP on the CPU?