AMP on CPU: no GradScaler necessary / available?

I want to implement automatic mixed precision in my training framework. According to the documentation, autocasting is available for CUDA and CPU.

I noticed that in their CPU training example,

# Creates model and optimizer in default precision
model = Net()
optimizer = optim.SGD(model.parameters(), ...)

for epoch in epochs:
    for input, target in data:

        # Runs the forward pass with autocasting.
        with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
            output = model(input)
            loss = loss_fn(output, target)

        # Backward and optimizer step run outside the autocast context,
        # directly on the unscaled loss.
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()


they do not use a GradScaler, which, however, turned out to be crucial for my training on GPU. It seems that GradScaler is only available for CUDA (torch.cuda.amp.GradScaler), and it throws errors when I try to use it with tensors on the CPU.
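For contrast, here is a minimal sketch of the usual CUDA AMP loop with a GradScaler. The model, loss, and data are placeholders I made up for illustration; the sketch falls back to a disabled scaler and bfloat16 autocast when no GPU is present, so it also runs on CPU-only builds.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder model, loss, and optimizer for illustration only.
model = nn.Linear(8, 2)
loss_fn = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = model.to(device)

# GradScaler becomes a no-op with enabled=False, so this sketch still
# runs on machines without CUDA.
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

for _ in range(2):  # dummy "epochs" over random data
    inp = torch.randn(4, 8, device=device)
    target = torch.randn(4, 2, device=device)

    optimizer.zero_grad()
    with torch.autocast(device_type=device,
                        dtype=torch.float16 if use_cuda else torch.bfloat16):
        output = model(inp)
        loss = loss_fn(output, target)

    # Scales the loss so float16 gradients do not underflow, unscales
    # them before the optimizer step, and updates the scale factor.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

On the GPU the scaler multiplies the loss before backward and divides the gradients back before the step; skipping it with float16 often produces zero gradients from underflow, which matches my experience that it was crucial on GPU.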

Thus my question(s):

  1. Is there any reason why one would, in contrast to CUDA, not need a GradScaler on the CPU?
  2. If not, is there an implementation of GradScaler for running AMP on the CPU?

AMP on the CPU uses bfloat16, which does not need gradient scaling. Gradient scaling exists to keep small float16 gradients from underflowing to zero; bfloat16 has the same 8-bit exponent as float32, so its dynamic range is essentially that of float32 and gradients do not underflow the way they can in float16.
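The range difference is easy to see from the bit layouts alone. This is a rough, self-contained sketch (simplified IEEE arithmetic, no torch needed) that computes the smallest normal and largest finite value for each format:

```python
# Rough dynamic-range comparison of float16, bfloat16, and float32,
# derived from exponent/fraction bit counts (illustrative sketch).

def approx_range(exp_bits, frac_bits):
    """Return (smallest normal, largest finite) for a binary float format."""
    bias = 2 ** (exp_bits - 1) - 1
    smallest_normal = 2.0 ** (1 - bias)
    largest = (2 - 2.0 ** -frac_bits) * 2.0 ** bias
    return smallest_normal, largest

fp16 = approx_range(exp_bits=5, frac_bits=10)   # IEEE half precision
bf16 = approx_range(exp_bits=8, frac_bits=7)    # bfloat16
fp32 = approx_range(exp_bits=8, frac_bits=23)   # IEEE single precision

print(f"float16  range: {fp16[0]:.3e} .. {fp16[1]:.3e}")
print(f"bfloat16 range: {bf16[0]:.3e} .. {bf16[1]:.3e}")
print(f"float32  range: {fp32[0]:.3e} .. {fp32[1]:.3e}")
```

float16 tops out at 65504 and underflows below roughly 6e-5, while bfloat16 covers about 1e-38 to 3e38, the same span as float32 (it just has fewer mantissa bits). That is why the loss-scaling trick is unnecessary in the CPU bfloat16 path.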