PyTorch with mixed precision

Hi all,

I am trying to implement mixed precision training in PyTorch. This involves scaling the loss (computed in half precision) by a scaling factor before computing the gradients, then updating the master model (kept in single precision) with the computed gradients divided by the same scale factor.

I know that I can access the gradient of a parameter with:

print(model.layer.weight.grad)

However, is there a way to manually manipulate such a gradient (say, divide or multiply it) and apply it to update another model for mixed precision training?

Many thanks for your help!

Hi,

I guess you can do:

for orig_p, new_p in zip(model.parameters(), other_model.parameters()):
    # Rescale the gradient from the trained model and assign it to the
    # corresponding parameter of the other model.
    new_val = your_scaling(orig_p.grad)
    new_p.grad = new_val
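For completeness, here is a minimal sketch of how such a copy loop could fit into a full manual mixed-precision step. The model names, the static loss_scale value, and the final weight copy-back are illustrative assumptions, not something prescribed above:

import torch

# Assumed setup: model_fp16 runs forward/backward in half precision on the GPU,
# master_fp32 holds single-precision master copies of the same parameters.
model_fp16 = torch.nn.Linear(10, 10).cuda().half()
master_fp32 = torch.nn.Linear(10, 10).cuda().float()
master_fp32.load_state_dict(
    {k: v.float() for k, v in model_fp16.state_dict().items()})

optimizer = torch.optim.SGD(master_fp32.parameters(), lr=0.1)
loss_scale = 1024.0  # illustrative static scale factor

x = torch.randn(4, 10, device="cuda", dtype=torch.half)
loss = model_fp16(x).sum()
(loss * loss_scale).backward()  # gradients are now scaled by loss_scale

# Copy each gradient to the fp32 master model, undoing the scaling.
for p16, p32 in zip(model_fp16.parameters(), master_fp32.parameters()):
    p32.grad = p16.grad.float() / loss_scale

optimizer.step()  # the update happens in full precision
optimizer.zero_grad()
model_fp16.zero_grad()

# Copy the updated master weights back into the half-precision model.
with torch.no_grad():
    for p16, p32 in zip(model_fp16.parameters(), master_fp32.parameters()):
        p16.copy_(p32)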

In addition to @albanD’s suggestion, you could scale the loss before the backward call, which would automatically scale all gradients, and then unscale the gradients before the update.
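As a rough sketch of that idea (the scale value and the in-place unscaling loop are just for illustration), assuming a single model trained directly:

import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scale = 2.0 ** 10  # illustrative scale factor

loss = model(torch.randn(4, 10)).sum()
(loss * scale).backward()  # every gradient is now multiplied by scale

# Unscale the gradients in place before the optimizer step.
for p in model.parameters():
    if p.grad is not None:
        p.grad.div_(scale)

optimizer.step()
optimizer.zero_grad()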

Note that automatic mixed-precision is currently available in the master and nightly binaries, in case you don’t want to reimplement it manually.
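For reference, a minimal training-step sketch with the torch.cuda.amp utilities as they appear in those builds; the model, data, and optimizer here are placeholders:

import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()  # handles loss scaling and unscaling automatically

data = torch.randn(4, 10, device="cuda")

optimizer.zero_grad()
with autocast():               # ops inside run in mixed precision
    loss = model(data).sum()
scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(optimizer)         # unscales the gradients, then steps
scaler.update()                # adjusts the scale factor for the next step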


Many thanks to @albanD and @ptrblck for the replies. I will try out automatic mixed precision.