Equivalent of GradScaler in the C++ API

I am currently implementing training of an MSS (Music Source Separation) model in C++, and I need the equivalent of GradScaler (from torch.cuda.amp.grad_scaler in Python).

As explained here two years ago by Michael Carilli:

I have definitely not found anything about this in the usual sources of information.

I do use the trick of imitating torch.cuda.amp.autocast, with a substantial 6x performance boost (as expected from the NVIDIA documentation, thanks to Tensor Cores!), but because of the topology of my model, I really need gradient scaling of my loss function.
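For context, the imitation trick boils down to toggling ATen's autocast state around the forward pass. A minimal sketch is below; note that the at::autocast function names have changed across LibTorch releases (newer versions take a device type), so adjust for the version you build against:

```cpp
#include <torch/torch.h>
#include <ATen/autocast_mode.h>

// Run the forward pass with autocast enabled, imitating Python's
// torch.cuda.amp.autocast context manager.
template <typename Module>
torch::Tensor forward_autocast(Module& model, const torch::Tensor& input) {
  at::autocast::set_enabled(true);    // enable autocast for the forward pass
  auto output = model->forward(input);
  at::autocast::clear_cache();        // drop the autocast cast cache
  at::autocast::set_enabled(false);   // leave autocast before backward
  return output;
}
```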

Has anyone faced this problem? Is there a shareable solution, or should I look into a custom implementation?

Thank you in advance for your help!

Self-Reply

I’ve written a C++ implementation of the GradScaler class, following the Python implementation.

You can find the gradscaler.hpp header and its gradscaler_test.hpp test file in my GitHub gist:

Unfortunately, it has not been tested on multi-GPU systems, but it passes the tests on a single GPU without problems.

Just be careful with the scale() method, which supports both a single Tensor and an iterable of Tensors in a generic way. The method supports any std-like container that allows back insertion (std::back_inserter), so it can build nested outputs recursively; see the sketch below.

For better portability, I use the c10 namespace tools wherever possible (c10::optional, c10::variant, …).
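For reference, the intended call pattern mirrors the Python torch.cuda.amp training loop. This is a minimal usage sketch assuming the class exposes the same scale()/step()/update() interface as the Python original; the helper names and exact signatures here are illustrative, so check the gist for the real API:

```cpp
#include <torch/torch.h>
#include "gradscaler.hpp"  // from the gist linked above

// One training step with loss scaling, mirroring the Python pattern:
// scaler.scale(loss).backward(); scaler.step(optimizer); scaler.update();
void train_step(torch::nn::Sequential& model,
                torch::optim::SGD& optimizer,
                GradScaler& scaler,
                const torch::Tensor& input,
                const torch::Tensor& target) {
  optimizer.zero_grad();
  auto output = model->forward(input);  // optionally under the autocast trick shown earlier
  auto loss = torch::mse_loss(output, target);

  scaler.scale(loss).backward();  // backward on the scaled loss
  scaler.step(optimizer);         // unscales grads, skips the step on inf/NaN
  scaler.update();                // adjusts the scale factor for the next iteration
}

// scale() also accepts std-like containers of Tensors, e.g.:
//   std::vector<torch::Tensor> losses{loss_a, loss_b};
//   auto scaled = scaler.scale(losses);  // container of scaled tensors
```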

I hope this helps the LibTorch C++ community.