I’ve been confused lately about torch.cuda.amp and NVIDIA Apex. What does each of them do? (I know that both of them do mixed precision.) How do they differ, and when should I use which?
torch.cuda.amp just landed recently in the nightly builds and is the recommended way to use mixed-precision training.
CC @mcarilli, who was working on pushing this utility into the PyTorch core.
torch.cuda.amp is the way to go moving forward.
We published Apex Amp last year as an experimental mixed precision resource because Pytorch didn’t yet support the extensibility points to move it upstream cleanly. However, asking people to install something separate was a headache. Extension building and forward/backward compatibility were particular pain points.
Given the benefits of automatic mixed precision, it belongs in Pytorch core, so moving it upstream has been my main project for the past six months. I’m happy with torch.cuda.amp. It’s more flexible and intuitive than Apex Amp, and repairs many of Apex Amp’s known flaws. Apex Amp will shortly be deprecated (and to be honest I haven’t been working on it for a while; I focused on making sure torch.cuda.amp covered the most-requested feature gaps).
Use torch.cuda.amp, early and often. It supports a wide range of use cases. If it doesn’t support your network for some reason, file a Pytorch issue and tag @mcarilli. In general, prefer native tools for versioning stability (that means torch.nn.parallel.DistributedDataParallel too) because they’re tested and updated as needed for each master commit and binary build.
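For reference, a minimal sketch of the usual torch.cuda.amp pattern (autocast for the forward pass, GradScaler for the backward pass). The model, optimizer, and data below are placeholders; amp is simply disabled when no GPU is present, since autocast and GradScaler become no-ops with enabled=False:

```python
import torch

# Placeholder model/optimizer/data; substitute your own.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # autocast/GradScaler are no-ops when disabled

model = torch.nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 4, device=device)

for _ in range(3):
    optimizer.zero_grad()
    # Run the forward pass in mixed precision where it is safe to do so.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    # Scale the loss to avoid fp16 gradient underflow, then step and
    # update the scale factor.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Note that the loss is scaled before backward() so small fp16 gradients don’t flush to zero; scaler.step() unscales the gradients before the optimizer sees them.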
Apex will remain as a source of utilities that can be helpful, e.g. fast fused optimizers, but forward+backward compatibility across all Pytorch versions can’t be guaranteed. Don’t take a dependency on Apex unless you want to try those.
Hello. I have only a single GPU and a very heavy model, so I am running out of memory…
Recently, I learned that there is mixed-precision training, which saves GPU memory during training.
Is NVIDIA Apex an older tool for mixed-precision training?
I read some posts and saw your comment that there is a PyTorch implementation called torch.cuda.amp.
How do I use it, and what are the nightly builds you mentioned above?
You may want to refer to:
apex.amp was our first implementation of mixed-precision training; it is deprecated now and has been replaced by torch.cuda.amp. @seungjun posted the examples above showing its usage.
You don’t need to install a nightly release anymore, as torch.cuda.amp is available in the stable releases already.
Is it feasible to combine torch.cuda.amp and fp16_compress_hook? Casting gradients to float16 and allreducing those float16 gradient tensors is helpful for reducing communication cost.
Yes, the action of those hooks should be composable with the things amp does.
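A sketch of that combination. To keep it self-contained, this initializes a single-process gloo group just so DDP can be constructed; in real multi-GPU training you would launch one process per GPU with the nccl backend. The hook itself is the stock fp16_compress_hook from torch.distributed.algorithms.ddp_comm_hooks:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks

# Single-process gloo group for illustration only; real training would
# use one process per GPU with the nccl backend.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = DDP(torch.nn.Linear(16, 4).to(device))
# Compress gradients to float16 before the allreduce to cut communication cost.
model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 4, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()  # the comm hook runs during this allreduce
scaler.step(optimizer)
scaler.update()

dist.destroy_process_group()
```

The hook only changes how DDP communicates the already-computed gradients (cast to float16, allreduce, cast back), so it composes with amp’s loss scaling, which happens on the local gradients before and after that communication.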