The PyTorch documentation page for Distributed Optimizers says:
WARNING: Distributed optimizer is not currently supported when using CUDA tensors
Link to a snapshot of the documentation page captured by the Web Archive at the time of writing this post: Distributed Optimizers — PyTorch master documentation (archive.org)
However, the tutorial on distributed optimizers (Shard Optimizer States with ZeroRedundancyOptimizer — PyTorch Tutorials 1.11.0+cu102 documentation) places its models on CUDA devices with no apparent issues.
I tried using ZeroRedundancyOptimizer with my own model too, and it also seems to work when the model is on CUDA/GPUs. Is the documentation page inaccurate or out of date, or am I misinterpreting what the warning means?
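For reference, here is a minimal sketch of the kind of thing I ran (not my actual model). I have reduced it to a single process with world_size=1 so it is self-contained, and it falls back to CPU when no GPU is present; the model, dimensions, and learning rate are placeholders:

```python
# Minimal sketch: ZeroRedundancyOptimizer with a model on CUDA (or CPU fallback).
import os
import tempfile

import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer


def main() -> float:
    # Single-process "group" via a file:// rendezvous, just for illustration.
    tmpdir = tempfile.mkdtemp()
    dist.init_process_group(
        backend="gloo",
        init_method=f"file://{os.path.join(tmpdir, 'rendezvous')}",
        rank=0,
        world_size=1,
    )
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(8, 4).to(device)

    # This is the usage the warning seems to prohibit for CUDA tensors,
    # yet it runs without errors for me when the model is on a GPU.
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.SGD,
        lr=0.01,
    )

    loss = model(torch.randn(2, 8, device=device)).sum()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()
    return float(loss.item())


if __name__ == "__main__":
    main()
```

On my machine this completes a forward/backward pass and an optimizer step with the model on CUDA, which is what makes the warning confusing to me.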