I have used PyTorch Lightning (though I can't compare the two, as I haven't used Ignite).
It has been the smoothest multi-GPU training experience I have come across. Changing from a single-GPU to a multi-GPU setup is as simple as setting the number of GPUs on the Trainer to as many as you'd like to use.
TPU support is also integrated: you just specify num_tpu_cores, without changing any other code.
Thank you very much for your contribution.
I started using Ignite after reading the excellent quickstart guide, which shows the essentials of defining and training a simple model. But the provided multi-GPU training examples do not seem to follow the same simplicity.
Currently, the stable release (v0.3.0) relies only on the native PyTorch distributed API: users need to manually set up the distributed process group, wrap the model with nn.parallel.DistributedDataParallel, and run the script with the torch.distributed.launch tool or via mp.spawn.
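To illustrate the amount of manual setup being described, here is a sketch of that native-PyTorch path. It is collapsed to a single process with the gloo backend so it runs standalone; under torch.distributed.launch, RANK and WORLD_SIZE would instead be set per worker by the launcher:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torch.distributed.launch normally provides these; we default to a
    # single-process group so the sketch is runnable on its own.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    rank = int(os.environ.get("RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))

    # Step 1: manually set up the distributed process group.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Step 2: wrap the model with DistributedDataParallel.
    model = torch.nn.Linear(10, 2)
    ddp_model = DDP(model)

    # Step 3: an ordinary training step; DDP all-reduces gradients
    # across ranks during backward().
    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    out = ddp_model(torch.randn(8, 10))
    out.sum().backward()
    opt.step()

    dist.destroy_process_group()
    return out.shape

if __name__ == "__main__":
    main()
```

All of this boilerplate (plus a DistributedSampler for the data loader, omitted here) is what the launcher-based examples ask the user to write by hand.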
I would go with Lightning then.
The documentation is pretty clear and readable.