No option to change GPUs when using torch.distributed.init_process_group

We recommend using PyTorch's native mixed-precision training via torch.cuda.amp together with the native DDP implementation (torch.nn.parallel.DistributedDataParallel), as described here.
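For reference, here is a minimal sketch of what that combination might look like. It is not taken from the linked guide. The function name `train_steps`, the toy `nn.Linear` model, the random data, and the default port are all assumptions made for illustration. The sketch assumes the script is launched with `torchrun`, which sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. It falls back to a single CPU process with the gloo backend when no GPU is available.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train_steps(steps: int = 3) -> float:
    """Run a few toy training steps with native DDP and native AMP (hypothetical helper)."""
    # torchrun sets these variables; the defaults allow a plain single-process run.
    rank = int(os.environ.get("RANK", 0))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # assumed free port

    use_cuda = torch.cuda.is_available()
    if use_cuda:
        # Bind this process to one GPU. local_rank indexes only the GPUs
        # that CUDA_VISIBLE_DEVICES exposes to the process.
        torch.cuda.set_device(local_rank)
        device = torch.device("cuda", local_rank)
    else:
        device = torch.device("cpu")

    dist.init_process_group(
        backend="nccl" if use_cuda else "gloo",
        rank=rank,
        world_size=world_size,
    )
    try:
        model = nn.Linear(8, 1).to(device)  # toy model, for illustration only
        ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)
        optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
        # The scaler and autocast are disabled (no-ops) when running on CPU.
        scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

        loss = torch.zeros(())
        for _ in range(steps):
            x = torch.randn(16, 8, device=device)
            y = torch.randn(16, 1, device=device)
            optimizer.zero_grad()
            with torch.cuda.amp.autocast(enabled=use_cuda):
                loss = nn.functional.mse_loss(ddp_model(x), y)
            scaler.scale(loss).backward()  # DDP all-reduces gradients here
            scaler.step(optimizer)
            scaler.update()
        return loss.item()
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    print(f"final loss: {train_steps():.4f}")
```

With this layout, one way to choose which GPUs are used is to restrict visibility before launching, for example `CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=2 train.py` (the script name is a placeholder). Inside each process, `LOCAL_RANK` 0 and 1 would then map to physical GPUs 2 and 3.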