NCCL operations complete asynchronously by default, so your workers can exit before those operations have finished.
You can avoid this by explicitly calling barrier() at the end of your script. Every rank then has to reach that call before any rank starts exiting.
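As a minimal sketch, assuming this is PyTorch's `torch.distributed` on a single node, the pattern looks roughly like this. The `worker` function, the toy `all_reduce`, and the `MASTER_ADDR`/`MASTER_PORT` values are hypothetical choices for illustration. The sketch also falls back to the gloo backend on CPU when there is not one GPU per rank, so it can be tried without GPUs.

```python
import os

import torch
import torch.distributed as dist


def worker(rank: int, world_size: int) -> float:
    """Hypothetical rank body: all-reduce a tensor, then barrier before exiting."""
    # Assumed single-node rendezvous values; adjust for your setup.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    # Use NCCL when there is one GPU per rank, otherwise fall back to gloo on CPU.
    use_cuda = torch.cuda.is_available() and torch.cuda.device_count() >= world_size
    backend = "nccl" if use_cuda else "gloo"
    if use_cuda:
        torch.cuda.set_device(rank)
    device = torch.device("cuda", rank) if use_cuda else torch.device("cpu")

    # Default init_method is "env://", which reads MASTER_ADDR/MASTER_PORT.
    dist.init_process_group(backend, rank=rank, world_size=world_size)

    # With NCCL, this collective is enqueued on a CUDA stream and can return
    # to Python before the GPU work has finished.
    t = torch.full((1,), float(rank + 1), device=device)
    dist.all_reduce(t)

    # Every rank must reach this point before any rank goes on to exit.
    dist.barrier()
    result = t.item()  # copying to host also waits for the all_reduce
    dist.destroy_process_group()
    return result
```

To launch two ranks you could call `torch.multiprocessing.spawn(worker, args=(2,), nprocs=2, join=True)` inside an `if __name__ == "__main__":` guard. Each rank's reduced value would then be 1.0 + 2.0 = 3.0.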