unhandled cuda error

While testing distributed training with just 2 instances, each with 1 GPU attached, this error occurred:

(base) ubuntu@ip-172-31-11-131:~/detectron2$ NCCL_SOCKET_IFNAME=ens3 NCCL_IB_DISABLE=1 python tools/train_net.py --num-gpus 1 --num-machines 2 --machine-rank 1 --dist-url tcp://34.253.142.180:3000 --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml
Command Line Args: Namespace(config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml', dist_url='tcp://34.253.142.180:3000', eval_only=False, machine_rank=1, num_gpus=1, num_machines=2, opts=[], resume=False)
Traceback (most recent call last):
  File "tools/train_net.py", line 161, in <module>
    args=(args,),
  File "/home/ubuntu/detectron2/detectron2/engine/launch.py", line 49, in launch
    daemon=False,
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/ubuntu/detectron2/detectron2/engine/launch.py", line 70, in _distributed_worker
    comm.synchronize()
  File "/home/ubuntu/detectron2/detectron2/utils/comm.py", line 79, in synchronize
    dist.barrier()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1424, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1573049306803/work/torch/lib/c10d/ProcessGroupNCCL.cpp:400, unhandled cuda error

Exporting NCCL_SOCKET_IFNAME and NCCL_IB_DISABLE didn't help, nor did the other fixes discussed in GitHub issues or anything else I found on the net about this topic.
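
To rule out detectron2 itself, the next thing I plan to try is a bare torch.distributed barrier test between the two instances. A minimal sketch (the script name, the RANK environment variable, and reusing the endpoint from the run above are my own assumptions):

```python
# minimal_nccl_test.py -- bare torch.distributed sanity check, no detectron2.
# Sketch only: script name, RANK env var, and endpoint are placeholders.
import os

import torch
import torch.distributed as dist


def main():
    # rank 0 on the machine whose IP is in init_method, rank 1 on the other
    rank = int(os.environ["RANK"])
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://34.253.142.180:3000",  # same endpoint as the failing run
        world_size=2,
        rank=rank,
    )
    torch.cuda.set_device(0)  # one GPU per machine
    dist.barrier()  # the same call that raises "unhandled cuda error" above
    print("rank %d: barrier passed" % rank)
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

I would run it as `RANK=0 NCCL_SOCKET_IFNAME=ens3 NCCL_DEBUG=INFO python minimal_nccl_test.py` on the rank-0 machine and with `RANK=1` on the other; NCCL_DEBUG=INFO should at least make NCCL print what it is attempting before it fails.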
Maybe I forgot an argument. On AWS, "ens3" seems to be the Ethernet interface; at least ifconfig does not show the usual eth0.
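
To double-check the interface name on each instance (rather than eyeballing ifconfig), the standard library can list the candidates for NCCL_SOCKET_IFNAME; a small sketch (Linux-only):

```python
# Print the network interfaces the kernel knows about, to confirm that
# "ens3" (and not eth0) is the right value for NCCL_SOCKET_IFNAME.
# socket.if_nameindex() is stdlib, available on Linux since Python 3.3.
import socket

for index, name in socket.if_nameindex():
    print(index, name)
```

The variable is read on each node separately, so the name has to be correct on both instances.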

What to do?