FairSeq fails to utilize multiple GPUs but says "Training on X devices"

Invocation:

python $FAIRSEQ/train.py "$DATABIN" \
    --max-epoch 10 --max-tokens 6000 --update-freq 1 \
    --ddp-backend=no_c10d --memory-efficient-fp16 \
    --lang-pairs "$SRC-enu" \
    --user-dir ./fairseq-modules/ \
    --save-dir "$SAVE/checkpoints" \
    --num-workers 8 \
    --tensorboard-logdir "$SAVE"
    # (architecture flags omitted here; a comment line in the middle of a
    # backslash-continued command would cut it short, so they go at the end)

But when I monitor the GPUs, one is at 100% utilization while all the others stay at 0%.
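In case it helps: my understanding is that fairseq defaults --distributed-world-size to torch.cuda.device_count(), which is presumably where the "Training on X devices" message comes from. Pinning the world size explicitly would look like this (just a sketch; every other flag as in the invocation above):

python $FAIRSEQ/train.py "$DATABIN" \
    --distributed-world-size 8
    # (remaining flags exactly as in the invocation above)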

I’m on a shared machine, so the physical devices assigned to me don’t have consecutive device IDs. However, that assignment happens before the container image can see them, so inside the container only the assigned devices should be visible, with sensible (zero-based, consecutive) IDs. Indeed, inside the container:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
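To double-check what the container and PyTorch actually see, I can run the following in the same environment as the training job (as far as I know, torch.cuda.device_count() is what fairseq's default world size is derived from):

# list the GPUs the container exposes
nvidia-smi -L

# count the devices PyTorch can enumerate
python -c "import torch; print(torch.cuda.device_count())"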

Thank you!