Torchrun runs perfectly with 1 GPU but fails with 4, giving a "[Errno 12] Cannot allocate memory" error

I have a CNN that I want to run in parallel across 4 GPU nodes. I tried running it with 1 GPU, and the program works fine. Here is the code I used:

OMP_NUM_THREADS=1 torchrun --nproc_per_node=1 model.py --exp_name=try1 --datadir='./data/try1' --logdir='./logs/' --num_block 3 3 3 --hidden 16 32 64 --num_workers=4 --lr=0.001 --batch_size=16
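
In case the data-loading side is relevant, below is a minimal sketch of the kind of DataLoader setup these arguments feed into. This is an assumption about model.py, not the actual script: a dummy TensorDataset stands in for the real dataset read from --datadir.

```
import argparse

import torch
from torch.utils.data import DataLoader, TensorDataset

parser = argparse.ArgumentParser()
parser.add_argument("--datadir", type=str, default="./data/try1")
parser.add_argument("--num_workers", type=int, default=4)
parser.add_argument("--batch_size", type=int, default=16)
args, _ = parser.parse_known_args()  # ignore the other CLI flags in this sketch

# Dummy data in place of the real dataset loaded from args.datadir
train_set = TensorDataset(torch.randn(256, 3, 32, 32),
                          torch.randint(0, 10, (256,)))

train_loader = DataLoader(
    train_set,
    batch_size=args.batch_size,
    num_workers=args.num_workers,  # 4 in the single-GPU run above
    shuffle=True,
    pin_memory=True,
)
```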
Next, I tried running it on 4 GPUs with the following:

OMP_NUM_THREADS=6 torchrun --nproc_per_node=4 model.py --exp_name=try1 --datadir='./data/try1' --logdir='./logs/' --num_block 3 3 3 --hidden 16 32 64 --num_workers=24 --lr=0.001 --batch_size=16
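For context, here is the standard torchrun + DDP pattern I am assuming model.py follows (again with dummy stand-ins for the real CNN and dataset). Note that --num_workers is per process, so this launch spawns 4 × 24 = 96 DataLoader worker processes in total.

```
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# torchrun exports RANK / LOCAL_RANK / WORLD_SIZE for each of the 4 processes
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Dummy stand-ins for the real CNN and dataset
model = DDP(nn.Conv2d(3, 16, 3).cuda(local_rank), device_ids=[local_rank])
train_set = TensorDataset(torch.randn(256, 3, 32, 32),
                          torch.randint(0, 10, (256,)))

sampler = DistributedSampler(train_set)
train_loader = DataLoader(
    train_set,
    batch_size=16,
    sampler=sampler,
    num_workers=24,  # per process: 4 processes x 24 = 96 loader workers in total
    pin_memory=True,
)
```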

But this time, I got the following error:
OSError: [Errno 12] Cannot allocate memory

I don’t understand why I am getting this error or how to fix it. Any suggestions are highly appreciated.

Best,
Ram