While running multiple PyTorch scripts using DataParallel, only the first one is distributed across two GPUs; the rest load onto the first GPU's memory

While running multiple PyTorch scripts that use DataParallel, only the first one is distributed across two GPUs; the rest load all of their memory onto the first GPU.

Can't all the programs distribute themselves like the first one?
If they can, what's the process?

I am just loading the model and using:

model = ConvTasNet(args.N, args.L, args.B, args.Sk, args.H, args.P, args.X, args.R,
                       args.C, norm_type=args.norm_type, causal=args.causal,
                       mask_nonlinear=args.mask_nonlinear)
print(model)
if args.use_cuda:
    model = torch.nn.DataParallel(model)
    model.cuda()

I use the same procedure in all of the scripts.
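My understanding is that DataParallel should pick up both GPUs automatically when no device_ids are given. For reference, a minimal sketch of the explicit form, continuing from the snippet above (the device IDs [0, 1] are just an assumption for a two-GPU machine, not what my scripts actually pass):

# Hedged sketch, not my actual training code: wrap the model only when more than
# one GPU is visible, and list the devices explicitly.
# device_ids=[0, 1] assumes a machine with two visible GPUs.
if args.use_cuda and torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model, device_ids=[0, 1])
    model.cuda()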

Can someone please look into this?

I cannot reproduce this issue and am able to execute the nn.DataParallel script multiple times:

# tmp.py
import torch
import torch.nn as nn

# small dummy model
model = nn.Sequential(
    nn.Conv2d(3, 6, 3, 1, 1),
    nn.ReLU(),
    nn.Conv2d(6, 12, 3, 1, 1),
    nn.ReLU())

# wrap in DataParallel so each batch is split across all visible GPUs
model = nn.DataParallel(model).cuda()
x = torch.randn(8, 3, 224, 224).cuda()

# run a few forward passes so memory stays allocated on every device
for _ in range(100):
    out = model(x)
print('done')

# run.sh
python tmp.py

# in terminal
bash run.sh &
bash run.sh &
bash run.sh &
bash run.sh &
bash run.sh &

Using nvidia-smi I can see that each device is used by all 5 script instances.
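If your other scripts still end up only on the first GPU, it might be worth printing what each process actually sees before wrapping the model. A quick check could look like this (check_devices.py is a hypothetical helper, and this assumes nothing in your launch setup sets CUDA_VISIBLE_DEVICES):

# check_devices.py -- hypothetical helper, not part of the original scripts
import os
import torch

# DataParallel can only split work across the devices this process can see
print('CUDA_VISIBLE_DEVICES:', os.environ.get('CUDA_VISIBLE_DEVICES'))
print('visible device count:', torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

If the later processes report a device count of 1, the problem is in how the scripts are launched rather than in the DataParallel code itself.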