Can the input not be assigned to GPU 0 only?

I’m trying to train a model on 4 GPUs (1080 Ti) with a batch size of 128, but I ran into an out-of-memory problem.
GPU 0 is the one that actually runs out of memory, while the other 3 GPUs still have around 7 GB of their 12 GB available, which is more than enough.
The main reason is that all inputs have to be on GPU 0 first, and are only replicated to the other GPUs later.
It would be much more efficient if the inputs could be distributed in parallel. Is that possible?

As far as I know it’s not possible, but what you can do to optimize memory usage is the following: the output and the ground truth must be on the same GPU.

When you call DataParallel you can allocate the output and the ground truth to cuda:1 and the input to cuda:2:

model = DataParallel(model, output_device=1).cuda()

Here the model’s parameters live on GPU 0, but the gathered output ends up on GPU 1.

Now, when you allocate the input and the ground truth:

input = input.cuda(2)
gt = gt.cuda(1)

So the output and the ground truth are on the same GPU, and the input is on a third GPU, which takes some of the memory pressure off GPU 0.
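
Putting the pieces together, here is a minimal sketch of this placement scheme. The linear model, tensor shapes, and loss function are hypothetical stand-ins, and it assumes at least 3 visible GPUs:

import torch
import torch.nn as nn
from torch.nn import DataParallel

# Hypothetical toy model; only the device placement matters here.
model = nn.Linear(512, 10)

# Replicas run on all visible GPUs; parameters stay on cuda:0,
# but the gathered output lands on cuda:1.
model = DataParallel(model, output_device=1).cuda()

criterion = nn.CrossEntropyLoss()

input = torch.randn(128, 512).cuda(2)       # input batch on a third GPU
gt = torch.randint(0, 10, (128,)).cuda(1)   # ground truth on cuda:1

output = model(input)         # gathered on cuda:1 because of output_device=1
loss = criterion(output, gt)  # output and gt are both on cuda:1
loss.backward()

Note that DataParallel still replicates the model from GPU 0 on every forward pass, so GPU 0 keeps the parameter copy; this scheme only relocates the raw input batch, the gathered output, and the loss computation.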