How to use torch.nn.functional.conv2d with multiple GPUs?

How can I use torch.nn.functional.conv2d to compute with multiple GPUs?

Does anyone know how to do this?

You won’t be able to use this API across multiple GPUs directly. Instead, you could split the input along e.g. the batch dimension, send each chunk to a different GPU, and apply the convolution there.

Thanks for the reply. After sending each chunk to a different GPU, how do I apply the convolutions in parallel?

You could call them directly:

out1 = F.conv2d(in1, weight1)
out2 = F.conv2d(in2, weight2)

where inX and weightX are on the cuda:X device.
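
For example, a minimal sketch (the shapes, device indices, and padding here are only illustrative assumptions):

import torch
import torch.nn.functional as F

x = torch.randn(64, 3, 224, 224)      # full input batch on the CPU
in1, in2 = x.chunk(2, dim=0)          # split along the batch dimension

in1 = in1.to('cuda:0')                # one chunk per device
in2 = in2.to('cuda:1')
weight1 = torch.randn(16, 3, 3, 3, device='cuda:0')
weight2 = torch.randn(16, 3, 3, 3, device='cuda:1')

out1 = F.conv2d(in1, weight1, padding=1)   # runs on cuda:0
out2 = F.conv2d(in2, weight2, padding=1)   # runs on cuda:1

Each output stays on the device its inputs live on, so you would need to move the results back to a common device if you want to concatenate them afterwards.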

But these two operations are executed sequentially, right? How can I make them run in parallel?

The kernel launches are scheduled sequentially by the CPU, so the second launch might be slightly delayed, but the kernel execution on the two GPUs would be performed in parallel.
You would see this effect by profiling a sufficiently large workload using e.g. Nsight Systems.
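
If you just want a rough check without Nsight Systems, a simple wall-clock comparison can already hint at the overlap. This is only a sketch with made-up shapes; you would compare the combined time against running each convolution alone:

import time
import torch
import torch.nn.functional as F

in1 = torch.randn(256, 64, 128, 128, device='cuda:0')
in2 = torch.randn(256, 64, 128, 128, device='cuda:1')
weight1 = torch.randn(64, 64, 3, 3, device='cuda:0')
weight2 = torch.randn(64, 64, 3, 3, device='cuda:1')

for _ in range(5):                      # warmup iterations
    F.conv2d(in1, weight1, padding=1)
    F.conv2d(in2, weight2, padding=1)
torch.cuda.synchronize('cuda:0')
torch.cuda.synchronize('cuda:1')

t0 = time.perf_counter()
out1 = F.conv2d(in1, weight1, padding=1)
out2 = F.conv2d(in2, weight2, padding=1)
torch.cuda.synchronize('cuda:0')        # wait for both devices to finish
torch.cuda.synchronize('cuda:1')
t1 = time.perf_counter()
print(f'combined time: {(t1 - t0) * 1e3:.2f} ms')

If the kernels overlap, the combined time should be close to the time of a single convolution rather than the sum of both.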

Thanks for the explanation.
So out2 = F.conv2d(in2, weight2) will start before F.conv2d(in1, weight1) has finished on GPU 1, right?

BTW, what if the weights are shared between the two calls? Should I do it like this:
out1 = F.conv2d(in1, weight.cuda(0))
out2 = F.conv2d(in2, weight.cuda(1))

If the CPU is fast enough to schedule it and the actual kernel execution takes more time than the launch, then yes.
You won’t be able to see any overlap with a tiny workload, as the kernel launch overhead would be larger than the actual GPU workload, i.e. kernel1 finishes before the CPU can schedule the launch of kernel2.

Yes, you have to move the parameters to the appropriate device before executing the operation.
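
For example, assuming a single shared weight tensor created on the CPU, you could make one copy per device up front (the names and shapes below are just placeholders):

import torch
import torch.nn.functional as F

in1 = torch.randn(8, 3, 32, 32, device='cuda:0')
in2 = torch.randn(8, 3, 32, 32, device='cuda:1')

weight = torch.randn(16, 3, 3, 3)        # shared parameters on the CPU
weight_gpu0 = weight.to('cuda:0')        # one copy per device, created once
weight_gpu1 = weight.to('cuda:1')

out1 = F.conv2d(in1, weight_gpu0)        # runs on cuda:0
out2 = F.conv2d(in2, weight_gpu1)        # runs on cuda:1

Creating the per-device copies once avoids re-transferring the weight from the CPU on every forward call, which weight.cuda(0) / weight.cuda(1) would do if executed repeatedly.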

Thank you very much for the reply!