Run two nets in parallel on a single machine with multiple GPUs

Depending on the model, and thus the workload, the CPU might not be able to run ahead and schedule the kernel launches fast enough.
You could profile it, e.g. with Nsight Systems, and check whether the kernels overlap or whether they are so short that they are executed “sequentially” on the two devices.
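
Once you have a trace, the overlap check described above can also be done numerically. The following is only a minimal sketch: it assumes you have already extracted the kernel `(start, end)` timestamps for each GPU from the profiler output yourself (that extraction step is not shown, and the helper names `merge_intervals`, `busy_time`, and `overlap_time` are hypothetical, not part of Nsight Systems or PyTorch). It then computes how long both devices were busy at the same time.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def busy_time(kernels):
    """Total time during which the device executes at least one kernel."""
    return sum(end - start for start, end in merge_intervals(kernels))


def overlap_time(kernels_gpu0, kernels_gpu1):
    """Total time during which both devices execute a kernel simultaneously."""
    a = merge_intervals(kernels_gpu0)
    b = merge_intervals(kernels_gpu1)
    i = j = 0
    total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Made-up timestamps (arbitrary units) purely for illustration:
# short kernels that alternate between the devices and never overlap.
gpu0 = [(0, 10), (30, 40)]
gpu1 = [(12, 22), (42, 52)]
print(overlap_time(gpu0, gpu1), busy_time(gpu0), busy_time(gpu1))
```

If the overlap is close to zero while both devices show non-trivial busy time, the kernels are effectively running one after another, which would be consistent with the CPU not launching work fast enough.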
