PCIe communication latency, transfer overheads, and CPU-side weight synchronization mean that small models don't benefit from multi-GPU training.
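To make that concrete, here's a back-of-the-envelope sketch of the effect. The numbers are purely illustrative assumptions, not measurements: the point is that gradient sync is a roughly fixed per-step cost, so when per-step compute is small, splitting it across GPUs can't win.

```python
def multi_gpu_speedup(compute_ms, sync_ms, n_gpus=2):
    """Idealized data-parallel step: compute divides across GPUs,
    but gradient sync (PCIe transfer + CPU-side averaging) is a
    fixed per-step cost that does not shrink with more GPUs."""
    single_gpu_step = compute_ms
    multi_gpu_step = compute_ms / n_gpus + sync_ms
    return single_gpu_step / multi_gpu_step

# Illustrative numbers (assumptions):
# "large" model: 200 ms compute, 20 ms sync per step
print(round(multi_gpu_speedup(200, 20), 2))  # -> 1.67x on 2 GPUs
# "small" model: 10 ms compute, same 20 ms sync
print(round(multi_gpu_speedup(10, 20), 2))   # -> 0.4x, i.e. slower than 1 GPU
```

So a small model can end up slower on two GPUs than on one, which matches what people are reporting below.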
Can you try training a large image model such as ResNet from the examples repo and check whether it saturates both GPUs?
Hi @bottanski, I observed this as well. Have you made any progress on it? Thank you.
@bottanski @magic282 I am observing the same. In my scenario the model is not sped up at all.