I got this error solved.
The problem is that the batched data is not tensor data. It's a list of dicts (training samples plus their ground truth). If the input to the model is a tensor organized in NCHW layout, it works as expected.
I'm still wondering whether it's possible to pass a list of dict objects to a model that inherits from DataParallel. Can such a list batch somehow be scattered automatically and appropriately across multiple GPUs?
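One common workaround, rather than relying on DataParallel to scatter a list of dicts, is to collate the list into a dict of batched tensors yourself and feed the model plain NCHW tensors. This is a minimal sketch; the dataset, the `image`/`label` keys, and the tensor shapes are hypothetical, not from the original post:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

# Hypothetical dataset: each sample is a dict (training sample + ground truth),
# mirroring the list-of-dicts batch described above.
class DictDataset(Dataset):
    def __init__(self, n=8):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return {"image": torch.randn(3, 16, 16), "label": torch.tensor(idx % 2)}

# Collate the list of dicts into a dict of batched tensors (images in NCHW),
# so the model receives plain tensors that can be split along dim 0.
def dict_collate(samples):
    return {
        "image": torch.stack([s["image"] for s in samples]),  # N x C x H x W
        "label": torch.stack([s["label"] for s in samples]),  # N
    }

model = nn.Conv2d(3, 4, kernel_size=3, padding=1)
if torch.cuda.is_available():
    # DataParallel chunks tensor inputs along the batch dimension (dim 0).
    model = nn.DataParallel(model).cuda()

loader = DataLoader(DictDataset(), batch_size=4, collate_fn=dict_collate)
batch = next(iter(loader))
images = batch["image"]
if torch.cuda.is_available():
    images = images.cuda()
out = model(images)  # tensor input in NCHW scatters cleanly
print(out.shape)  # batch of 4, 4 output channels, spatial size preserved
```

For what it's worth, DataParallel's scatter logic does traverse lists and dicts recursively and splits any tensors it finds along dim 0, so a dict of batched tensors as a keyword argument often works too; a list of per-sample dicts, however, is not chunked the way a batched tensor is, which is why collating first is the safer route.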