gpu 0: 1
gpu 1: 2
gpu 2: 1
The total batch size is 4.
This is possible if you implement
DataParallel manually yourself.
We implemented DataParallel using collectives such as scatter and gather.
See this function for reference (and feel free to copy and modify it yourself):
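As a minimal sketch of the uneven split described above: assuming per-GPU batch sizes of 1, 2, and 1, a helper could carve a batch of 4 samples into per-device chunks before dispatching them. `split_uneven` is a hypothetical helper for illustration, not part of PyTorch's API:

```python
def split_uneven(batch, sizes):
    # Split `batch` into uneven per-device chunks; `sizes` is a
    # hypothetical per-GPU batch-size list, e.g. [1, 2, 1].
    assert sum(sizes) == len(batch), "sizes must cover the whole batch"
    chunks, start = [], 0
    for n in sizes:
        chunks.append(batch[start:start + n])
        start += n
    return chunks

# A batch of 4 samples split across 3 GPUs as 1 / 2 / 1:
chunks = split_uneven(list(range(4)), [1, 2, 1])
# chunks == [[0], [1, 2], [3]]
```

In a manual DataParallel loop you would then move each chunk to its device, run the replicated model on it, and gather the outputs back, mirroring the scatter/gather collectives mentioned above.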