For example:
gpu 0: 1
gpu 1: 2
gpu 3: 1
The total batch size is 4.
smth:
If you implement DataParallel yourself, this is possible.
We implemented DataParallel using collectives such as broadcast, scatter, and gather.
See this function for reference (and feel free to copy and modify it yourself):
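For illustration, here is a minimal sketch of such a manual data-parallel step, assuming three GPUs and the uneven 1/2/1 split from the example above. It leans on the `chunk_sizes` argument of `torch.cuda.comm.scatter` to control how many samples each device receives; the model, tensor sizes, and device ids below are placeholders, not the actual DataParallel source.

```python
import torch
from torch import nn
from torch.cuda import comm
from torch.nn.parallel import replicate, parallel_apply

# Placeholder model and a batch of 4 samples on the first GPU.
model = nn.Linear(16, 4).cuda(0)
inputs = torch.randn(4, 16, device="cuda:0")

device_ids = [0, 1, 2]   # assumption: three visible GPUs
chunk_sizes = [1, 2, 1]  # uneven split: 1 + 2 + 1 = 4 samples total

# broadcast: replicate the module (parameters copied) onto every device
replicas = replicate(model, device_ids)

# scatter: split the batch along dim 0 into unevenly sized chunks
chunks = comm.scatter(inputs, device_ids, chunk_sizes, dim=0)

# run each replica on its own chunk in parallel
outputs = parallel_apply(replicas, [(x,) for x in chunks], devices=device_ids)

# gather: concatenate the per-device outputs back on GPU 0
result = comm.gather(outputs, dim=0, destination=0)
print(result.shape)  # torch.Size([4, 4])
```

With `chunk_sizes=[1, 2, 1]`, gpu 0 and gpu 2 each see one sample while gpu 1 sees two, matching the uneven split described in the question.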