Can I specify the batch size for each GPU process?

For example:
gpu 0: 1
gpu 1: 2
gpu 3: 1
The total batch size is 4.

If you implement DataParallel manually yourself, this is possible.
We implemented DataParallel using collectives such as broadcast, scatter, and gather.
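As a rough illustration, here is a minimal sketch of how you could scatter a batch unevenly across GPUs using the building blocks in `torch.nn.parallel` (`replicate`, `parallel_apply`, `gather`). The function name `uneven_data_parallel` and its signature are hypothetical, not part of PyTorch; this is just one way to express the idea, not the library's implementation:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import replicate, parallel_apply, gather

def uneven_data_parallel(module, input, device_ids, chunk_sizes, output_device=None):
    # Hypothetical helper: split the batch into chunks of the requested
    # (possibly unequal) sizes and move each chunk to its GPU.
    if output_device is None:
        output_device = device_ids[0]
    chunks = torch.split(input, chunk_sizes, dim=0)
    inputs = [(chunk.to(f"cuda:{dev}"),) for chunk, dev in zip(chunks, device_ids)]
    # Replicate the module onto each GPU and run the chunks in parallel.
    replicas = replicate(module, device_ids[:len(inputs)])
    outputs = parallel_apply(replicas, inputs)
    # Gather the per-GPU outputs back onto the output device along the batch dim.
    return gather(outputs, output_device, dim=0)

# Example matching the question: batch of 4 split as 1 / 2 / 1.
model = nn.Linear(10, 5).cuda(0)
x = torch.randn(4, 10)
out = uneven_data_parallel(model, x, device_ids=[0, 1, 3], chunk_sizes=[1, 2, 1])
```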

See this function for reference (and feel free to copy and modify it yourself):