Data Parallelism on a single GPU card

TszKin · November 14, 2017, 10:12am

Hi,

I want to train a language model with various batch sizes, especially batch size = 1. As expected, one epoch takes much longer to train, but at the same time GPU utilization is very low. Can I speed this up with data parallelism, using more workers (?) on the same GPU card? (Maybe something like a GPU version of “Hogwild” with data parallelism?)

Thanks.

ptrblck · November 14, 2017, 10:32am

I think the reduce parameter of the loss functions might be helpful.

reduce (bool, optional) – By default, the losses are averaged or summed for each minibatch. When reduce is False, the loss function returns a loss per batch element instead and ignores size_average. Default: True

Using this, you could increase the batch size and still get a separate loss for each element in the batch.
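
A minimal sketch of what that could look like, not tested against your model. It assumes a recent PyTorch release, where reduce=False is written as reduction="none" (the reduce and size_average arguments were later deprecated in favor of reduction). The tensor sizes and random inputs are made up for illustration and stand in for your language model's outputs and labels.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes, chosen only for illustration.
batch_size, vocab_size = 4, 10

# Random stand-ins for the model's output logits and the target token ids.
logits = torch.randn(batch_size, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size,))

# reduction="none" is the newer spelling of reduce=False:
# the loss is returned per batch element instead of being averaged.
per_element = nn.CrossEntropyLoss(reduction="none")(logits, targets)

# Default behaviour for comparison: one scalar, averaged over the batch.
mean_loss = nn.CrossEntropyLoss()(logits, targets)

print(per_element.shape)  # torch.Size([4])
```

Note that calling backward() on the averaged loss still gives a single update per batch, so this recovers the per-sample loss values, not the per-sample updates you get with batch size = 1.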