How to parallelize evaluation on a single GPU

Hi everyone. I have an nn.Module that was written to take one data point (sample) at a time rather than a batch, and to run some calculations on it. The weights for this model are already trained and I don’t plan to update them, so I will use it for evaluation only.

It is non-trivial for me to vectorize the code further so that it can evaluate multiple sentences as a batch. When I evaluate one sentence at a time, I still have a good chunk of GPU memory and processing power left over. What would be an effective way to parallelize evaluation so as to make the best use of the hardware?

One immediate idea I can think of is using the torch.multiprocessing module, but it looks kinda hacky. Are there any other alternatives?

Refer to Split Single GPU
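
Apart from that pointer, one lightweight thing to try before torch.multiprocessing is a plain thread pool that submits the per-sample calls concurrently. Below is a minimal sketch, not a tested recipe for your model: `evaluate_concurrently`, `fake_model`, and `num_workers` are hypothetical names, and `fake_model` is only a stand-in for your real per-sentence forward call. Whether threads actually overlap GPU work depends on how much of the per-sample time is spent in Python versus in GPU kernels, so you would need to measure it.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_concurrently(evaluate_one, samples, num_workers=4):
    """Call evaluate_one on every sample using a pool of worker threads.

    evaluate_one: any callable taking a single sample (hypothetical stand-in
                  for the per-sample model call).
    Results are returned in the same order as the input samples.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves input order even if calls finish out of order.
        return list(pool.map(evaluate_one, samples))


def fake_model(sentence):
    # Placeholder for the real model: returns the number of tokens.
    return len(sentence.split())


sentences = ["a short sentence", "another one", "x"]
results = evaluate_concurrently(fake_model, sentences, num_workers=2)
```

With a real model, `evaluate_one` would wrap the forward call, and since gradient mode in PyTorch is thread-local, the `torch.no_grad()` context would have to be entered inside that function rather than around the pool.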

Thanks! Will check it out and post here if it helps.