Hi everyone. I have an nn.Module that was written to take one data point (sample) at a time, rather than a batch, and run some calculations on it. The model's weights are already trained and I don't plan to update them, so I'll use it purely for evaluation.
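To make the setup concrete, here is a minimal sketch of what I mean (the module and names here are made up, not my actual model): a frozen module whose forward expects a single sample with no batch dimension.

```python
import torch
import torch.nn as nn

class PerSampleModel(nn.Module):
    """Toy stand-in for my model: forward() takes ONE sample, not a batch."""
    def __init__(self, dim=16):
        super().__init__()
        self.proj = nn.Linear(dim, 1)

    def forward(self, sample):
        # sample has shape (dim,) -- a single data point, no batch axis
        return self.proj(sample).squeeze()

model = PerSampleModel()
model.eval()  # weights are trained and frozen; inference only
with torch.no_grad():
    out = model(torch.randn(16))  # evaluate one sample at a time
```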
It is non-trivial for me to further vectorize the code so it can evaluate multiple sentences as a batch. Still, I have a good chunk of GPU memory and processing power left over when I evaluate one sentence at a time. What would be an effective way to parallelize evaluation so as to make the best use of the hardware?
One immediate idea is the torch.multiprocessing module, but that feels kind of hacky. Any other alternatives?