Parallel processing of samples that can't be organized as batches

I have a task where every sample has a different size (or is forwarded through different modules of the network), so I can't put them into batches. But training on the samples one by one is very inefficient. How can I parallelize the processing?

I think torch.multiprocessing might be one solution, but I'm still not sure how to use it after reading the docs.

A common approach is to pad the inputs to the largest shape in the batch. You still have to make sure that your model handles the padded inputs correctly, of course (e.g. by masking out the padded positions).
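A minimal sketch of that padding approach, assuming the variable-size dimension is the first one (the sample data here is made up for illustration; `torch.nn.utils.rnn.pad_sequence` handles the padding, and a boolean mask records which positions are real):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Hypothetical samples of different lengths along dim 0, same feature size.
samples = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(7, 8)]

# Zero-pad every sample up to the longest one -> shape (batch, max_len, features).
batch = pad_sequence(samples, batch_first=True)

# Boolean mask: True where a position holds real data, False where it is padding.
lengths = torch.tensor([s.size(0) for s in samples])
mask = torch.arange(batch.size(1))[None, :] < lengths[:, None]

print(batch.shape)       # torch.Size([3, 7, 8])
print(mask.sum().item()) # 15 real positions (5 + 3 + 7)
```

The mask can then be passed to the model (or used to zero out loss terms) so the padded positions don't influence the result.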