Hi,
I would like to parallelize a for loop inside my model, for training on a single CPU with many cores.
To be more precise, at some point in the forward pass I have something that looks like this:
```python
def forward(self, x):
    # ...
    # latents is a list of tensors computed from x earlier in forward
    ys = []
    for latent in latents:
        ys.append(self.submodel(latent))
    # do something useful with ys
    # ...
```
I tried to parallelize the loop with `Pool` from `torch.multiprocessing`, but got the following error:
```
grad_fn=<AddmmBackward0>)'. Reason: 'RuntimeError('Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).')'
```
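For context, my attempt was essentially the following (a minimal sketch; the pool size and the direct `pool.map` over `latents` are just illustrative):

```python
import torch.multiprocessing as mp

def forward(self, x):
    # ...
    # Attempted parallel version of the loop above.
    # This fails: pool.map pickles its inputs and outputs, and the
    # latents are non-leaf tensors with requires_grad=True, so autograd
    # refuses to serialize them across process boundaries.
    with mp.Pool(processes=4) as pool:
        ys = pool.map(self.submodel, latents)
    # do something useful with ys
    # ...
```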
I would appreciate any help.
Thanks.