`Exception: process 0 terminated with exit code 1` error when using `torch.multiprocessing.spawn` to parallelize over multiple GPUs

Hi @iffiX,

> What’s the shape and data type of X_prime_class_split[j] ?

`X_prime_class_split[j]` is a 2d numpy array. The way I have parallelized my code on the CPU is two-fold. First, it parallelizes over the 2 binary classes. Second, for each binary class, the dataset is split into batches and parallelization is performed over the batches. `X_prime_class_split[j]` is just one such dataset batch, so it is a 2d numpy array.
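To make that two-level structure concrete, here is a stripped-down sketch of how the CPU version fans the work out (the name `process_batch`, the shapes, and the dummy computation are made up for illustration and are not my actual code):

```python
import numpy as np
from multiprocessing import Pool

def process_batch(args):
    class_idx, X_prime_class_split_j = args
    # X_prime_class_split_j is one 2d numpy batch; the real per-batch work goes here
    return class_idx, X_prime_class_split_j.sum(axis=0)

if __name__ == "__main__":
    # one 2d array per binary class (shapes are made up for illustration)
    X_prime = [np.random.rand(1000, 8), np.random.rand(1000, 8)]
    # split each class's data into batches, then fan all (class, batch) pairs out to the pool
    tasks = [(c, batch)
             for c, X_prime_class in enumerate(X_prime)
             for batch in np.array_split(X_prime_class, 4)]
    with Pool(processes=8) as pool:
        results = pool.map(process_batch, tasks)
```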

> Maybe we could represent it as a tensor.

I have actually already converted all of the numpy functions in my code into PyTorch functions; the code is here in my GitHub. Of course, this code is not fully working yet because of the issue with `torch.multiprocessing.Pool`. So in this version, `X_prime_class_split[j]` is a PyTorch tensor.

> You could definitely vectorize this inner loop.

Thanks for the tip! I didn’t realize that chunks of code could also be vectorized (the way `numpy.vectorize` does). I googled around but couldn’t find a PyTorch equivalent of `numpy.vectorize`. If I were to vectorize this part of the code, do you know how I could do it with PyTorch tensors?
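Just to check that I understand what vectorizing would look like on PyTorch tensors, is it something along these lines? This is a toy example with a made-up elementwise transformation (not my actual inner loop), where the explicit Python loop is replaced by one call on the whole 2d tensor:

```python
import torch

def transform_loop(X):
    # looped version: process one row of the 2d tensor at a time
    out = torch.empty_like(X)
    for j in range(X.shape[0]):
        out[j] = torch.exp(-X[j]) * 2.0
    return out

def transform_vectorized(X):
    # vectorized version: the same computation on the whole tensor at once,
    # relying on broadcasting instead of an explicit Python loop
    return torch.exp(-X) * 2.0

X = torch.rand(1000, 8)
assert torch.allclose(transform_loop(X), transform_vectorized(X))
```

I also came across `torch.vmap` in the newer releases, which seems to be the closest analogue of `numpy.vectorize`, but I haven’t tried it yet.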

> There is no efficient way to parallelize the kronecker product over n_copies since this code is iterative and strongly serial.

Agreed, there is no way to efficiently parallelize the Kronecker product because of its iterative and serial nature. As mentioned above, my code is actually parallelized over the 2 binary classes and over batches of the dataset, so I am not looking to parallelize the Kronecker product itself.
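For anyone else reading along, the serial part is essentially this kind of loop (a minimal sketch, assuming `n_copies` repeated Kronecker products of the same tensor; not my exact code):

```python
import torch

def kron_power(x, n_copies):
    # each iteration consumes the previous result, so the loop over n_copies
    # is inherently serial and cannot be split across workers
    result = x
    for _ in range(n_copies - 1):
        result = torch.kron(result, x)
    return result

x = torch.tensor([0.6, 0.8])
print(kron_power(x, 3).shape)  # torch.Size([8]) for a length-2 vector and n_copies=3
```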

> But you could use the einsum function of pytorch to calculate…

I have actually already done this in my GitHub link above! This gives me some comfort knowing that I am on the same page as you :smile:

I guess if I can’t parallelize my code over multiple GPUs, plan B would be to rewrite it to run on a single GPU.

Many many thanks again for having a look @iffiX. Really appreciate it heaps!