Multiprocessing: subprocess is not running at BLAS (or LAPACK) command

I am optimizing a function with 20 different initializations.
To speed things up, I am using multiprocessing (20 processes) on CPUs, not GPUs.

My program does matrix calculations where the matrix dimension grows by one at each step compared to the previous one,
e.g. a 20 by 20 matrix at step n becomes 21 by 21 at step n+1.

When I used torch.gesv, it stopped at the step where the matrix size is 37 by 37,
so I replaced it with torch.inverse().mm(); then it stopped at a matrix size of 128 by 128,
and this time the problem occurs at an mm() command (not the mm() in torch.inverse().mm(), but an earlier mm() call).
Even with the randomness in my program, across multiple runs
the problem occurs at the step with exactly the same matrix size.
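
For reference, the replacement looks roughly like this (a minimal sketch; the sizes here are made up, and in my program the matrices come from the earlier steps):

import torch
from torch.autograd import Variable

n = 37
A = Variable(torch.randn(n, n))
b = Variable(torch.randn(n, 1))

# Original version: solve A x = b with gesv (returns the solution and the LU factorization).
x_gesv, _ = torch.gesv(b, A)

# Replacement: explicit inverse followed by a matrix multiply.
x_inv = torch.inverse(A).mm(b)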

In debugging mode, those problematic lines take an unreasonably long time and return None (printing nothing in the debugging console). After running them, any calculation, even simply printing the value of a variable, also returns None (printing nothing), and the CPU usage of those subprocesses is below 5% and mostly 0%, so they seem to be doing nothing.
However, before running those problematic lines in debugging mode,
if I convert the arguments to numpy arrays and do the same calculation with numpy, it works fine.
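
For example, the numpy fallback that works in the debugger is roughly the following (a sketch; in practice A and b are the Variables the stuck line was about to use):

import numpy as np
import torch
from torch.autograd import Variable

n = 128
A = Variable(torch.randn(n, n))
b = Variable(torch.randn(n, 1))

# The same linear algebra, done in numpy, still runs fine in the stuck subprocess.
A_np = A.data.numpy()
b_np = b.data.numpy()
x_np = np.linalg.inv(A_np).dot(b_np)  # or np.linalg.solve(A_np, b_np)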

This problem doesn't happen if I don't use multiprocessing,
and the same problem occurs even with only one subprocess (which means one initialization in my case).

Maybe my knowledge about multiprocessing is not enough; below is the code where I use multiprocessing:

	pool = torch.multiprocessing.Pool(n_init)
	results = [pool.apply_async(optimize, args=([arguments depending on i])) for i in range(n_init)]
	return_values = [res.get() for res in results]
	minimum, minimum_value = zip(*return_values)
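
A self-contained sketch of the same pattern, with a made-up optimize standing in for my real objective (the real arguments depend on i and are omitted above):

import torch


def optimize(seed):
	# Made-up stand-in for the real per-initialization optimization.
	torch.manual_seed(seed)
	x = torch.randn(5)
	return x, x.sum()


if __name__ == '__main__':
	n_init = 20
	pool = torch.multiprocessing.Pool(n_init)
	results = [pool.apply_async(optimize, args=(i,)) for i in range(n_init)]
	return_values = [res.get() for res in results]
	minimum, minimum_value = zip(*return_values)
	pool.close()
	pool.join()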

Any ideas about this problem?

Thanks in advance.

Could you post a minimal example that causes the problem?

import torch
from torch.autograd import Variable
import torch.optim


def reproduce():
	n_data = 60
	ndim = 100

	n_init = 1
	b = Variable(torch.randn(ndim, n_data))
	A = Variable(torch.randn(ndim, ndim))

	pool = torch.multiprocessing.Pool(n_init)
	# Solve A x = b in a subprocess; on my machine the problem shows up once n_data is large enough.
	res = pool.apply_async(torch.gesv, args=(b, A))
	return res.get()


if __name__ == '__main__':
	print(reproduce())

On my computer the problem occurs when n_data = 70,
but with n_data = 60 it works fine.

Thanks!

Thanks for the script! Hmm, I’m not able to replicate this (I ran your script on two machines and there’s no slowness or hanging)… What are you running on / what is your PyTorch version?


I’ve done up to n_data = 10,000,000 so far and it looks okay. I’m running on PyTorch master, but I’m not sure if that’ll make a difference.

Sorry, it seems that the other code, which I cannot share, blocks the process, so gesv itself is not the problem.
But while trying to find the source of the problem, I was able to locate where exactly it is.

Thanks.

import torch
from torch.autograd import Variable
import torch.optim
from torch.autograd._functions.linalg import Potrf


def reproduce():
	n_data = 100
	ndim = 100

	# (x_input and output are not actually used below.)
	x_input = Variable(torch.rand(n_data, ndim) * 2 - 1)
	x_input.data[0, :] = 0
	output = Variable(torch.randn(n_data, 1))

	n_init = 1
	b = Variable(torch.randn(ndim, n_data))
	A = Variable(torch.randn(ndim, ndim))

	# Cholesky factorization in the parent process, before the pool is created;
	# this seems to be what triggers the hang below.
	chol = Potrf.apply(A.mm(A.t()) + Variable(torch.eye(ndim)))

	pool = torch.multiprocessing.Pool(n_init)
	# The gesv call in the subprocess then never returns.
	res = pool.apply_async(torch.gesv, args=(b, A))
	return res.get()


if __name__ == '__main__':
	print(reproduce())

Using Potrf.apply seems to be the source of the error.
I am not sure whether potrf is in the stable release,
but you can find the source in torch.autograd._functions.linalg on GitHub.

Applying torch.potrf to a tensor still causes the same problem.
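
For example, the built-in wrapper can be called roughly like this (a sketch on a plain tensor, same shapes as the script above):

import torch

ndim = 100
A = torch.randn(ndim, ndim)

# Cholesky factorization via the built-in torch.potrf instead of Potrf.apply;
# calling this in the parent before creating the pool leads to the same hang in the subprocess.
chol = torch.potrf(A.mm(A.t()) + torch.eye(ndim))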

Thanks for your help!

Interesting. Yes, I’m seeing the hang now. I’ve opened an issue on GitHub for the bug: https://github.com/pytorch/pytorch/issues/3619; we are looking into it.

Also, torch.potrf is a thing, and it probably calls what you’re using under the hood.

Here is a clearer error case and an ad-hoc solution to the problem:

import torch
from torch.autograd import Variable


def error_reproduce():
	n_data = 500
	ndim = 1000

	b = Variable(torch.randn(ndim, n_data))
	A = Variable(torch.randn(ndim, ndim))

	# BLAS call (mm) in the parent process BEFORE the pool is created: the subprocess hangs.
	A_sym = A.mm(A.t())

	pool = torch.multiprocessing.Pool(1)
	res = pool.apply_async(torch.gesv, args=(b, A))
	pool.close()
	pool.join()
	return res.get()


def no_error_without_blas():
	n_data = 500
	ndim = 1000

	b = Variable(torch.randn(ndim, n_data))
	A = Variable(torch.randn(ndim, ndim))

	# No BLAS call in the parent process at all: works fine.
	pool = torch.multiprocessing.Pool(1)
	res = pool.apply_async(torch.gesv, args=(b, A))
	pool.close()
	pool.join()
	return res.get()


def no_error_by_calling_pool_first():
	n_data = 500
	ndim = 1000

	b = Variable(torch.randn(ndim, n_data))
	A = Variable(torch.randn(ndim, ndim))

	# Pool is created BEFORE the BLAS call: works fine.
	pool = torch.multiprocessing.Pool(1)

	A_sym = A.mm(A.t())

	res = pool.apply_async(torch.gesv, args=(b, A))
	pool.close()
	pool.join()
	return res.get()

The only difference between these three functions is the order of the BLAS call (torch.mm, i.e. addmm under the hood) and the creation of the multiprocessing.Pool.
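
So, as an ad-hoc workaround, the pool can be created before the parent process touches any BLAS routine, roughly like this (a sketch; optimize is a made-up stand-in for my real per-initialization routine):

import torch


def optimize(i):
	# Made-up stand-in for one initialization of the real optimization.
	torch.manual_seed(i)
	A = torch.randn(10, 10)
	b = torch.randn(10, 1)
	return torch.inverse(A).mm(b)


if __name__ == '__main__':
	n_init = 20
	# Ad-hoc fix: create the pool first ...
	pool = torch.multiprocessing.Pool(n_init)

	# ... and only then do any BLAS/LAPACK work (mm, inverse, potrf, ...) in the parent.
	A = torch.randn(100, 100)
	A_sym = A.mm(A.t())

	results = [pool.apply_async(optimize, args=(i,)) for i in range(n_init)]
	minima = [res.get() for res in results]
	pool.close()
	pool.join()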

Thanks.