Hello!
I have a very compute-intensive task with matrices. I want to pass a tensor to a GPU in a separate thread and get back the result of the operations performed there.
I created a class, Worker, with a compute method that does all the work and returns the result. Now I want to pass 4 class instances, along with their tensors, to separate threads so the computation runs on all 4 of my GPUs.
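For context, here is a stripped-down sketch of a Worker along these lines (a simplified stand-in, not my actual class):

import torch

class Worker:
    def __init__(self, size, device):
        self.device = device
        # placeholder for whatever state the real class keeps on its GPU
        self.weight = torch.randn(size, size, device=device)

    def compute(self, matr):
        # move the input onto this worker's GPU and do the matrix work there
        matr = matr.to(self.device)
        return self.weight @ matr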
The code:
import concurrent.futures
import torch

workers = [
    Worker(64, device=torch.device('cuda:0')),
    Worker(64, device=torch.device('cuda:1')),
    Worker(64, device=torch.device('cuda:2')),
    Worker(64, device=torch.device('cuda:3')),
]
matrices = [tensor1, tensor2, tensor3, tensor4]  # one input tensor per GPU

output = []
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    for worker, matr in zip(workers, matrices):
        future = executor.submit(worker.compute, matr)
        output.append(future)
But calling output[0].result() throws the following error:
"CUDA error: an illegal memory access was encountered"
I think the code inside the class is fine, because everything works on each GPU when I run it without threads.
I am new to PyTorch, so any help would be appreciated.
EDIT:
This morning I found out that the problem may be caused by the custom CUDA kernel I use: without it, everything works. Anyway, I would appreciate any suggestions and good practices on using threading with PyTorch, because I am not sure my code uses it the right way.
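One thing I am only guessing at (I have not found this confirmed in the docs): each worker thread might still have cuda:0 as its current device, and if the custom kernel launches on the current device while the tensors live on another GPU, that might produce exactly this kind of illegal memory access. Would it be good practice to pin the device inside compute, roughly like this (same simplified Worker as in the sketch above; my_custom_kernel is just a placeholder for my real kernel)?

    def compute(self, matr):
        # guess: make this thread's current CUDA device match this worker's device,
        # so the custom kernel launches on the same GPU the data lives on
        with torch.cuda.device(self.device):
            matr = matr.to(self.device)
            out = my_custom_kernel(matr)  # placeholder for my actual custom CUDA op
            torch.cuda.synchronize(self.device)  # surface async kernel errors here
            return out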