I have a compute-intensive task involving matrices. I want to move a tensor to a GPU in a separate thread and get back the result of the operations performed on it. I created a Worker class with a compute method that does all the work and returns the result. Now I want to pass four class instances, along with their tensors, to separate threads so the computation runs on all four of my GPUs.
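To give an idea of the interface, here is a simplified stand-in for the class (not my real code; the actual compute does more work and calls a custom CUDA kernel, and the random weight matrix is just a placeholder):

```python
import torch

class Worker:
    def __init__(self, size, device):
        self.device = device
        # per-device state; a random weight matrix stands in for my real setup
        self.weight = torch.randn(size, size, device=device)

    def compute(self, matr):
        # move the input to this worker's GPU and do the matrix work there
        matr = matr.to(self.device)
        return self.weight @ matr
```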
```python
import concurrent.futures
import torch

workers = [
    Worker(64, device=torch.device('cuda:0')),
    Worker(64, device=torch.device('cuda:1')),
    Worker(64, device=torch.device('cuda:2')),
    Worker(64, device=torch.device('cuda:3')),
]
matrices = [tensor1, tensor2, tensor3, tensor4]

output = []
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    for worker, matr in zip(workers, matrices):
        # submit returns a Future; the actual value is retrieved later with .result()
        result = executor.submit(worker.compute, matr)
        output.append(result)
```
Calling result() on the futures stored in output throws the following error:
“CUDA error: an illegal memory access was encountered”
I think the code inside the class itself is fine, because everything works on each GPU when I run it without threads.
I am new to PyTorch, so please help me.
Update: this morning I found out that the problem may be caused by the custom CUDA kernel I use. Without it, everything works. Anyway, I would appreciate any suggestions and good practices for using threads with PyTorch, because I am not sure my code does this the right way. One thing I am planning to try is shown in the sketch below.
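Based on the torch.cuda.set_device / torch.cuda.device docs, the current CUDA device is per-thread, so each pool thread starts on cuda:0 unless it is switched, and a kernel launched there while the tensors live on another GPU could explain the illegal memory access. My plan is to wrap the body of compute in a device context, roughly like this (again just a sketch with the same placeholder Worker as above):

```python
import concurrent.futures
import torch

class Worker:
    def __init__(self, size, device):
        self.device = device
        self.weight = torch.randn(size, size, device=device)  # placeholder state

    def compute(self, matr):
        # The current CUDA device is thread-local, so a pool thread stays on cuda:0
        # unless it is switched; a custom kernel launched there while the tensors
        # live on another GPU can cause an illegal memory access.
        with torch.cuda.device(self.device):
            matr = matr.to(self.device)
            out = self.weight @ matr             # the custom kernel call would go here
            torch.cuda.synchronize(self.device)  # surface asynchronous CUDA errors here
            return out

workers = [Worker(64, device=torch.device(f'cuda:{i}')) for i in range(4)]
matrices = [torch.randn(64, 64) for _ in range(4)]

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(w.compute, m) for w, m in zip(workers, matrices)]
results = [f.result() for f in futures]  # raises here if a worker failed
```

Would that be considered good practice, or is there a better pattern for multi-GPU work with threads?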