Hello everyone!
I have been getting this error, and although I have tried researching what it means, I’m having a hard time understanding it.
I have two data loaders, one for training and one for validation, each with num_workers=4.
Every 1000 iterations I call the validation function, and once the validation loader has been exhausted I get the following error (it does not terminate the program):
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
Setting num_workers to 0 for both data loaders seems to solve the issue, but I don’t understand why I get it with num_workers > 0.
Thank you!
Edit: Even with only the training loader, setting num_workers to anything above 0 produces the same error after each epoch.
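For reference, here is a minimal sketch of the setup described above: two DataLoaders with the same num_workers, and a validation pass every val_every iterations. The random TensorDatasets, the toy nn.Linear model, and the helper names make_loaders, validate, and train are hypothetical stand-ins, not the actual code. In this sketch, batches are created on the CPU inside the worker processes and moved to the GPU only in the main process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loaders(num_workers: int = 4, batch_size: int = 8):
    # Hypothetical stand-in data: random CPU tensors in place of the real datasets.
    train_ds = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
    val_ds = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
    # Both loaders use the same number of worker processes.
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True,
                              num_workers=num_workers)
    val_loader = DataLoader(val_ds, batch_size=batch_size, num_workers=num_workers)
    return train_loader, val_loader


def validate(model, val_loader, device):
    # Iterating val_loader starts its workers; they are shut down when the loop ends.
    model.eval()
    correct = 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)  # move to GPU in the main process
            correct += (model(x).argmax(dim=1) == y).sum().item()
    model.train()
    return correct / len(val_loader.dataset)


def train(num_epochs: int = 2, val_every: int = 1000, num_workers: int = 4):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = torch.nn.Linear(10, 2).to(device)  # hypothetical toy model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()
    train_loader, val_loader = make_loaders(num_workers)

    iteration = 0
    val_results = []
    for _ in range(num_epochs):
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
            iteration += 1
            if iteration % val_every == 0:
                val_results.append(validate(model, val_loader, device))
    return iteration, val_results


if __name__ == "__main__":
    # The __main__ guard is required on Windows, where DataLoader workers are spawned.
    train()
```

With 256 training samples and batch_size=8 there are 32 iterations per epoch, so train(num_epochs=2, val_every=16, num_workers=0) runs 64 iterations and validates 4 times.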