PyTorch - multi-threading seg fault when using async-TD3 structure

Hello

I’m currently trying to instantiate several threads to spawn actors for an async TD3 network structure, but after the threads have run for a while, a segmentation fault occurs and the gdb trace gives the following report:

terminate called after throwing an instance of 'c10::Error'
  what():  invalid device pointer: 0x7ffe813e8200
Exception raised from free at ../c10/cuda/CUDACachingAllocator.cpp:2058 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fffe2c884d7 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fffe2c5236b in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x22f0e (0x7fffe7427f0e in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x4ccb76 (0x7fff90d6fb76 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: at::get_overlap_status(c10::TensorImpl*, c10::TensorImpl*) + 0x70c (0x7fff792b063c in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #5: at::assert_no_partial_overlap(at::TensorBase const&, at::TensorBase const&) + 0xf (0x7fff792b070f in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::TensorIteratorBase::compute_mem_overlaps(at::TensorIteratorConfig const&) + 0x110 (0x7fff792ed280 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #7: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x43 (0x7fff792f2913 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #8: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xb2 (0x7fff792f3f22 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #9: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2e (0x7fff795c48be in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x2b7b847 (0x7fff544ab847 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cuda.so)
frame #11: at::_ops::add__Tensor::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, c10::Scalar const&) + 0x74 (0x7fff7a101e94 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x4749f75 (0x7fff7c139f75 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::_ops::add__Tensor::redispatch(c10::DispatchKeySet, at::Tensor&, at::Tensor const&, c10::Scalar const&) + 0x74 (0x7fff7a101e94 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x4185613 (0x7fff7bb75613 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #15: at::_ops::add__Tensor::call(at::Tensor&, at::Tensor const&, c10::Scalar const&) + 0x13e (0x7fff7a13b9fe in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #16: <unknown function> + 0x49ec2c6 (0x7fff7c3dc2c6 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #17: torch::autograd::AccumulateGrad::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x106 (0x7fff7c3ddd86 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #18: <unknown function> + 0x49e852b (0x7fff7c3d852b in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #19: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0xe8d (0x7fff7c3d193d in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x6b0 (0x7fff7c3d2cb0 in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #21: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x8b (0x7fff7c3c99eb in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4f (0x7fff90fd264f in /home/contractor/.local/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #23: <unknown function> + 0xd6de4 (0x7fffef918de4 in /lib/x86_64-linux-gnu/libstdc++.so.6)
frame #24: <unknown function> + 0x8609 (0x7ffff7d75609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #25: clone + 0x43 (0x7ffff7eaf133 in /lib/x86_64-linux-gnu/libc.so.6)

What might be the problem? The error says it occurs in the CUDACachingAllocator; is this because I’m running out of system RAM or GPU RAM?

Thanks!

You’re likely running out of GPU RAM.

Try adding some memory-usage logging to your code and see how high the usage gets close to the failure point.
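
For example, here is a minimal sketch of that kind of logging, using torch.cuda.memory_allocated / memory_reserved / max_memory_allocated (the log_gpu_memory helper and the tag names are just illustrative, not part of any library):

import torch

def log_gpu_memory(tag, device=0):
    # Illustrative helper: print the memory the CUDA caching allocator
    # currently holds for this process, plus the peak so far.
    allocated = torch.cuda.memory_allocated(device) / 1024**2
    reserved = torch.cuda.memory_reserved(device) / 1024**2
    peak = torch.cuda.max_memory_allocated(device) / 1024**2
    print(f"[{tag}] allocated={allocated:.1f} MiB "
          f"reserved={reserved:.1f} MiB peak={peak:.1f} MiB")

# Call it around the suspected failure point in each actor/learner loop, e.g.:
# log_gpu_memory("before update")
# loss.backward(); optimizer.step()
# log_gpu_memory("after update")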

Hi Rodrigo, thanks for the suggestion! I checked with nvidia-smi and it shows the GPU is not heavily used, because the model I use is pretty small (training one model usually takes about 200 MB of GPU RAM).

I’ve seen implementations that use a shared optimizer class, but I’m not sure whether that makes a difference. Maybe I’ll try that, thanks!

I was able to solve this problem by using a shared optimizer class, as implemented in Morvan Zhou’s A3C implementation on GitHub.
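
For reference, here is a minimal sketch of that kind of shared Adam optimizer, modeled on the pattern used in such A3C implementations (the class name SharedAdam and the exact state fields are illustrative, and details may vary across PyTorch versions): the per-parameter optimizer state is created eagerly and moved into shared memory so that multiple workers can update the same parameters.

import torch

class SharedAdam(torch.optim.Adam):
    # Adam whose per-parameter state lives in shared memory, so several
    # workers can update the same (shared) model parameters.
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0):
        super().__init__(params, lr=lr, betas=betas, eps=eps,
                         weight_decay=weight_decay)
        for group in self.param_groups:
            for p in group['params']:
                state = self.state[p]
                # Create the state up front and move it to shared memory.
                state['step'] = torch.zeros(1)
                state['exp_avg'] = torch.zeros_like(p.data)
                state['exp_avg_sq'] = torch.zeros_like(p.data)
                state['step'].share_memory_()
                state['exp_avg'].share_memory_()
                state['exp_avg_sq'].share_memory_()

The model itself is put into shared memory with model.share_memory() before the workers are spawned, and every worker is handed this single optimizer instance; some implementations also override step() to handle the shared step counter explicitly.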