I am using libtorch v1.0.1 with CUDA 10.1 on an RTX 2080; glibc is 2.28.
I read that libtorch is multithread safe, but I cannot make it work correctly in a multithreaded C++ app. I have two trainers, each in its own thread; no global, static, or shared variables… but it always ends in a deadlock.
When deadlocked, both threads are near cuLaunchKernel. Is there something I need to do? Do I have to initialize a context or something?
I also tested with master and a nightly build of libtorch, but it ends the same way.
Great! It seems that PoolWindow makes it work correctly.
I ran into a problem where available_handles (a global) was destroyed before PoolWindow (a thread_local), according to Valgrind:
available_handles[d_h.first].push_back(d_h.second); // SIGSEGV
I don't know how this can happen; I will try to reproduce it.
@pascal.soveaux https://github.com/pytorch/pytorch/issues/19394 is likely related. Would you like to comment in the issue about your findings? Thanks!
@yf225 Sorry for the late response. I found that this problem is not related to PoolWindow at all. The thread exits after the "main" (libtorch) thread because both are, in fact, worker threads.