GPU program freezes when running multiple threads with libtorch

Hi, I have the same problem with libtorch. I traced a BERT model to libtorch and run it with multiple threads on the GPU. After running inference on some images, the program freezes and prints nothing more to the console. If I run it on the CPU, it works fine, and if I use a single thread to run the model on the GPU, there is also no problem.
This problem only appears on the Tesla P40; on the Tesla K80 everything is OK.
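
A minimal sketch of the setup described above, assuming one shared model that several worker threads call concurrently. The names `InferFn` and `run_workers` are hypothetical, and the callable is only a stand-in for the traced module's forward call on the GPU; this is not the actual program from the post.

```cpp
#include <atomic>
#include <cstdio>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for the model call. In a real libtorch program this
// is where the traced module's forward() would run on a CUDA tensor.
using InferFn = std::function<int(int)>;

// Start `num_threads` workers that each call `infer` `iters` times on the
// same shared callable, and return the total number of completed calls.
int run_workers(const InferFn& infer, int num_threads, int iters) {
    std::atomic<int> completed{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = 0; i < iters; ++i) {
                infer(t * iters + i);  // concurrent call into the shared model
                int done = ++completed;
                // Log progress after every call so that a freeze shows the
                // last iteration each thread managed to finish.
                std::fprintf(stderr, "thread %d finished call %d (total %d)\n",
                             t, i, done);
                std::fflush(stderr);
            }
        });
    }

    // If any call never returns, join() blocks here and the program hangs.
    for (auto& w : workers) {
        w.join();
    }
    return completed.load();
}
```

With libtorch, `infer` would wrap the loaded module's forward call. Logging after each call makes it visible which thread stopped making progress, instead of the console simply going silent.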

My environment is CUDA 9.0 + cuDNN 7.5 + GCC 4.9.4 + libtorch 1.1.0.

These are screenshots taken while the program is making no progress:

[screenshot: nvidia]

[screenshot: run]