Model prediction from multiple threads hangs the GPU

I want to serve the model from multiple threads, but the GPU hangs and there is no error log.

Is the model’s forward pass thread-safe?

This is my environment:

PyTorch version: 1.0.1.post2
Is debug build: No
CUDA used to build PyTorch: 10.0.130

OS: Ubuntu 16.04.5 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.5
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: GeForce RTX 2080
Nvidia driver version: 410.79
cuDNN version: /usr/lib/x86_64-linux-gnu/

and the sample code:

from concurrent.futures import ThreadPoolExecutor
from itertools import repeat
import numpy as np

import torch
from torchvision import models

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

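def task(model):
  num_tasks = 100

  for i in range(num_tasks):
    input_tensor = torch.ones([16, 3, 224, 224], dtype=torch.float32)
    with torch.no_grad():
      # Move the batch to the device, then run the forward pass.
      input_tensor = input_tensor.to(device)
      output = model(input_tensor)

vgg16 = models.vgg16(pretrained=False)
# The model has to live on the same device as its inputs.
vgg16.to(device)
vgg16.eval()
num_workers = 16
with ThreadPoolExecutor(max_workers=num_workers) as pool:
  # Every worker thread runs task() on the same shared model instance.
  results = pool.map(task, repeat(vgg16, num_workers))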

and nvidia-smi's output is:


I don’t think you should do that unless you have multiple GPUs. This might be useful.

My guess is that you are creating as many CUDA contexts as there are CPU threads, so (i) you allocate too much memory and (ii) the contexts don’t run concurrently.
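
One way to test this guess is to let only one thread at a time touch the GPU. The sketch below uses nothing but the standard library; `fake_forward` is a hypothetical stand-in for `model(input_tensor.to(device))`, and `gpu_lock` is a name made up for this example. Nothing in this thread confirms that it fixes the hang; it only removes concurrent GPU access as a variable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One lock shared by all worker threads; only the holder may use the GPU.
gpu_lock = threading.Lock()

def fake_forward(batch):
    # Hypothetical stand-in for `model(input_tensor.to(device))`.
    return [x * 2 for x in batch]

def task(forward, batch, num_tasks=3):
    outputs = []
    for _ in range(num_tasks):
        # Serialize the device transfer and the forward pass.
        with gpu_lock:
            outputs.append(forward(batch))
    return outputs

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(task, fake_forward, [1, 2, 3]) for _ in range(4)]
    results = [f.result() for f in futures]
```

If the hang disappears with the lock in place, concurrent access from the worker threads is the likely culprit; the threads can still do CPU-side work such as preprocessing outside the `with gpu_lock:` block.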

Same here, with PyTorch 1.0.0, CUDA 10, and two RTX 2080 Ti GPUs.

Could it be caused by loading data onto the GPU from multiple threads?
It seems that moving data to the GPU should not be done asynchronously. Hope this helps :slight_smile:.
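
If moving data to the GPU from many threads is indeed the problem, a common serving pattern is to give the GPU to a single dedicated thread and have the request threads hand it work through a queue. This is a minimal sketch under that assumption; `GpuWorker` and `fake_forward` are hypothetical names, and `fake_forward` stands in for the host-to-device copy plus the model call.

```python
import queue
import threading
from concurrent.futures import Future

def fake_forward(batch):
    # Hypothetical stand-in for copying `batch` to the GPU and calling the model.
    return sum(batch)

class GpuWorker:
    """Runs every request on one thread, so only that thread uses the GPU."""

    def __init__(self, forward):
        self._forward = forward
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._requests.get()
            if item is None:  # sentinel: stop the worker
                break
            batch, future = item
            try:
                future.set_result(self._forward(batch))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, batch):
        # Called from any request thread; returns immediately with a Future.
        future = Future()
        self._requests.put((batch, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()

worker = GpuWorker(fake_forward)
futures = [worker.submit([i, i + 1]) for i in range(5)]
outputs = [f.result() for f in futures]
worker.close()
```

Because requests are processed in queue order by one thread, the transfer and the forward pass never overlap across requests. The worker loop is also a natural place to merge several queued inputs into one larger batch.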