CUDA runtime error (3): initialization error - with from_numpy and Dataset

Hi All,

I’m trying to use from_numpy inside a Dataset, and it seems to break PyTorch.

Minimum example:

import random

import numpy as np
import torch
import torch.utils.data as data
from torch.utils.data.dataset import Dataset

random_seed = 0
random.seed(random_seed)
torch.manual_seed(random_seed)

if torch.cuda.is_available():
    print("GPU Acceleration Available")
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    dtype = torch.cuda.FloatTensor
    torch.cuda.manual_seed_all(random_seed)
    pin_memory = True
else:
    dtype = torch.FloatTensor
    pin_memory = False

minibatch_size = 2
num_workers = 1

class LineDataset(Dataset):
    def __init__(self):
        pass

    def __getitem__(self, idx):
        a = torch.from_numpy(np.array([1, 2, 3, 4]))
        # Calling .cuda() inside a DataLoader worker process triggers the error below.
        a = a.cuda()
        return a, 0

    def __len__(self):
        return 10

for pack, label in data.DataLoader(LineDataset(), batch_size=minibatch_size,
                                   shuffle=True, num_workers=num_workers):
    break

Output:

CUDA_LAUNCH_BLOCKING=1 python3 Untitled.py
GPU Acceleration Available
THCudaCheck FAIL file=/pytorch/torch/lib/THC/THCGeneral.c line=74 error=3 : initialization error
Traceback (most recent call last):
  File "Untitled.py", line 34, in <module>
    for pack, label in data.DataLoader(LineDataset(), batch_size=minibatch_size, shuffle=True, num_workers=num_workers):
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 210, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 230, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 42, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 42, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "Untitled.py", line 29, in __getitem__
    a.cuda()
  File "/usr/local/lib/python3.6/site-packages/torch/_utils.py", line 69, in cuda
    return new_type(self.size()).copy_(self, async)
  File "/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 358, in _lazy_new
    _lazy_init()
  File "/usr/local/lib/python3.6/site-packages/torch/cuda/__init__.py", line 121, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (3) : initialization error at /pytorch/torch/lib/THC/THCGeneral.c:74

If I call from_numpy outside the Dataset and then call .cuda(), it does not hit the CUDA runtime error.

It also does not error if I run on the CPU only.


Answering my own question:

dtype was a CUDA type, so CUDA tensors were being created inside the DataLoader worker processes, and PyTorch does not support initializing CUDA in forked worker processes like this.

Switching dtype back to torch.FloatTensor, or simply not creating CUDA tensors inside the Dataset, avoids the error.
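A rough sketch of the fixed Dataset, assuming the default tensor type is left as torch.FloatTensor: the worker only ever builds CPU tensors, and the GPU transfer happens later in the main process.

    class LineDataset(Dataset):
        def __getitem__(self, idx):
            # Build and return a plain CPU tensor; never touch CUDA inside a
            # DataLoader worker process.
            a = torch.from_numpy(np.array([1, 2, 3, 4]))
            return a, 0

        def __len__(self):
            return 10

The batches can then be moved to the GPU after they come out of the DataLoader.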

The solution to your issue should be to set the multiprocessing start method to 'spawn', since CUDA cannot be re-initialized in a forked subprocess.
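Something along these lines (a sketch, not tested against your exact script; the __main__ guard is needed because spawn re-imports the module):

    import torch.multiprocessing as mp

    if __name__ == '__main__':
        # 'spawn' starts clean worker processes instead of forking the parent,
        # so CUDA can be initialized inside DataLoader workers.
        mp.set_start_method('spawn')
        # ... build the DataLoader and iterate as above ...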


Thanks for your reply!

I actually tried changing it to spawn, but it just freezes forever; I'm not sure why.

Changing the Dataset to do its work on the CPU works perfectly, and the nn.Module still runs on the GPU.
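Ignoring the packed-sequence handling for a moment, with plain tensors the pattern looks roughly like this (MyModel is only a placeholder for the actual GPU nn.Module):

    loader = data.DataLoader(LineDataset(), batch_size=minibatch_size, shuffle=True,
                             num_workers=num_workers, pin_memory=pin_memory)
    model = MyModel().cuda()  # placeholder nn.Module living on the GPU

    for inputs, labels in loader:
        # Batches arrive as CPU tensors; move them to the GPU in the main process.
        inputs, labels = inputs.cuda(), labels.cuda()
        output = model(inputs)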


I think the freezing issue is possibly related to the libraries I'm using inside the Dataset: I'm using numpy and linecache, and it is possible that one of them is causing the DataLoader to freeze.

Since the output of the DataLoader is now a FloatTensor instead of a cuda.FloatTensor, I wrote a small helper to convert the DataLoader's CPU output before feeding it to the GPU RNN nn.Module:

def convert_pack_to_cuda(pack, label):
    if torch.cuda.is_available():
        # Unpack the padded sequence, cast it to the CUDA dtype, then re-pack it.
        unpack, lengths = torch.nn.utils.rnn.pad_packed_sequence(pack, batch_first=True)
        unpack = unpack.type(dtype)
        pack = torch.nn.utils.rnn.pack_padded_sequence(unpack, lengths, batch_first=True)
        label = label.type(dtype)
    return pack, label
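It gets called on every batch before the forward pass, roughly like this (rnn_model is just a placeholder for the GPU RNN nn.Module):

    for pack, label in loader:
        pack, label = convert_pack_to_cuda(pack, label)
        output = rnn_model(pack)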

And it seems to work in harmony.