I have to generate a lot of randomized batches. Pre-storing all of the data on the GPU is not an option (it would take too much space), so right now I move each batch from the CPU to the GPU inside the training loop.
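Simplified, the per-batch transfer I currently do looks roughly like this (a sketch: the stand-in loader and the CPU fallback are just placeholders so the snippet is self-contained):

```python
import torch

# assumption: fall back to CPU so the sketch also runs without a GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# stand-in for my real DataLoader: a few random CPU batches
batch_loader = [torch.rand(100, 3) for _ in range(5)]

for train_batch in batch_loader:
    # this per-batch copy is the part that causes delays with huge batches
    train_batch = train_batch.to(device, non_blocking=True)
```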
I’d like to speed things up by moving the data to the GPU in the batch worker instead, so that the training loop can use it directly.
Here’s an MWE that illustrates my plan:
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ExampleDataset(Dataset):
    def __init__(self):
        super().__init__()

    def __len__(self):
        return 100000

    def __getitem__(self, idx):
        return np.random.rand(3)


def custom_collate_fn(batch):
    batch = torch.tensor(batch)
    # I'd like to pre-move the data to the GPU, but I get an error here:
    batch = batch.to('cuda', non_blocking=True)
    return batch


batch_loader = DataLoader(
    ExampleDataset(),
    batch_size=100,
    shuffle=True,
    num_workers=8,
    pin_memory=True,
    collate_fn=custom_collate_fn,
)

# training loop
NUM_EPOCHS = 10
for epoch in range(NUM_EPOCHS):
    for batch_num, train_batch in enumerate(batch_loader, 0):
        # usually I transfer train_batch from the CPU to the GPU here,
        # which causes delays (I have huge batch sizes)
        print('training.')
Here’s the error that I get:
/local/home/venv/bin/python -u /local/home/BT/mwe.py
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=55 error=3 : initialization error
Traceback (most recent call last):
  File "/local/home/BT/mwe.py", line 39, in <module>
    for batch_num, train_batch in enumerate(batch_loader, 0):
  File "/local/home/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/local/home/venv/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/local/home/venv/lib/python3.5/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/local/home/BT/mwe.py", line 22, in custom_collate_fn
    batch = batch.to('cuda', non_blocking=True)
  File "/local/home/venv/lib/python3.5/site-packages/torch/cuda/__init__.py", line 163, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (3) : initialization error at /pytorch/aten/src/THC/THCGeneral.cpp:55
Process finished with exit code 1
Do you know how I could achieve my plan? Unfortunately I can’t get rid of this error (I’ve tried setting torch.multiprocessing start methods, but that didn’t help).
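For reference, this is roughly what I tried with the start method (a sketch of my attempt, not a fix; it did not make the error go away for me):

```python
import torch.multiprocessing as mp

if __name__ == '__main__':
    # CUDA cannot be re-initialized in a forked subprocess, so I tried
    # switching the worker start method from the default 'fork' to 'spawn'
    mp.set_start_method('spawn', force=True)
    print(mp.get_start_method())
```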