Hi Team,
I’m using fairseq for an NMT task. The code runs perfectly on a smaller dataset, but when I increase the dataset size by about 10x, I get the following error:
pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [37,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
File "fairseq/train.py", line 352, in <module>
multiprocessing_main(args)
File "fairseq/multiprocessing_train.py", line 40, in main
p.join()
File "/usr/lib/python3.6/multiprocessing/process.py", line 124, in join
res = self._popen.wait(timeout)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
File "fairseq/multiprocessing_train.py", line 82, in signal_handler
raise Exception(msg)
Exception:
-- Tracebacks above this line can probably be ignored --
Traceback (most recent call last):
File "fairseq/multiprocessing_train.py", line 46, in run
single_process_main(args)
File "fairseq/train.py", line 87, in main
train(args, trainer, task, epoch_itr)
File "fairseq/train.py", line 125, in train
log_output = trainer.train_step(sample, update_params=True)
File "fairseq/fairseq/trainer.py", line 117, in train_step
loss, sample_size, logging_output, oom_fwd = self._forward(sample)
File "fairseq/fairseq/trainer.py", line 205, in _forward
raise e
File "fairseq/fairseq/trainer.py", line 197, in _forward
loss, sample_size, logging_output_ = self.task.get_loss(self.model, self.criterion, sample)
File "fairseq/fairseq/tasks/fairseq_task.py", line 49, in get_loss
return criterion(model, sample)
File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 36, in forward
net_output = model(**sample['net_input'])
File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "fairseq/fairseq/models/fairseq_model.py", line 146, in forward
auxencoder_out = self.auxencoder(ctx_tokens, ctx_lengths)
File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "fairseq/fairseq/models/fconv_dualenc_gec_gatedaux.py", line 193, in forward
if not encoder_padding_mask.any():
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:317
As mentioned in a few GitHub issues and PyTorch forum questions, I have re-run my code with CUDA_LAUNCH_BLOCKING=1 so that the failing call is reported synchronously.
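For context, this is only a rough sketch of how the flag can be set from Python (my actual run simply prefixed the training command with the variable; the point is that it must be in the environment before the first CUDA call):

```python
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch  # import torch only after the variable is in the environment

if torch.cuda.is_available():
    # With blocking launches enabled, a device-side assert is reported at the
    # exact call that triggered it instead of at a later, unrelated line.
    x = torch.randn(4, 4, device="cuda")
```

The error log from that run is: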
Traceback (most recent call last):
File "/fairseq/train.py", line 352, in <module>
multiprocessing_main(args)
File "/fairseq/multiprocessing_train.py", line 40, in main
p.join()
File "/opt/anaconda/lib/python3.7/multiprocessing/process.py", line 140, in join
res = self._popen.wait(timeout)
File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
return self.poll(os.WNOHANG if timeout == 0.0 else 0)
File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
pid, sts = os.waitpid(self.pid, flag)
File "/fairseq/multiprocessing_train.py", line 82, in signal_handler
raise Exception(msg)
Exception:
-- Tracebacks above this line can probably be ignored --
Traceback (most recent call last):
File "/fairseq/multiprocessing_train.py", line 46, in run
single_process_main(args)
File "/fairseq/train.py", line 35, in main
load_dataset_splits(args, task, ['train', 'valid'])
File "/fairseq/train.py", line 333, in load_dataset_splits
task.load_dataset(split_k)
File "/fairseq/fairseq/tasks/translation_ctx.py", line 105, in load_dataset
ctx_dataset = indexed_dataset(prefix + 'ctx', self.ctx_dict)
File "/fairseq/fairseq/tasks/translation_ctx.py", line 98, in indexed_dataset
return IndexedRawTextDataset(path, dictionary)
File "/fairseq/fairseq/data/indexed_dataset.py", line 130, in __init__
self.read_data(path, dictionary)
File "/fairseq/fairseq/data/indexed_dataset.py", line 136, in read_data
self.lines.append(line.strip('\n'))
MemoryError
My understanding is that if this were a memory constraint, CUDA would throw an out-of-memory error rather than these errors (the MemoryError in the second log is raised on the host while reading the raw text dataset, not on the GPU). From what I have read, cuda runtime error (59): device-side assert triggered is usually caused by an out-of-bound index or a faulty loss function. Neither seems to apply here, because the same code runs smoothly on the smaller dataset and only fails on the larger one. Hence, I’m putting this question here.
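To make the out-of-bound case concrete (a sketch only, with made-up sizes rather than my actual model or dictionaries): the indexSelectLargeIndex assert in the first log fires when a token id handed to an embedding lookup is negative or not smaller than the embedding’s vocabulary size, so a check like the following over the batches would catch it before the kernel does.

```python
# Sketch of a pre-flight check for out-of-range token ids; vocab_size and the
# batch shape below are illustrative, not taken from my actual data.
import torch

vocab_size = 32000
embedding = torch.nn.Embedding(vocab_size, 512)

def assert_in_range(tokens, size):
    # Any id < 0 or >= size is exactly what trips the
    # `srcIndex < srcSelectDimSize` assert on the GPU.
    bad = tokens[(tokens < 0) | (tokens >= size)]
    if bad.numel() > 0:
        raise ValueError("out-of-range token ids: {}".format(bad.unique().tolist()))

tokens = torch.randint(0, vocab_size, (8, 64), dtype=torch.long)  # well-formed batch
assert_in_range(tokens, vocab_size)
out = embedding(tokens)  # safe lookup, no device-side assert fires
```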
Is there anything I need to check in order to resolve this issue, or am I missing something?
Any help, support, or direction would be highly appreciated.
The same point has also been raised on this PyTorch issue thread.
Environment:
Ubuntu 18.04 (8 GPUs)
fairseq==0.5
torch==0.4.1
torchvision==0.3.0
I am taking reference from this GitHub code.
Thanks