RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:317

Hi Team,

I’m using fairseq for an NMT task. The code runs perfectly on a smaller dataset, but when I increase the dataset size by 10 times I get the following error.

pytorch/aten/src/THC/THCTensorIndex.cu:362: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [37,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCReduceAll.cuh line=317 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "fairseq/train.py", line 352, in <module>
    multiprocessing_main(args)
  File "fairseq/multiprocessing_train.py", line 40, in main
    p.join()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 124, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 50, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/usr/lib/python3.6/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "fairseq/multiprocessing_train.py", line 82, in signal_handler
    raise Exception(msg)
Exception:

-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "fairseq/multiprocessing_train.py", line 46, in run
    single_process_main(args)
  File "fairseq/train.py", line 87, in main
    train(args, trainer, task, epoch_itr)
  File "fairseq/train.py", line 125, in train
    log_output = trainer.train_step(sample, update_params=True)
  File "fairseq/fairseq/trainer.py", line 117, in train_step
    loss, sample_size, logging_output, oom_fwd = self._forward(sample)
  File "fairseq/fairseq/trainer.py", line 205, in _forward
    raise e
  File "fairseq/fairseq/trainer.py", line 197, in _forward
    loss, sample_size, logging_output_ = self.task.get_loss(self.model, self.criterion, sample)
  File "fairseq/fairseq/tasks/fairseq_task.py", line 49, in get_loss
    return criterion(model, sample)
  File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "fairseq/fairseq/criterions/label_smoothed_cross_entropy.py", line 36, in forward
    net_output = model(**sample['net_input'])
  File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "fairseq/fairseq/models/fairseq_model.py", line 146, in forward
    auxencoder_out = self.auxencoder(ctx_tokens, ctx_lengths)
  File "python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "fairseq/fairseq/models/fconv_dualenc_gec_gatedaux.py", line 193, in forward
    if not encoder_padding_mask.any():
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:317

As suggested in a few GitHub issues and PyTorch forum questions, I re-ran my code with CUDA_LAUNCH_BLOCKING=1, and the following is the error log:

Traceback (most recent call last):                                                                                                                      
File "/fairseq/train.py", line 352, in <module>
    multiprocessing_main(args)
  File "/fairseq/multiprocessing_train.py", line 40, in main
    p.join()
  File "/opt/anaconda/lib/python3.7/multiprocessing/process.py", line 140, in join
    res = self._popen.wait(timeout)
  File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 48, in wait
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/opt/anaconda/lib/python3.7/multiprocessing/popen_fork.py", line 28, in poll
    pid, sts = os.waitpid(self.pid, flag)
  File "/fairseq/multiprocessing_train.py", line 82, in signal_handler
    raise Exception(msg)
Exception:
-- Tracebacks above this line can probably be ignored --

Traceback (most recent call last):
  File "/fairseq/multiprocessing_train.py", line 46, in run
    single_process_main(args)
  File "/fairseq/train.py", line 35, in main
    load_dataset_splits(args, task, ['train', 'valid'])
  File "/fairseq/train.py", line 333, in load_dataset_splits
    task.load_dataset(split_k)
  File "/fairseq/fairseq/tasks/translation_ctx.py", line 105, in load_dataset
    ctx_dataset = indexed_dataset(prefix + 'ctx', self.ctx_dict)
  File "/fairseq/fairseq/tasks/translation_ctx.py", line 98, in indexed_dataset
    return IndexedRawTextDataset(path, dictionary)
  File "/fairseq/fairseq/data/indexed_dataset.py", line 130, in __init__
    self.read_data(path, dictionary)
  File "/fairseq/fairseq/data/indexed_dataset.py", line 136, in read_data
    self.lines.append(line.strip('\n'))
MemoryError

In my understanding, if this were a memory constraint, CUDA should throw an out-of-memory error rather than this one. From what I have read, cuda runtime error (59) : device-side assert triggered is usually caused by an out-of-bounds index or a faulty loss function. That shouldn’t be the case here, because the same code runs smoothly on the smaller dataset and only fails on the larger one. Hence, I’m posting this question here.
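For reference, here is a minimal standalone sketch (not my fairseq code) of how an out-of-bounds token id produces exactly this kind of device-side assert on the GPU while giving a readable error on the CPU:

import torch
import torch.nn as nn

vocab_size = 100
emb = nn.Embedding(vocab_size, 16)

# token id 100 is out of range for a vocabulary of size 100 (valid ids are 0..99)
bad_tokens = torch.tensor([[1, 5, 100]])

# On the CPU this fails immediately with a clear "index out of range" error
try:
    emb(bad_tokens)
except (IndexError, RuntimeError) as e:
    print("CPU error:", e)

# On the GPU the same lookup triggers the asynchronous device-side assert,
# which only surfaces at a later, unrelated line (in my case encoder_padding_mask.any()):
# emb.cuda()(bad_tokens.cuda())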

Is there anything I need to check to resolve this issue, or am I missing something?

Any help, support, or direction is highly appreciated.

I have also raised the same point on this PyTorch issue thread.

Environment:
Ubuntu 18.04 (8 GPU)
fairseq==0.5
torch==0.4.1
torchvision==0.3.0
I am taking reference from this GitHub code.

Thanks

Based on the first error message, it seems that an indexing operation is failing:

Assertion `srcIndex < srcSelectDimSize` failed

You could either check the tensors going into the failing operation (which might be in self.auxencoder(ctx_tokens, ctx_lengths)) or run the code on the CPU to get a better error message.
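As a rough sketch of such a check (ctx_tokens and ctx_dict are names taken from your traceback; use whatever dictionary the auxiliary encoder’s embedding was built from), you could add something like this right before the forward pass:

ctx_tokens = sample['net_input']['ctx_tokens']
# all token ids must lie inside the embedding table of the auxiliary encoder
assert ctx_tokens.min().item() >= 0
assert ctx_tokens.max().item() < len(ctx_dict), \
    'max ctx token id %d is out of range for a vocabulary of size %d' % (
        ctx_tokens.max().item(), len(ctx_dict))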

Thank you @ptrblck for your reply.

After getting a response from the fairseq team, they suggested using the preprocessing module, because the --raw-text / in-memory indexing code path doesn’t scale to larger datasets. I have switched to fairseq’s preprocessing module and it seems to be working now.
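For anyone hitting the same issue, this is roughly the preprocessing step I used (a sketch with placeholder paths and language names; the extra context stream of my custom translation_ctx task needs its own task-specific handling):

# binarize the raw text once, then train from data-bin without --raw-text
python preprocess.py \
    --source-lang src --target-lang tgt \
    --trainpref data/train --validpref data/valid --testpref data/test \
    --destdir data-bin/my_dataset

Training from the binarized data means the memory-mapped IndexedDataset is used instead of IndexedRawTextDataset, which was the part that ran out of memory.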

Thank you for helping me

Thanks for the update.
Could you post a link to this discussion (if it’s publicly available), please?
While some approaches might not scale for larger datasets, they should at least not run into these assert statements, so I would like to take a look at it.

Hi @ptrblck,

This is the link to the Fairseq GitHub issue.

As a next step, I will benchmark my code with the preprocessing module versus raw text on the small dataset to see whether there is any impact on accuracy. I will post the update here soon; it may help the community.

Thanks
