RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCTensorCopy.c:21


(Akash Singh) #1

Hello, I am getting this error in the middle of training. I have tried training several times, but the same error occurs at a different step each time.

/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCTensorIndex.cu:360: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [5,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCTensorIndex.cu:360: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [5,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
.
.
.

/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCTensorIndex.cu:360: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [6,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCTensorCopy.c line=21 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    main(opt)
  File "train.py", line 27, in main
    single_main(opt)
  File "/home/aki/OpenNMT-py/onmt/train_single.py", line 262, in main
    opt.valid_steps)
  File "/home/aki/OpenNMT-py/onmt/trainer.py", line 223, in train
    report_stats)
  File "/home/aki/OpenNMT-py/onmt/trainer.py", line 384, in _gradient_accumulation
    dec_state)
  File "/home/aki/anaconda3/envs/zeroshot/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aki/OpenNMT-py/onmt/models/model.py", line 75, in forward
    memory_lengths=lengths)
  File "/home/aki/anaconda3/envs/zeroshot/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aki/OpenNMT-py/onmt/decoders/decoder.py", line 139, in forward
    tgt, memory_bank, state, memory_lengths=memory_lengths)
  File "/home/aki/OpenNMT-py/onmt/decoders/decoder.py", line 350, in _run_forward_pass
    memory_lengths=memory_lengths)
  File "/home/aki/anaconda3/envs/zeroshot/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aki/OpenNMT-py/onmt/modules/global_attention.py", line 181, in forward
    mask = sequence_mask(memory_lengths, max_len=align.size(-1))
  File "/home/aki/OpenNMT-py/onmt/utils/misc.py", line 23, in sequence_mask
    .type_as(lengths)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCTensorCopy.c:21


(Chris) #2

So far I have gotten this error for two reasons:

  • In a classification task, my labels were not in [0...(label_size-1)]
  • My input sequence contained indexes that were not in the embedding layer.
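Both cases come down to an index that falls outside the valid range. A minimal sketch of a sanity check you could run on a batch before the forward pass is below. Note that `check_index_range` is a hypothetical helper written for illustration, not part of PyTorch or OpenNMT, and the tensor values are made up:

```python
import torch


def check_index_range(name, indices, size):
    """Raise a readable error if any index is outside [0, size - 1].

    Hypothetical helper for illustration; `size` would be the number of
    classes (for labels) or the vocabulary size (for embedding inputs).
    """
    lo, hi = int(indices.min()), int(indices.max())
    if lo < 0 or hi >= size:
        raise ValueError(
            "{}: indices span [{}, {}], expected [0, {}]".format(name, lo, hi, size - 1)
        )


# Labels for a 3-class problem must lie in [0, 2].
check_index_range("labels", torch.tensor([0, 2, 1]), 3)  # passes silently

# A label equal to the number of classes is already out of range.
try:
    check_index_range("labels", torch.tensor([0, 3, 1]), 3)
except ValueError as err:
    print(err)
```

Running a check like this on the CPU side of the data pipeline gives a Python exception at the offending batch instead of a device-side assert later on.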

Since you don’t say what you’re doing or show any code, one can only guess. Does it work on CPU? And if not, what’s the error there?


(Akash Singh) #3

I am working on a multilingual NMT model using OpenNMT. I get the following error when running on CPU. As you said, it is an index out of range error.

Traceback (most recent call last):
  File "train.py", line 40, in <module>
    main(opt)
  File "train.py", line 27, in main
    single_main(opt)
  File "/home/gamut/OpenNMT-py/onmt/train_single.py", line 262, in main
    opt.valid_steps)
  File "/home/gamut/OpenNMT-py/onmt/trainer.py", line 223, in train
    report_stats)
  File "/home/gamut/OpenNMT-py/onmt/trainer.py", line 384, in _gradient_accumulation
    dec_state)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gamut/OpenNMT-py/onmt/models/model.py", line 61, in forward
    enc_final, memory_bank = encoder(src, lengths)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gamut/OpenNMT-py/onmt/encoders/rnn_encoder.py", line 57, in forward
    emb = self.embeddings(src)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gamut/OpenNMT-py/onmt/modules/embeddings.py", line 205, in forward
    source = self.make_embedding(source)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gamut/OpenNMT-py/onmt/modules/util_class.py", line 43, in forward
    outputs = [f(x) for f, x in zip(self, inputs_)]
  File "/home/gamut/OpenNMT-py/onmt/modules/util_class.py", line 43, in <listcomp>
    outputs = [f(x) for f, x in zip(self, inputs_)]
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 108, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/gamut/anaconda3/envs/zeroshotpy/lib/python3.6/site-packages/torch/nn/functional.py", line 1076, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: index out of range at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/TH/generic/THTensorMath.c:343

(Chris) #4

My best guess would be that your input sequences contain indexes that do not match the embedding layer. The embedding weight has a shape of (vocabulary_size, embedding_dim), e.g., (20000, 300). This means your input sequences can only contain values in the range [0..19999]. You might want to check whether any value in your input sequences is 20000 or higher.
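
To make that check concrete, here is a small sketch that finds the offending values and their positions in a batch. It assumes the (20000, 300) example shape from above; the `batch` tensor is made-up data, so in your case you would substitute the source tensor that reaches the embedding layer:

```python
import torch
import torch.nn as nn

# Example embedding with the shape used above: 20000 entries, 300 dimensions.
embedding = nn.Embedding(20000, 300)

# Made-up batch of token indexes; 20000 is one past the last valid index.
batch = torch.tensor([[5, 19999, 42],
                      [7, 20000, 3]])

# Boolean mask of entries outside [0, num_embeddings - 1].
bad = (batch < 0) | (batch >= embedding.num_embeddings)

bad_positions = torch.nonzero(bad)  # (row, column) of each invalid entry
bad_values = batch[bad]             # the invalid index values themselves

if bad_values.numel() > 0:
    print("out-of-range indexes:", bad_values.tolist())
    print("at positions:", bad_positions.tolist())
else:
    out = embedding(batch)  # safe to look up
```

If this reports values at or above the vocabulary size, the vocabulary used to build the data and the one used to size the embedding layer are probably out of sync.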