Hello, I am getting this error in the middle of training. I have tried training several times, but the same error occurs at a different step each time.
/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCTensorIndex.cu:360: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [5,0,0], thread: [96,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCTensorIndex.cu:360: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [5,0,0], thread: [97,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
.
.
.
/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/THCTensorIndex.cu:360: void indexSelectLargeIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2, IndexIsMajor = true]: block: [6,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCTensorCopy.c line=21 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "train.py", line 40, in <module>
    main(opt)
  File "train.py", line 27, in main
    single_main(opt)
  File "/home/aki/OpenNMT-py/onmt/train_single.py", line 262, in main
    opt.valid_steps)
  File "/home/aki/OpenNMT-py/onmt/trainer.py", line 223, in train
    report_stats)
  File "/home/aki/OpenNMT-py/onmt/trainer.py", line 384, in _gradient_accumulation
    dec_state)
  File "/home/aki/anaconda3/envs/zeroshot/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aki/OpenNMT-py/onmt/models/model.py", line 75, in forward
    memory_lengths=lengths)
  File "/home/aki/anaconda3/envs/zeroshot/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aki/OpenNMT-py/onmt/decoders/decoder.py", line 139, in forward
    tgt, memory_bank, state, memory_lengths=memory_lengths)
  File "/home/aki/OpenNMT-py/onmt/decoders/decoder.py", line 350, in _run_forward_pass
    memory_lengths=memory_lengths)
  File "/home/aki/anaconda3/envs/zeroshot/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/aki/OpenNMT-py/onmt/modules/global_attention.py", line 181, in forward
    mask = sequence_mask(memory_lengths, max_len=align.size(-1))
  File "/home/aki/OpenNMT-py/onmt/utils/misc.py", line 23, in sequence_mask
    .type_as(lengths)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCTensorCopy.c:21
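
The assertion `srcIndex < srcSelectDimSize` is raised by an index-select kernel, which suggests that some index falls outside the dimension being indexed (for example, a token id that is not smaller than the size of an embedding table). Because CUDA errors are reported asynchronously, the Python frame in the traceback above may not be the call that actually failed. A minimal sketch of a check that could be run on a batch of indices before the forward pass is below. The names `find_out_of_range`, `batch`, and `size` are hypothetical and not part of OpenNMT-py, and the sample values are made up for illustration; a tensor would first need converting to nested lists, e.g. with `.tolist()`.

```python
def find_out_of_range(batch, size):
    """Return (row, col, value) for every index outside [0, size).

    `batch` is a list of rows of integer indices (hypothetical input);
    `size` is the length of the dimension being indexed, e.g. the
    number of rows in an embedding table.
    """
    bad = []
    for r, row in enumerate(batch):
        for c, idx in enumerate(row):
            # Valid indices satisfy 0 <= idx < size.
            if idx < 0 or idx >= size:
                bad.append((r, c, idx))
    return bad


# Made-up token ids checked against a made-up table size of 5.
batch = [[0, 3, 4], [2, 5, 1]]
print(find_out_of_range(batch, size=5))  # prints [(1, 1, 5)]
```

An empty result would mean every index is in range for that dimension, in which case the out-of-range value is probably coming from a different lookup.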