Training fails after a certain number of epochs

RuntimeError: merge_sort: failed to synchronize: an illegal memory access was encountered

PyTorch 0.4.1

The code is proprietary, so I can’t share it. Any help would be appreciated.

Pankesh