CUDA runtime error (59) while iterating over a DataLoader

When I run
data,label = next(iter(training_loader))

sometimes it is OK, but sometimes I get the following error:

---> 73         return iter(torch.randperm(n).tolist())
     74 
     75     def __len__(self):

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/generic/THCTensorMath.cu:14

I have set os.environ['CUDA_LAUNCH_BLOCKING'] = '1'. I have also read posts like this one, but the solution there did not fix the problem. What does this error message mean? Thanks.
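For reference, the blocking flag only takes effect if it is set before the CUDA context is created, i.e. before the first CUDA call. A minimal sketch of how I set it (the torch import is shown commented out, only to indicate the required ordering):

```python
import os

# CUDA kernel launches are asynchronous, so the Python traceback for a
# device-side assert often points at an unrelated later call. Forcing
# synchronous launches makes the error surface at the failing kernel.
# This must happen before any CUDA context exists.
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

# import torch  # import torch only AFTER the variable is set
```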

The entire error message is:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-93-97a6deb1fb6c> in <module>
      1 # scrath code to examine a batch from the data loader
----> 2 data,label = next(iter(training_loader))
      3 print(data.size(), ' [batch size, number of channels(I/Q), number of samples per waveform]')

~/.conda/envs/fa0.4/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __iter__(self)
    817 
    818     def __iter__(self):
--> 819         return _DataLoaderIter(self)
    820 
    821     def __len__(self):

~/.conda/envs/fa0.4/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __init__(self, loader)
    582             # prime the prefetch loop
    583             for _ in range(2 * self.num_workers):
--> 584                 self._put_indices()
    585 
    586     def __len__(self):

~/.conda/envs/fa0.4/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _put_indices(self)
    644     def _put_indices(self):
    645         assert self.batches_outstanding < 2 * self.num_workers
--> 646         indices = next(self.sample_iter, None)
    647         if indices is None:
    648             return

~/.conda/envs/fa0.4/lib/python3.6/site-packages/torch/utils/data/sampler.py in __iter__(self)
    158     def __iter__(self):
    159         batch = []
--> 160         for idx in self.sampler:
    161             batch.append(idx)
    162             if len(batch) == self.batch_size:

~/.conda/envs/fa0.4/lib/python3.6/site-packages/torch/utils/data/sampler.py in __iter__(self)
     71         if self.replacement:
     72             return iter(torch.randint(high=n, size=(self.num_samples,), dtype=torch.int64).tolist())
---> 73         return iter(torch.randperm(n).tolist())
     74 
     75     def __len__(self):

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1544174967633/work/aten/src/THC/generic/THCTensorMath.cu:14

It turns out the indices were out of bounds, although the error message seemed to point to something else: CUDA kernels launch asynchronously, so a device-side assert is often reported at a later, unrelated call rather than at the operation that triggered it. A host-side assert statement on the data would have caught this before it ever reached the GPU.
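A minimal sketch of the check that would have caught the bug: validate the targets on the CPU before they reach the GPU, so an out-of-range value raises a clear Python exception instead of an asynchronous device-side assert. The names `labels` and `num_classes` are hypothetical, standing in for your dataset's targets and the size of the model's output layer:

```python
def check_labels(labels, num_classes):
    # Hypothetical host-side guard: every classification target must lie
    # in [0, num_classes - 1], otherwise the GPU loss kernel asserts.
    lo, hi = min(labels), max(labels)
    if lo < 0 or hi >= num_classes:
        raise ValueError(
            f"labels must lie in [0, {num_classes - 1}], "
            f"found range [{lo}, {hi}]"
        )
```

Running this once over the whole dataset before training turns a cryptic, late-surfacing GPU failure into an immediate, readable error at the point where the bad data actually is.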