Device-side assert from embedding lookup raised on a subsequent CUDA instruction

I appear to be getting a device-side assert from an embedding lookup, but (even with CUDA_LAUNCH_BLOCKING enabled) the exception is raised by the next instruction that touches CUDA (in my case, a call to torch.zeros(..., device='cuda')), several Python instructions later.

entailed_embeds_calculated = [self.relation_embedding(index_tensor) for index_tensor in entailed_pred_indices]
# Here self.relation_embedding is an nn.Embedding, and entailed_pred_indices is a list of index tensors.
# I checked, and an element of one index_tensor is out of bounds.

entailed_embeds_calculated = self.aggregator(entailed_embeds_calculated, entailed_scores)
# entailed_scores is also a list of tensors


def aggregator(self, embedding_lists, weights=None):
    # a few asserts and other non-CUDA python code
    embed_dim = embedding_lists[0].shape[-1]
    zero_tensor = torch.zeros((embed_dim,), dtype=torch.float, device=embedding_lists[0].device)
    # At this point the "Device side assert" is raised.

A similar behaviour appears to be happening in the linked thread.

The CUDA error looks like this:

/opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THC/ void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [11,0,0] Assertion `srcIndex < srcSelectDimSize` failed.

  File ".../", line 1072, in aggregator
    zero_tensor = torch.zeros((embed_dim,), dtype=torch.float, device=embedding_lists[0].device)
RuntimeError: CUDA error: device-side assert triggered
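To make the error point at the actual lookup rather than a later op, I now validate the indices on the host before calling the embedding. A minimal sketch (checked_embedding is my own helper, not part of PyTorch):

```python
import torch
import torch.nn as nn

def checked_embedding(emb: nn.Embedding, idx: torch.Tensor) -> torch.Tensor:
    # Validate indices before launching the CUDA kernel, so an out-of-range
    # index raises synchronously here instead of as a deferred device assert.
    bad = (idx < 0) | (idx >= emb.num_embeddings)
    if bad.any():
        raise IndexError(f"out-of-range embedding indices: {idx[bad].tolist()}")
    return emb(idx)

emb = nn.Embedding(10, 4)
out = checked_embedding(emb, torch.tensor([0, 3, 9]))
print(out.shape)  # torch.Size([3, 4])
```

The `bad.any()` call forces a host sync when `idx` lives on the GPU, so this is a debugging aid rather than something to leave in a hot loop.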

Are you setting the CUDA_LAUNCH_BLOCKING=1 env variable in your terminal, or are you trying to set it in your script/notebook?
In the latter case, please set it as an external env variable, as setting it via os.environ inside a script will fail if any imported library has already initialized CUDA.
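For reference, a sketch of the in-script variant that can still work, assuming nothing has initialized CUDA yet (the ordering is the whole point):

```python
import os

# CUDA_LAUNCH_BLOCKING is read when CUDA initializes, so it must already be
# in the environment before the first CUDA-touching import. Exporting it in
# the shell (CUDA_LAUNCH_BLOCKING=1 python train.py) sidesteps the ordering
# problem entirely and is the safer option.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Any `import torch` must come only after the line above.
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # prints "1"
```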

I set it before launching Python (i.e., in the shell environment) and printed the setting from within Python just to be sure.

I do see a difference when it’s not set: the exception is then raised a few instructions later still, so the variable does seem to be doing something. With CUDA_LAUNCH_BLOCKING set, the exception was consistently raised at the same point across several runs and under differing GPU loads.

I’m running PyTorch 1.4.0 (old, for compatibility with HPC infrastructure) and CUDA Version 10.2.89. These aren’t the most recent, but I’m posting this partly to help others who may encounter this odd behaviour decipher it :slight_smile:

Could you update to the latest stable release and recheck this behavior?
In your old version, the device assert might have been missing, which would explain this behavior.
In particular, this was the case in 1.5.0, where device assertions were globally disabled (and re-enabled in 1.5.1).
I don’t recall if there were similar issues in 1.4.0.