I am working with the built-in PyTorch Mask R-CNN model. When I try to access the loss value returned by the model, I get the following error:
Traceback (most recent call last):
  File "/home/droneslab/projects/plastics/shri/DRONES_Lab/trainer.py", line 78, in _one_epoch_train
    batch_loss_train.append(total_loss.item())
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
The error does not occur every epoch; it is intermittent. It prevents me from logging metrics with either custom logging or TensorBoard.
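Following the hint in the error message, here is a minimal sketch of how the logging step could be run with synchronous kernel launches, so that the traceback points at the operation that actually fails rather than at `.item()`. The helper name `record_loss` is hypothetical and only mirrors line 78 of the traceback. The sketch assumes the environment variable is set before `torch` is imported, and that `total_loss` is a tensor-like object exposing `.item()`.

```python
import os

# Assumption: this runs before torch is imported, so the setting is in place
# before any CUDA work starts. Kernel launches then run synchronously.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def record_loss(batch_losses, total_loss):
    """Hypothetical helper mirroring the failing line in _one_epoch_train.

    total_loss is assumed to be tensor-like (it only needs an .item() method).
    .item() copies the value to the host, so it waits for pending GPU work;
    that is why an earlier asynchronous CUDA error can surface at this call.
    """
    value = total_loss.item()
    batch_losses.append(value)
    return value
```

In the training loop this would be called as `record_loss(batch_loss_train, total_loss)`. The same setting can also be passed on the command line, for example `CUDA_LAUNCH_BLOCKING=1 python trainer.py`. Synchronous launches slow training down, so this is likely only useful while debugging.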