RuntimeError: CUDA error: misaligned address Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions

Hi, I’m having this issue. I’m reading from multiple webcams, each in its own thread. It runs for about a minute without lag, with detections on the GPU, but then crashes with the exact same error every time. After that I have to restart the entire development environment, because the script won’t run again until the application is restarted.
“RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.”
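Because CUDA errors are reported asynchronously, the traceback may point at the wrong line. A minimal sketch of following the message's own hint from inside the script is below. The variable only takes effect if it is set before CUDA is initialized, so it has to come before `torch` or `ultralytics` is imported (exporting it in the shell that launches Python works too). It serializes every kernel launch and slows everything down, so it is meant for debugging only.

```python
import os

# CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so a failing
# kernel raises at the Python call that launched it instead of at a later,
# unrelated call. It must be set before CUDA is initialized.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch                    # import CUDA-using libraries only after this
# from ultralytics import YOLO
```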

I assumed it could be a memory issue, but when I monitor GPU and CPU usage and RAM while it runs, everything looks fine. From what I’ve read online, the same error has cropped up with multiple cameras when the model expects a certain number of classes but receives twice as many, which triggers the error. That is my guess, but any help with this issue would be greatly appreciated.
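External monitors only sample now and then, so slow growth over the first minute can be easy to miss. One option is to log memory from inside the script. The helper below, `memory_snapshot()`, is a hypothetical sketch: it uses the standard-library `tracemalloc` for Python-level allocations (only those made after tracing starts, so call it once early) and, if PyTorch with CUDA is available, `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` for the GPU side.

```python
import tracemalloc
from typing import Dict


def memory_snapshot() -> Dict[str, int]:
    """Return current/peak traced Python memory and, if available, CUDA memory (bytes)."""
    # tracemalloc only sees allocations made after it starts, so the first
    # call mostly serves to begin tracing.
    if not tracemalloc.is_tracing():
        tracemalloc.start()
    current, peak = tracemalloc.get_traced_memory()
    stats = {"py_current": current, "py_peak": peak}

    try:
        import torch
    except ImportError:
        return stats  # PyTorch not installed: report Python memory only

    if torch.cuda.is_available():
        # Memory held by live tensors vs. memory kept by the caching allocator
        stats["cuda_allocated"] = torch.cuda.memory_allocated()
        stats["cuda_reserved"] = torch.cuda.memory_reserved()
    return stats
```

Calling it every few seconds (for example from the main thread while the camera threads run) and printing the result makes a steady upward trend easy to spot.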

This is my threading code with the YOLO prediction method:
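import threading

from ultralytics import YOLO


class camThread(threading.Thread):
    def __init__(self, camID):
        super().__init__()
        # self.previewName = previewName
        self.camID = camID

    def run(self):
        # print("Starting " + self.previewName)
        camPreview(self.camID)


def camPreview(camID):
    # Model path was left empty in the original post
    infer = YOLO("")
    # Run inference on webcam index camID and display the annotated frames
    results = infer.predict(source=camID, verbose=False, show=True)
    # with open("output.txt", "w") as fo:
    #     for r in results:


thread1 = camThread(0)
thread2 = camThread(1)
thread1.start()
thread2.start()
print("Active threads", threading.active_count())
thread1.join()
thread2.join()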
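For comparison, here is a hedged sketch of the same two-camera setup that keeps each thread’s memory bounded. It assumes the Ultralytics `predict()` API, where `stream=True` returns a generator of per-frame results; without it, a long-running webcam source collects every frame’s results in memory, which is one plausible way a setup like this runs out of memory over time (the thread above does not confirm that this was the cause here). Each thread builds its own model, in line with Ultralytics’ guidance on thread-safe inference. The names `camera_worker`, `start_cameras`, `make_model` and `on_result` are hypothetical, and display (`show=True`) is left to the caller because OpenCV GUI calls from worker threads are not reliable on every platform.

```python
import threading
from typing import Any, Callable, List, Tuple


def camera_worker(
    cam_id: int,
    make_model: Callable[[], Any],
    stop: threading.Event,
    on_result: Callable[[int, Any], None],
) -> None:
    """Run inference on one camera until `stop` is set or the source ends."""
    # Build the model inside the thread so no two threads share one instance.
    model = make_model()
    # stream=True yields results frame by frame instead of building a list.
    for result in model.predict(source=cam_id, stream=True, verbose=False):
        on_result(cam_id, result)
        if stop.is_set():
            break


def start_cameras(
    cam_ids: List[int],
    make_model: Callable[[], Any],
    on_result: Callable[[int, Any], None],
) -> Tuple[threading.Event, List[threading.Thread]]:
    """Start one worker thread per camera; set the returned event to stop them."""
    stop = threading.Event()
    threads = [
        threading.Thread(
            target=camera_worker,
            args=(cam_id, make_model, stop, on_result),
            daemon=True,
        )
        for cam_id in cam_ids
    ]
    for t in threads:
        t.start()
    return stop, threads
```

With Ultralytics this could be called as `start_cameras([0, 1], lambda: YOLO("yolov8n.pt"), handle)`, where the weights file and the `handle(cam_id, result)` callback are placeholders; calling `stop.set()` and then joining the threads shuts everything down cleanly.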

Could you run your workload via `compute-sanitizer python args` and check whether any issues are reported? Also, are you able to reproduce the issue in pure PyTorch, without using the cameras?
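The suggested check wraps the normal Python command with NVIDIA’s `compute-sanitizer`, which ships with the CUDA toolkit. Its default `memcheck` tool reports invalid and misaligned device memory accesses together with the kernel that made them. A small launcher sketch is below; `sanitizer_command`, `run_under_sanitizer` and the script name are hypothetical, and the run will be much slower than normal.

```python
import subprocess
import sys
from typing import List, Sequence


def sanitizer_command(script: str, args: Sequence[str] = ()) -> List[str]:
    """Build a command that runs `script` with this Python under compute-sanitizer."""
    # memcheck is the default tool; it is spelled out here for clarity.
    return ["compute-sanitizer", "--tool", "memcheck", sys.executable, script, *args]


def run_under_sanitizer(script: str, args: Sequence[str] = ()) -> int:
    """Run the script under compute-sanitizer and return its exit code."""
    completed = subprocess.run(sanitizer_command(script, args))
    return completed.returncode
```

For example, `run_under_sanitizer("cams.py")` would run a (hypothetical) `cams.py` containing the threading code, and any reported errors appear in the console output.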

@ptrblck, many thanks for your quick response. The issue was related to memory, so it is not directly a PyTorch problem.