Hey @Brando_Miranda,
I have a very similar, if not the same, issue (it is difficult to say). Have you found a solution to this problem? In my case the issue also occurs rather infrequently: running on the same server (same GPUs, environment, etc.), training my model sometimes succeeds and sometimes ends with SIGSEGV.
Cheers
Edit: If it is of any help, I posted my code here.