Training Job Stalls with No Logs & GPU Usage Spike

Yes. I found a tensor that was being accumulated without being detached, so its autograd history was kept alive from one iteration to the next.
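
For anyone hitting the same thing, here is a minimal sketch of the pattern, assuming PyTorch. This is not my actual training code: the model, the `run_epoch` helper, and the `running_loss` name are made up for illustration. Adding `loss` itself to a running total keeps a tensor with a `grad_fn`, so every step's graph stays reachable, while `loss.item()` (or `loss.detach()`) stores only the value.

```python
import torch

# Hypothetical stand-in model and optimizer, only for illustration.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


def run_epoch(detach: bool, steps: int = 5):
    """Run a few training steps and return the accumulated loss."""
    running_loss = 0.0
    for _ in range(steps):
        x = torch.randn(8, 4)
        y = torch.randn(8, 1)
        loss = torch.nn.functional.mse_loss(model(x), y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if detach:
            # .item() returns a plain Python float with no autograd history.
            running_loss = running_loss + loss.item()
        else:
            # Adding the tensor itself chains this step's graph onto the
            # running total, so earlier graphs can never be freed.
            running_loss = running_loss + loss
    return running_loss


leaky = run_epoch(detach=False)
fixed = run_epoch(detach=True)

print(type(leaky).__name__, leaky.grad_fn is not None)
print(type(fixed).__name__)
```

A quick way to check for this is to look at whatever you accumulate or log: if it is a tensor with `requires_grad=True`, it is still attached to the graph.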