loss.backward() deadlocks for me quite frequently (CPU only, no distributed mode). Unfortunately, I don’t have root access or gdb. Is there any other way to debug or trace this?

Is it possible to enable any tracing or logging of the autograd backward pass?

You could try setting the environment variable TORCH_SHOW_CPP_STACKTRACES=1 (export TORCH_SHOW_CPP_STACKTRACES=1), running your script until it hangs, and then killing it, e.g. with SIGHUP. You might then see stack traces in the terminal that point to the hanging line of code. I haven’t tried this approach myself, since gdb is available in my setup, but it might be worth a try.
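A related option that needs neither root nor gdb is Python's standard-library faulthandler module. This is a different technique from the environment variable above: it prints only the Python-level tracebacks of all threads, not the C++ autograd frames, so it would tell you which Python line is stuck rather than where inside the backward engine it is stuck. The sketch below is untested against your setup and assumes a Unix system; the helper names install_hang_debugging and disarm_watchdog and the 600 second timeout are made up for illustration.

```python
import faulthandler
import os
import signal
import sys


def install_hang_debugging(timeout_s=600.0, file=None):
    """Arm Python-level traceback dumps for a script that may hang.

    Hypothetical helper: the name and the default timeout are illustrative.
    `file` must be a real file object with a file descriptor (default: stderr).
    """
    out = sys.stderr if file is None else file

    # Dump tracebacks of all threads on fatal signals (SIGSEGV, SIGABRT, ...).
    faulthandler.enable(file=out, all_threads=True)

    # Dump tracebacks on demand, without terminating the process (Unix only).
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=out, all_threads=True)
        print(f"To dump tracebacks, run: kill -USR1 {os.getpid()}",
              file=out, flush=True)

    # Watchdog: dump tracebacks every `timeout_s` seconds until disarmed.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=out)


def disarm_watchdog():
    """Stop the periodic dump once the critical section has finished."""
    faulthandler.cancel_dump_traceback_later()


# Intended use at the top of the training script (assumption, adapt as needed):
#   install_hang_debugging(timeout_s=600.0)
#   ... build the model, compute loss, call loss.backward() ...
#   disarm_watchdog()
```

Once the script hangs, sending SIGUSR1 from a second shell (or simply waiting for the watchdog timeout) should print where each Python thread currently is, and the process keeps running so you can repeat the dump. If the main thread is reported inside loss.backward(), the hang is in native code, and you would still need C++-level information to narrow it down further.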