PyTorch Performance Profiling

What’s the standard tool for PyTorch profiling these days?
Is there anything that would also work with torch.multiprocessing?
Setting up one’s own timers and counters in different parts of the code doesn’t seem like the most effective way to do this, or the best use of one’s time.
(TB’s profiling probably has hooks for this but would only work with TF.)

I would suggest the built-in profiler.

Support for the distributed backward pass is still a work in progress, but the rest should be fairly stable.
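For anyone landing here, this is a minimal sketch of what using the built-in profiler might look like. It assumes the suggestion refers to torch.autograd.profiler (the post above does not name the module), and the model, input shapes, and the "train_step" label are hypothetical placeholders, not anything from this thread.

```python
import torch
from torch.autograd import profiler

# Hypothetical toy model and batch, only here to give the profiler some work.
model = torch.nn.Linear(64, 8)
inputs = torch.randn(32, 64)

# Everything executed inside this context is recorded (CPU ops by default).
with profiler.profile() as prof:
    # record_function labels a region so it appears as its own row in the report.
    with profiler.record_function("train_step"):
        loss = model(inputs).sum()
        loss.backward()

# Aggregate events by name and format them as a text table.
averages = prof.key_averages()
table = averages.table(sort_by="self_cpu_time_total", row_limit=10)
print(table)
```

The printed table lists the recorded operators with their call counts and CPU times, which may already replace most hand-written timers. Whether the same pattern behaves well inside each worker of torch.multiprocessing is not something this sketch checks.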

I like it for its simplicity, and I’ll give it a try.

Hopefully, more documentation is on the way, and the distributed release, too.

(Is the latter really always necessary when the user can already use the profiler directly inside the function passed as a multiprocessing process’s “target”? Or is that not allowed in torch.multiprocessing? I guess I will find out, and if I learn anything, I’ll try to remember to post it here.)