Does tensor.register_post_accumulate_grad_hook() always fire once, or multiple times?

I’m looking at pytorch/torch/_tensor.py at v2.6.0 · pytorch/pytorch · GitHub

Is this a guarantee that the hook will only be called once per parameter per backward pass? Even if I’m training something like an RNN where the grad is going to be accumulated multiple times? Or is that not what it’s saying? I’m not able to understand it from the docs or blog post.

My specific use case is an optimizer that calls register_post_accumulate_grad_hook on the parameters you pass into it. I'm using FSDP2's CPUOffloadPolicy, so the parameters are already on the CPU. In the hook I grab the tensor data and do the optimizer step in C++.
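
Roughly this pattern (a simplified Python sketch, not the actual implementation; a plain SGD update stands in for the C++ step, and the learning rate is just a placeholder):

```python
import torch

def attach_step_hooks(params, lr=1e-3):
    """Step each parameter as soon as its grad has been accumulated, then free the grad."""
    handles = []

    def make_hook():
        def hook(p: torch.Tensor) -> None:
            # Fires after autograd has finished accumulating into p.grad.
            with torch.no_grad():
                p.add_(p.grad, alpha=-lr)  # stand-in for the C++ optimizer step
            p.grad = None                   # free the grad immediately
        return hook

    for p in params:
        handles.append(p.register_post_accumulate_grad_hook(make_hook()))
    return handles
```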

This is all implemented here:

Empirically it works, but optimizers tend to be tolerant of this sort of thing, and I'd like a better understanding of what the autograd engine does before it fires the hooks: whether it's "Ah, I have no more references to this tensor in my graph, let's fire the hook" or something more naive.

(Already answered elsewhere, but for posterity)
Yes, the post-accumulate-grad hook should only fire once per parameter if a single backward call is made. If a tensor is used multiple times during the forward pass, accumulation happens into a temporary buffer before the result is applied to the .grad field (which may involve one more accumulation if .grad is already populated); the hook fires after that.
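
A quick way to see this behavior for yourself (a plain leaf tensor stands in for a parameter here):

```python
import torch

w = torch.randn(3, requires_grad=True)
calls = []
w.register_post_accumulate_grad_hook(lambda p: calls.append(p.grad.clone()))

x = torch.randn(3)
# Use w twice in the same forward, RNN-style weight sharing.
loss = (w * x).sum() + (w * w).sum()
loss.backward()
print(len(calls))  # 1 -- both uses are accumulated before the hook fires once

loss2 = (w * x).sum()
loss2.backward()
print(len(calls))  # 2 -- a second backward() call fires the hook again
```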