You can get per-sample gradients using the torch.func API (see this), and you can use the torch.autograd.functional API to compute the hvp (or the vhp, which is apparently more efficient). So maybe you can try to combine both, instead of relying on Opacus?
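As a rough sketch of what I mean (a toy linear model and random batch just for illustration, not your actual setup), the per-sample gradients come from `vmap(grad(...))` over a single-sample loss, and the vhp from `torch.autograd.functional.vhp` on the batch loss:

```python
import torch
from torch.func import functional_call, grad, vmap
from torch.autograd.functional import vhp

# Toy model and batch, purely for illustration.
model = torch.nn.Linear(3, 1)
params = {k: v.detach() for k, v in model.named_parameters()}
x = torch.randn(4, 3)
y = torch.randn(4, 1)

def sample_loss(params, xi, yi):
    # Loss for a single sample; vmap adds back the batch dimension.
    pred = functional_call(model, params, (xi.unsqueeze(0),))
    return torch.nn.functional.mse_loss(pred, yi.unsqueeze(0))

# Per-sample gradients: a leading batch dim on every parameter's gradient.
per_sample_grads = vmap(grad(sample_loss), in_dims=(None, 0, 0))(params, x, y)
print(per_sample_grads["weight"].shape)  # (4, 1, 3): one gradient per sample

# vhp of the batch loss w.r.t. the parameters, for some vector v
# (here random, standing in for whatever vector you actually need).
def batch_loss(weight, bias):
    pred = torch.nn.functional.linear(x, weight, bias)
    return torch.nn.functional.mse_loss(pred, y)

v = (torch.rand_like(params["weight"]), torch.rand_like(params["bias"]))
loss_val, vhp_val = vhp(batch_loss, (params["weight"], params["bias"]), v)
print(vhp_val[0].shape)  # (1, 3): same shape as the weight
```

Note the two APIs don't compose directly: torch.func works on functional (stateless) calls, while torch.autograd.functional wants a function of plain tensors, so you end up writing the loss twice as above (or using torch.func's own jvp/vjp to build the hvp).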
Could you also clarify what you want to do, mathematically? In particular:
- What vector do you want to use for the hvp? (I'm guessing that the `torch.rand` in the code you shared is just for the sake of the example.)
- What do you want to do with the per-sample gradients?