I have a checkpointed model where the segments between checkpoints could be run in place, but, as we all know, in-place operations do not play well with autograd.
However, the first forward pass of a checkpointed model could safely run in place, since its intermediates are discarded anyway; as far as I know, though, there's no way to tell the checkpointing machinery to run different code during the initial forward and the recomputation forward inside the backward pass.
To work around this, I'm thinking of just writing one large custom autograd function that does the gradient checkpointing manually in its backward. I'm not particularly keen, though, on reimplementing the backward of built-in functions such as GELU myself.
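For concreteness, here is a minimal sketch of what I mean, under the assumption that `ctx.mark_dirty` handles the in-place mutation correctly; the class name and the tiny `mul_`/`relu_` segment are just placeholders for the real model parts:

```python
import torch

class InPlaceCheckpoint(torch.autograd.Function):
    """Hypothetical sketch: the forward runs a small segment in place
    (keeping only the segment input), the backward recomputes the segment
    out of place so autograd can differentiate it."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x.clone())  # stash the segment input before mutating
        ctx.mark_dirty(x)                 # declare the in-place mutation of the input
        return x.mul_(2.0).relu_()        # in-place: no graph is recorded here

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        with torch.enable_grad():
            x = x.detach().requires_grad_()
            y = (x * 2.0).relu()          # out-of-place recomputation for autograd
        (grad_in,) = torch.autograd.grad(y, x, grad_out)
        return grad_in

# Usage: the input must be a non-leaf tensor, since leaves that require
# grad cannot be mutated in place.
base = torch.randn(16, dtype=torch.double, requires_grad=True)
h = base + 0.0
InPlaceCheckpoint.apply(h).sum().backward()
```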
Is there a way to access, for example, gelu_backward directly without having to do a forward pass?
It seems I could write a C++ extension and access gelu_backward from ATen directly, but that is a bit inelegant.
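For what it's worth, the ATen op registry appears to be reachable from Python via `torch.ops.aten`, which might sidestep the C++ extension entirely; this is a sketch assuming your PyTorch version exposes `gelu_backward` there with a `(grad_output, self)` signature:

```python
import torch

x = torch.randn(10, dtype=torch.double, requires_grad=True)

# Reference gradient via a normal forward + backward.
torch.nn.functional.gelu(x).sum().backward()

# Direct call into the ATen backward kernel: no forward graph needed.
direct = torch.ops.aten.gelu_backward(torch.ones_like(x), x.detach())
```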