Hi, this is a bit of an unusual scenario:
I have a CUDA kernel implementation of an autograd Function Func with a custom backward pass, which I use as follows during the forward pass:
```
x_1 = from data loader
for t = 1 : N
    x_(t+1) = Func(x_t)
```
N can be very large, e.g. 100, so to decrease GPU memory usage, Func performs its operations on x in-place and sets the mark_dirty flag on it. During the backward pass I would like to rebuild the computation graph, basically trading compute for memory. Would I need to use checkpointing to achieve this, or does setting the mark_dirty flag on a Tensor automatically recompute the input? Said another way: how does mark_dirty recover the lost Tensor, does it keep a copy in memory or recompute it?
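For reference, the checkpointing approach I have in mind would look roughly like the sketch below, where run_chunk, k, and loader are placeholder names of mine and Func is the Function above:

```python
import torch
from torch.utils.checkpoint import checkpoint

def run_chunk(x, k):
    # Run k steps; intermediates inside the chunk are freed after the
    # forward and recomputed during the backward pass.
    for _ in range(k):
        x = Func.apply(x.clone())  # clone so the chunk input saved by checkpoint is not dirtied
    return x

x = next(iter(loader))             # x_1 from the data loader (placeholder)
k = 10                             # steps per checkpointed chunk (placeholder)
for _ in range(N // k):
    x = checkpoint(run_chunk, x, k, use_reentrant=False)
```

Checkpointing whole chunks rather than single steps is what would actually save memory here, since checkpoint keeps its own inputs alive.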
mark_dirty does not recompute the Tensor for you in any way. It only lets autograd know what was modified in-place, for error-checking purposes.
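To illustrate the error checking (a toy example, not using a custom Function): autograd keeps a version counter per tensor and raises an error when a value it saved for the backward pass was later changed in-place:

```python
import torch

a = torch.rand(2, requires_grad=True).clone()
b = a ** 2   # autograd saves `a` to compute the backward of pow
a.add_(1)    # in-place op bumps a's version counter
# RuntimeError: one of the variables needed for gradient computation
# has been modified by an inplace operation
b.sum().backward()
```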
Thanks. How do you suggest I let autograd know about the in-place operation happening in Func?
Using mark_dirty is the right way to let autograd know about the in-place op.
That's actually the only thing mark_dirty does: let autograd know that an in-place op happened.
Sorry, I was not clear earlier. How would autograd reconstruct the original x for backprop, if it was sent to Func during the forward pass and thus was overwritten?
> How would autograd reconstruct original x for backprop
It does not.
If you modify it in-place and save it, you will get the modified version during the backward pass. For example:
```python
import torch

class MyFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        print("fw in", inp)
        inp += 2                    # modify the input in-place
        ctx.mark_dirty(inp)         # declare the in-place modification
        ctx.save_for_backward(inp)
        print("fw out", inp)
        return inp

    @staticmethod
    def backward(ctx, grad_output):
        inp, = ctx.saved_tensors
        print("bw inp value", inp)
        return grad_output

a = torch.rand(2, requires_grad=True)
out = MyFn.apply(a.clone())
out.sum().backward()
```
```
fw in tensor([0.3568, 0.4446], grad_fn=<CloneBackward0>)
fw out tensor([2.3568, 2.4446], grad_fn=<CloneBackward0>)
bw inp value tensor([2.3568, 2.4446], grad_fn=<MyFnBackward>)
```
As you can see, the backward gets the modified value.
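If the backward actually needs the original input, one option (which trades some memory back) is to save a copy taken before the in-place update; a minimal sketch of the forward above:

```python
@staticmethod
def forward(ctx, inp):
    ctx.save_for_backward(inp.clone())  # keep the pre-update value
    inp += 2
    ctx.mark_dirty(inp)
    return inp
```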