Hi, I am trying to hack PyTorch a bit to see if a research idea works.
Specifically, I want to remove the computation (i.e., the CUDA kernel launches) in both the forward and backward passes, without breaking the training script interface. Ideally, after removing the computation kernels, I could still run the existing `train.py` to do a training loop and see how much time it takes (even though the training itself no longer makes sense, since no meaningful activations or gradients are computed).
I checked the documentation and it seems there is no obvious way to do that. From my understanding: in the forward pass, I need to keep the memory allocation for the saved tensors as well as the computation graph, and in the backward pass, I need to keep the memory allocation for the gradients.
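To make the goal concrete, here is roughly the behavior I'd like to end up with (an untested sketch; the shapes and the checks are just for illustration):

```python
import torch

x = torch.randn(32, 1024, device="cuda", requires_grad=True)
w = torch.randn(1024, 1024, device="cuda", requires_grad=True)

# In the hacked version, this forward would launch no real compute kernels,
# but the output buffer, the saved tensors, and the graph node should still exist.
y = (x @ w).relu()
print(y.grad_fn)                      # graph node is still there
print(torch.cuda.memory_allocated())  # activation memory is still allocated

# Backward should allocate the gradient buffers (their values don't matter)
# without running the real backward kernels.
y.sum().backward()
print(x.grad.shape, w.grad.shape)
```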
I believe that modifying the ATen and autograd libraries might work, but that would be very difficult. I wonder if there is an easier solution. For example, can I swap the `grad_fn` after the forward pass for a dummy that just allocates and returns `torch.zeros()`?
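To illustrate what I mean by a "dummy", here is an untested sketch using a custom `torch.autograd.Function` (the name `DummyLinear` is made up; this replaces the op itself rather than literally swapping `grad_fn`, but the effect I'm after is the same):

```python
import torch

class DummyLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input, weight):
        # Save tensors exactly as the real op would, so the saved-tensor
        # memory and the graph structure are preserved.
        ctx.save_for_backward(input, weight)
        # Allocate the output buffer without launching the real matmul kernel
        # (torch.empty would even avoid the fill kernel).
        return input.new_zeros(input.shape[:-1] + (weight.shape[0],))

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        # Allocate zero gradients of the right shapes instead of computing them.
        return torch.zeros_like(input), torch.zeros_like(weight)

# usage: y = DummyLinear.apply(x, w)
```

Is there a way to get this effect for every op automatically, without patching ATen?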
Thanks in advance!