Remove Computation in Forward & Backward Pass

Hi, I am trying to do some hacking on PyTorch to see if a research idea works.

Specifically, I want to remove the computation (i.e., the CUDA kernel launches) in both the forward and backward passes, without breaking the training script interface. Ideally, after removing the computation kernels, I could still run my PyTorch `train.py` training loop and see how much time it takes (though the training itself no longer makes sense here, since no meaningful activations or gradients are computed).

I checked the documentation and it seems there is no obvious way to do this. From my understanding: in the forward pass, I need to keep the memory allocation of the saved tensors as well as the computation graph; in the backward pass, I need to keep the memory allocation for the gradients.
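To make this concrete, here is a rough sketch of the kind of replacement op I have in mind, written as a custom `torch.autograd.Function` (the name `NoOpMatmul` is just mine, and I use zero-filled allocations as a stand-in, even though the zero fill itself is still a small kernel):

```python
import torch

class NoOpMatmul(torch.autograd.Function):
    """Stand-in for a matmul: keeps the allocations and the autograd
    graph node, but does not launch the real compute kernel."""

    @staticmethod
    def forward(ctx, x, w):
        # Keep the saved-tensor footprint a real matmul would have.
        ctx.save_for_backward(x, w)
        # Allocate an output of the right shape without computing it.
        return x.new_zeros((x.shape[0], w.shape[1]))

    @staticmethod
    def backward(ctx, grad_out):
        x, w = ctx.saved_tensors
        # Allocate gradients of the right shape, again without the math.
        return torch.zeros_like(x), torch.zeros_like(w)
```

Calling `NoOpMatmul.apply(x, w)` then gives an output with a grad_fn, so `loss.backward()` still walks a graph and allocates gradients, just without the real kernels.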

I believe that modifying ATen and the autograd library might work, but it would be very difficult. I wonder if there is an easier solution. For example, can I change the grad_fn after the forward pass to make it a dummy that just allocates and returns torch.zeros()?
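In case it helps to clarify what I mean by "without breaking the training script interface", something like the following is what I am imagining: a rough sketch that monkey-patches module forwards (rather than grad_fn directly), reuses the `NoOpMatmul` sketch above, only handles `nn.Linear`, and ignores the bias (`swap_linear_forwards` is just a name I made up):

```python
import torch.nn as nn

def swap_linear_forwards(model: nn.Module) -> None:
    """Replace every nn.Linear's forward with the dummy op above,
    so train.py runs unchanged but no real GEMM is launched."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            def dummy_forward(inp, m=m):
                # weight is (out_features, in_features); transpose it so the
                # dummy op produces a (batch, out_features) output.
                return NoOpMatmul.apply(inp, m.weight.t())
            m.forward = dummy_forward
```

The idea would be to call `swap_linear_forwards(model)` once before the training loop, with `train.py` otherwise unchanged.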

Thanks in advance!