Hello all,

I have the following issue:

I have a function that takes as input the output of a pretrained model G (e.g. a GAN generator) and another vector y, say

f(y, G(z)).

I want to compute the gradient of this function w.r.t. z_i for many different z_i’s. Let us collect these gradients in a list:

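import torch

dfdz = []
for i in range(N):
    # z[i] needs requires_grad=True; create_graph=True keeps the graph so dfdz[i] can be differentiated again
    dfdz.append(torch.autograd.grad(f(y, G(z[i])), z[i], create_graph=True)[0])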

That gradient is a function of y and the Jacobian of G at z_i. All the gradients dfdz = [dfdz[0], dfdz[1], …, dfdz[N-1]] are then given as input to another function g(dfdz, x), and I want the gradient of g(dfdz, x) w.r.t. y (let’s denote it dgdy).
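To make the setup concrete, here is a minimal end-to-end sketch of what I am doing. It is only a toy illustration: the small MLP standing in for G, the particular choices of f and g, and all the dimensions are assumptions made up for this example, not my real model.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the pretrained generator G (the real one is much larger).
G = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 3)
)
for p in G.parameters():
    p.requires_grad_(False)  # pretrained weights: no gradients needed for them


def f(y, out):
    # Toy choice of f(y, G(z)): squared distance between y and the generator output.
    return ((y - out) ** 2).sum()


def g(dfdz, x):
    # Toy choice of g(dfdz, x): sum of squared distances between each gradient and x.
    return sum(((d - x) ** 2).sum() for d in dfdz)


N = 5
y = torch.randn(3, requires_grad=True)
x = torch.randn(4)
zs = [torch.randn(4, requires_grad=True) for _ in range(N)]

dfdz = []
for z_i in zs:
    # create_graph=True records the backward pass itself, so that dfdz
    # can later be differentiated w.r.t. y.
    (grad_i,) = torch.autograd.grad(f(y, G(z_i)), z_i, create_graph=True)
    dfdz.append(grad_i)

# Second differentiation: gradient of g w.r.t. y, through all the dfdz[i].
(dgdy,) = torch.autograd.grad(g(dfdz, x), y)
```

With the real generator, it is the loop with create_graph=True that makes memory grow with N.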

The issue is that when I use create_graph=True, which is needed to be able to compute the gradient of g(dfdz, x), the CUDA memory blows up. I have inspected the computation graph, and it seems that for each z_i PyTorch keeps in memory the computation graph of G(z_i), which is huge.

However, I don’t need the computation graph of G(z_i), since G is a pretrained model. In fact, I only care about its gradients w.r.t. z (the Jacobians).
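To spell out why I believe only the Jacobians matter, here is the chain rule written out. This is a sketch that assumes g depends on y only through the gradients dfdz[i]; the symbols u_i, J_G and v_i are notation introduced here for illustration.

```latex
\begin{aligned}
u_i &= G(z_i), \qquad J_G(z_i) = \frac{\partial G}{\partial z}(z_i), \\
\frac{\partial f}{\partial z_i} &= J_G(z_i)^{\top}\, \nabla_u f(y, u_i), \\
\frac{d g}{d y} &= \sum_{i=1}^{N} \left( \nabla^2_{u y} f(y, u_i) \right)^{\top} J_G(z_i)\, v_i,
\qquad v_i = \frac{\partial g}{\partial \left( \partial f / \partial z_i \right)} .
\end{aligned}
```

Here \nabla^2_{u y} f is the matrix of mixed second derivatives of f, with entries \partial^2 f / \partial u_a \partial y_b. Under the stated assumption, G enters dgdy only through its first derivatives J_G(z_i), applied to the vectors v_i; no second derivatives of G appear.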

Is there any way to delete that part of the graph from memory?

If not, is there any alternative and efficient way to perform the same computations without setting create_graph to True?

Thank you!