Hello! I have this network:
class Rocket_E_NN(nn.Module):
    def __init__(self):
        super().__init__()
        softplus_value = 5
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1),        # B, 32, 32, 32
            modifiedSoftplus(softplus_value),
            nn.Conv2d(32, 32, 4, 2, 1),       # B, 32, 16, 16
            modifiedSoftplus(softplus_value),
            nn.Conv2d(32, 64, 4, 2, 1),       # B, 64, 8, 8
            modifiedSoftplus(softplus_value),
            nn.Conv2d(64, 64, 4, 2, 1),       # B, 64, 4, 4
            modifiedSoftplus(softplus_value),
            nn.Conv2d(64, 256, 4, 1),         # B, 256, 1, 1
            modifiedSoftplus(softplus_value),
            View((-1, 256*1*1)),              # B, 256
            nn.Linear(256, 2),                # B, 2
        )

    def forward(self, x):
        z = self.encoder(x)
        return z
class Rocket_D_NN(nn.Module):
    def __init__(self):
        super().__init__()
        softplus_value = 5
        self.decoder = nn.Sequential(
            nn.Linear(2, 256),                # B, 256
            View((-1, 256, 1, 1)),            # B, 256, 1, 1
            modifiedSoftplus(softplus_value),
            nn.ConvTranspose2d(256, 64, 4),   # B, 64, 4, 4
            modifiedSoftplus(softplus_value),
            nn.ConvTranspose2d(64, 64, 4, 2, 1),   # B, 64, 8, 8
            modifiedSoftplus(softplus_value),
            nn.ConvTranspose2d(64, 32, 4, 2, 1),   # B, 32, 16, 16
            modifiedSoftplus(softplus_value),
            nn.ConvTranspose2d(32, 32, 4, 2, 1),   # B, 32, 32, 32
            modifiedSoftplus(softplus_value),
            nn.ConvTranspose2d(32, 3, 4, 2, 1),    # B, 3, 64, 64
        )

    def forward(self, z):
        x = self.decoder(z)
        return x
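(For context: modifiedSoftplus and View are not standard PyTorch modules. In case it helps, here is a minimal sketch of what I assume they look like — modifiedSoftplus as nn.Softplus with a configurable beta, and View as a reshape wrapper usable inside nn.Sequential. The exact definitions in my code may differ.)

```python
import torch
import torch.nn as nn

class modifiedSoftplus(nn.Module):
    """Assumed: Softplus with a configurable beta (sharpness)."""
    def __init__(self, beta):
        super().__init__()
        self.softplus = nn.Softplus(beta=beta)

    def forward(self, x):
        return self.softplus(x)

class View(nn.Module):
    """Reshape helper so .view() can be used inside nn.Sequential."""
    def __init__(self, shape):
        super().__init__()
        self.shape = shape

    def forward(self, x):
        return x.view(*self.shape)
```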
So it is an autoencoder: the encoder takes an image and reduces it to a bottleneck of size 2, and the decoder reconstructs the original image. This works very well and the reconstructed image looks fine. Now I want to compute the Jacobian of the output image with respect to the bottleneck layer, so the Jacobian will be 3x64x64x2 numbers. Here is my code for that:
def jacobian(inputs, outputs):
    return torch.stack([torch.autograd.grad([outputs[:, i].sum()], [inputs], create_graph=True)[0]
                        for i in range(outputs.size(1))], dim=-1)

z = model_E(x_0)
output = model_D(z)
output = output.view(output.size(0), output.size(1) * output.size(2) * output.size(3))
J = jacobian(z, output)
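(To make the "much smaller example" concrete, here is a self-contained sanity check of the helper on a toy linear map — the weight matrix and sizes here are purely illustrative. For output = z @ W.t(), the Jacobian with respect to z should be W.t() for every batch element, and the helper reproduces that.)

```python
import torch

def jacobian(inputs, outputs):
    # One backward pass per output dimension; create_graph=True also builds
    # a graph for the gradients themselves, which costs extra memory.
    return torch.stack([torch.autograd.grad([outputs[:, i].sum()], [inputs], create_graph=True)[0]
                        for i in range(outputs.size(1))], dim=-1)

torch.manual_seed(0)
W = torch.randn(5, 2)                 # 5 output dims, 2-dim bottleneck
z = torch.randn(3, 2, requires_grad=True)
out = z @ W.t()                       # B, 5
J = jacobian(z, out)                  # B, 2, 5

# Each batch element's Jacobian of a linear map is the (transposed) weight.
assert torch.allclose(J[0], W.t())
```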
I tested the code above on a much smaller example and it does what I want. My problem is that when I try it on this network I get this error: CUDA out of memory. Tried to allocate 8.00 MiB (GPU 0; 7.94 GiB total capacity; 7.54 GiB already allocated; 4.94 MiB free; 3.01 MiB cached)
So I am a bit confused. First of all, why do I run out of memory when computing the Jacobian? The gradients needed for the Jacobian are also computed during backprop when I train the network, so why does it work for backprop but not here? Then, based on the error message, I have 7.94 GiB total and 7.54 GiB already allocated, so I would expect about 400 MiB free, while I only need 8 MiB. Am I misunderstanding the message? And lastly, can someone please help me with this? Thank you so much!