Train a generator: trying to backward through the graph a second time

imxtx · May 9, 2022, 1:13pm

I want to train a generator to simulate the distribution of the output of a ResNet18, but PyTorch raises an error in the second iteration of the for loop:

RuntimeError: Trying to backward through the graph a second time (or directly access saved 
tensors after they have already been freed). Saved intermediate values of the graph are freed
 when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to 
backward through the graph a second time or if you need to access saved tensors after 
calling backward.

Here is my code for training:

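import torch
import torch.nn.functional as F

def noise(batch_size):
    """
    Generates a (batch_size, 100) tensor of Gaussian-sampled random values
    """
    # the noise is only an input, so it does not need requires_grad=True
    return torch.randn(batch_size, 100)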

def noise_loss(self, noise_data):
    n0 = noise_data[0].reshape((1, -1))
    n1 = noise_data[1].reshape((1, -1))
    n_loss0 = torch.linalg.norm(n0)
    n_loss1 = torch.linalg.norm(n1)
    return n_loss0 + n_loss1

def verification_loss(self, fake_pred):
    # target_feature is computed by ArcFace with image input and saved.
    v_loss0 = F.cosine_similarity(fake_pred[0], self.target_feature[0], dim=0)
    v_loss1 = F.cosine_similarity(fake_pred[1], self.target_feature[1], dim=0)
    return v_loss0 + v_loss1

def optimize(self, num_epochs):
    # put ArcFace in eval mode (this alone does not stop autograd from tracking its parameters)
    self.predictor.eval()

    batch_size = 2
    # start to train the generator
    for epoch in range(num_epochs):
        # generate batch of noises
        noise_data = self.generator(noise(batch_size).to(self.device))
        fake_data = noise_data * self.imposter_mask + self.imposter_image

        # get feature from ArcFace
        fake_pred = self.predictor(fake_data)

        # noise loss
        n_loss = self.noise_loss(noise_data)
        # verification loss
        v_loss = self.verification_loss(fake_pred)

        g_error = n_loss + v_loss

        # Update the generator
        self.g_optimizer.zero_grad()
        g_error.backward()
        self.g_optimizer.step()
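
A note on the "freeze ArcFace" step: `eval()` only changes how layers such as dropout and batch norm behave; it does not turn off gradient tracking for the model's parameters. Below is a minimal sketch of the difference, assuming a tiny stand-in network (the `predictor` here is hypothetical, not the real ArcFace model). Setting `requires_grad_(False)` on the parameters is one common way to freeze them while still letting gradients flow back to the input.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the ArcFace predictor in the post.
predictor = nn.Sequential(nn.Linear(4, 3), nn.Dropout(0.5))

predictor.eval()
# eval() alone leaves every parameter trainable from autograd's point of view.
still_trainable = all(p.requires_grad for p in predictor.parameters())

# Freeze the parameters explicitly.
for p in predictor.parameters():
    p.requires_grad_(False)

# Gradients still reach the input (in the post, the generator's output).
x = torch.randn(2, 4, requires_grad=True)
predictor(x).sum().backward()

print(still_trainable)                    # parameters were trainable after eval()
print(predictor[0].weight.grad is None)   # frozen weights receive no gradient
print(x.grad is not None)                 # the input still receives a gradient
```

With the parameters frozen this way, any feature computed from an input that does not itself require gradients carries no autograd graph.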

imxtx · May 9, 2022, 1:28pm

I figured it out. I hadn't detached target_feature from the computation graph. But I still don't understand why I need to detach it.

# load data
self.target_image = torch.tensor(load_image("data/train/target.jpg")).to(self.device)
# target feature: why is detach() needed here?
self.target_feature = self.predictor(self.target_image).detach()
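
A likely explanation, assuming the predictor's parameters still require gradients: without `.detach()`, `target_feature` keeps a `grad_fn` pointing into a graph that was built once, outside the training loop. The first `backward()` walks through that graph and frees its saved tensors. The second iteration's loss depends on the same `target_feature`, so `backward()` tries to walk the freed graph again and raises the error. The sketch below reproduces this with small stand-in modules (`predictor`, `generator`, and `train` are hypothetical names, not the code from the post).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the models in the post.
predictor = nn.Linear(4, 3)
generator = nn.Linear(100, 4)
predictor.eval()  # does not disable gradient tracking for the parameters
optimizer = torch.optim.SGD(generator.parameters(), lr=0.01)

target_image = torch.randn(2, 4)


def train(target_feature, steps=2):
    """Run a few generator updates against a fixed target feature."""
    for _ in range(steps):
        fake_pred = predictor(generator(torch.randn(2, 100)))
        loss = F.cosine_similarity(fake_pred, target_feature, dim=1).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()


# Without detach: target_feature is attached to a graph that is built only once.
attached = predictor(target_image)
try:
    train(attached)
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True

# With detach: target_feature is a constant, so each iteration builds a fresh graph.
detached = predictor(target_image).detach()
train(detached)

print(second_backward_failed)
print(attached.grad_fn is not None, detached.grad_fn is None)
```

Computing the target feature inside a `with torch.no_grad():` block should have the same effect as `.detach()` here, since no graph is recorded for it in the first place.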