Gradient of an image w.r.t. its classification error on a binary attribute

Federico_Romeo · April 6, 2023, 9:32am

I have a pretrained GAN generator, that takes latent vectors from either Z space, shaped [1,512], or from W space shaped [1,18,512], and outputs face images of of dimension [3,1024,1024].

I also have a classifier that gives the prediction scores for 40 binary facial attributes.

My goal is to iteratively modify an input image in the direction of editing a single facial attribute. The edit wouldn’t be directly on the image, but un the latent vector which will be the input of the GAN generator that will create the image. For example i want to obtain the same face but with “Mustache”, or “Smiling”. The way i do it right now is the following:

# Generate an image and its correspondant latent vector
attribute = "Smiling"
z = torch.randn(1,18,512, requires_grad=True).cuda()
z.retain_grad()
image, w = generator(z, input_is_style=False, return_styles=True, randomize_noise=False)
w.retain_grad()

# Get the prediction score of 'attribute'
predictions = classifier(image)                         # shape 1,80
predictions.retain_grad() 
predictions = predictions.view(-1,40,2)                 # shape 1,40,2          
predictions = torch.softmax(predictions, dim=2)         # shape 1,40,2
predictions = predictions.squeeze(0)[:,1].unsqueeze(1)  # shape 1,40
prediction = predictions[attributes.index(attribute)]   # shape 1

i = 0
lr = 1e-2
target1 = torch.tensor([1.0], requires_grad=True).cuda()

while prediction.item() < 0.8 and i<100:

    print(f"\rIter: {i} - prediction: {prediction.item():.3f}, end='')
    predictions = classifier(image)                         # shape 1,80
    predictions.retain_grad()
    predictions = predictions.view(-1,40,2)                 # shape 1,40,2          
    predictions = torch.softmax(predictions, dim=2)         # shape 1,40,2
    predictions = predictions.squeeze(0)[:,1].unsqueeze(1)  # shape 1,40
    prediction = predictions[attributes.index(attribute)]   # shape 1

    # Compute the loss of my prediction with respect to a perfect prediciton of 1
    loss = torch.nn.functional.binary_cross_entropy(prediction, target1)
    loss.backward(retain_graph=True)
    
    # Update the vector in the gradient direction
    w -= lr * w.grad

    # Generate the new image from the modified latent vector w
    image, _= generator(w, randomize_noise=False, input_is_style=True, return_styles=True)
    i += 1

tensor2im(image.squeeze(0)).resize((256,256)).show()

This code snippet works, it doesn’t give any errors, but the modifications aren’t that evident nor clear.
Do you think I’m doing something wrong gradient wise in particular? Thanks in advance.

AbdulsalamBande · April 6, 2023, 3:46pm

To improve the quality of the generated image with the desired facial attribute, consider the following adjustments:

Adjust the learning rate or use a learning rate scheduler to prevent overshooting the optimal latent vector.
Apply gradient clipping to stabilize updates and avoid large gradients.
Use regularization in the loss function to prevent the latent vector from moving too far from its original position, preserving the identity of the face.

Experiment with different values for the learning rate, regularization weight, and momentum to find the best combination for your specific problem.

Federico_Romeo · April 6, 2023, 4:44pm

Thanks for the reply.

With your suggestion the code will look like this:

# 1. Add a scheduler
optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-1)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

while [...] :
      
      prediction = ...

      # 3. How do i modify the loss with L2 reg?
      loss = torch.nn.functional.binary_cross_entropy(prediction, target1)
      loss.backward(retain_graph=True)

      lr = max(1e-3, scheduler.get_last_lr()[0])
      # 2. Clip grad value
      w -= lr * w.grad.clip(min_v,max_v)

      image = generate(w)

I do have some doubts.

Any suggestions of which kind of scheduler and on its parameters? (starting lr, step…)
Is the optimizer right? I’m not sure about the parameters() and even on the SGD.
What about the loss regularitazion? I don’t get how can i modify it since i’m doing inference, i’m not updating weights.
min_v and max_v of the clipping? -1,1?