Hi all –
I’m brand new to PyTorch, so please forgive my naiveté. I’m trying to build my own version of the Snapchat gender-swapping filter. The approach I’ve settled on, in pseudo-code, is as follows:
```python
# I already have two working models ready to be loaded as follows,
# both set to .eval() after loading.
gender_classifier = load(pretrained_gender_classifier_model)
face_embedder = load(pretrained_VGGFace2_model)

# The user submits a photo to be processed, and the "ground truth"
# of that photo is determined.
ground_truth_gender = gender_classifier(user_submitted_image)
ground_truth_embedding = face_embedder(user_submitted_image)

# Initialize a random image to refine into the final product.
candidate = tensor(white_noise_image, requires_grad=True)
optimizer = torch.optim.Adam([candidate], lr=lr)

# Training loop
for i in range(num_iterations):
    optimizer.zero_grad()

    # Determine the embedding and gender of the candidate.
    candidate_gender = gender_classifier(candidate)
    candidate_embedding = face_embedder(candidate)

    # Determine how bad these values are compared to what I *want*,
    # which is a similar embedding, but the opposite gender.
    gender_loss = binary_cross_entropy(candidate_gender, 1 - ground_truth_gender)
    embedding_loss = mse_loss(candidate_embedding, ground_truth_embedding)
    loss = gender_loss * hyperparameter_weight + embedding_loss

    optimizer.step()
```
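To make the symptom concrete, here’s a self-contained toy repro of the same loop structure, with a frozen linear layer standing in for my two pretrained models (all names here are invented for the repro, not my real code). It shows exactly the behavior I’m seeing: the candidate comes out bit-identical to its starting value.

```python
import torch

# Tiny frozen "model" standing in for the pretrained, eval()-mode networks.
torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)

# A bare tensor to optimize, same setup as my candidate image.
candidate = torch.randn(4, requires_grad=True)
initial = candidate.detach().clone()
optimizer = torch.optim.Adam([candidate], lr=0.1)

for i in range(100):
    optimizer.zero_grad()
    loss = (model(candidate) ** 2).sum()
    optimizer.step()  # same structure as my pseudo-code above

# The candidate never changes, matching the symptom in my real code.
print(torch.equal(candidate.detach(), initial))  # → True
```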
I realize this is not a great implementation of what is effectively a form of neural style transfer, and I’ve tried reading through the official PyTorch tutorial on NST, but found that the approach used there is vastly different from what I’m attempting here.
My implementation throws no errors, but simply outputs the original white-noise candidate image unaltered, no matter how many iterations I run the optimizer. I’ve accepted that, whatever I’m doing, I must be doing it wrong, but I’m at a loss as to what I’m messing up.
If anyone could lend some insight into what makes the above scheme not work, or what tenet of PyTorch architecture I’m violating, I would hugely appreciate it. I want to know what my mistake is so I can really understand why any alternate solution I find on the interwebz works better.
As it stands, I’m guessing it has something to do with not being able to use optimizers on tensors that aren’t parameters in a model? If that’s the case, should I try defining a third model that takes in the user image and returns the embedding and classification from its forward() method, with the candidate image initially defined in its `__init__()`?
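Although, when I tested that guess with a throwaway toy objective (names made up for the test), Adam seemed perfectly happy to update a bare tensor that isn’t attached to any model, which makes me doubt my own theory:

```python
import torch

# Sanity check: can Adam optimize a plain leaf tensor toward a target?
x = torch.zeros(3, requires_grad=True)
target = torch.tensor([1.0, 2.0, 3.0])
opt = torch.optim.Adam([x], lr=0.1)

for _ in range(200):
    opt.zero_grad()
    loss = ((x - target) ** 2).sum()
    loss.backward()  # populate x.grad before stepping
    opt.step()

# The bare tensor does move away from its starting value.
print(torch.equal(x.detach(), torch.zeros(3)))  # → False
```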
I’ve spent a good number of hours on this without much progress, so thank you in advance to anyone who takes the time to respond.