Trying to get a better grasp of the fundamentals -- can an optimizer be called on a tensor that's not part of an nn.Module?

Hi all –

I’m brand new to PyTorch, so please forgive my naivete. I’m trying to build my own version of the Snapchat gender-swapping filter. The approach I’ve settled on, in pseudo-code, is as follows:

#I already have two working models ready to be loaded as follows, both set to .eval() after loading.
gender_classifier = load(pretrained_gender_classifier_model)
face_embedder = load(pretrained_VGGFace2_model)

#The user submits a photo to be processed, and the "ground truth" of that photo is determined.
ground_truth_gender = gender_classifier(user_submitted_image)
ground_truth_embedding = face_embedder(user_submitted_image)

#Initialize a random image to refine into the final product
candidate = tensor(white_noise_image, requires_grad = True)

optimizer = torch.optim.Adam([candidate], lr = lr)

#Training Loop
for i in range(num_iterations):

    #Determine the embedding and gender of the candidate
    candidate_gender = gender_classifier(candidate)
    candidate_embedding = face_embedder(candidate)

    #Determine how bad these values are compared to what I *want*, which is
    #a similar embedding, but opposite gender.
    gender_loss = binary_cross_entropy(candidate_gender, !ground_truth_gender)
    embedding_loss = mse_loss(candidate_embedding, ground_truth_embedding)
    loss = gender_loss * hyperparameter_weight + embedding_loss


I realize this is not a great implementation of what is effectively a form of neural style transfer, and I’ve tried reading through the official PyTorch tutorial on NST, but found that the approach used there is vastly different from what I’m attempting here.

My implementation throws no errors, but simply outputs the original white-noise candidate image unaltered, no matter how many iterations I run the optimizer. I’ve accepted that, whatever I’m doing, I must be doing it wrong, but I’m at a loss as to what I’m messing up.

If anyone could lend some insight into what makes the above schema not work, or what tenet of PyTorch architecture I’m violating, I would hugely appreciate it. I want to know what my mistake is so I can really understand why any alternate solution I find on the interwebz works better.

As it stands, I’m guessing it has something to do with not being able to use optimizers on tensors that aren’t parameters in a model? If that’s the case, should I try defining a third model that takes in the user image and returns the embedding and classification from its forward() method, with the candidate image initially defined in the init()?

I’ve spent a good number of hours on this without much progress, so thank you in advance to anyone who takes the time to respond.

Hi Kai!

In short, you need to call loss.backward() before calling optimizer.step().

There is no need for a tensor to be part of a model in order to
optimize it.

You need:

  1. The tensor to have requires_grad = True, which you have.

  2. The tensor to be included as an optimizer parameter, which
    you have.

  3. To calculate the gradient of the loss function with respect to
    the tensor. This is done by the autograd system when you call
    loss.backward(), which you’re not doing.

  4. To use the gradient to perform the optimization by calling
    optimizer.step() (which you are doing, but the gradient
    isn’t there).
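
The four steps above can be put together in a minimal sketch. The frozen_net and target below are hypothetical stand-ins (a small linear layer and a random target), not the classifier or embedder from the original post; the point is only that a free-standing tensor can be optimized without belonging to an nn.Module.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a frozen, pretrained network (assumption, not the poster's model).
frozen_net = torch.nn.Linear(8, 4).eval()
target = torch.randn(1, 4)

# Step 1: a free-standing leaf tensor with requires_grad=True.
candidate = torch.rand(1, 8, requires_grad=True)
start = candidate.detach().clone()

# Step 2: hand the tensor itself to the optimizer.
optimizer = torch.optim.Adam([candidate], lr=0.1)

for _ in range(50):
    optimizer.zero_grad()  # clear the gradient left over from the previous iteration
    loss = torch.nn.functional.mse_loss(frozen_net(candidate), target)
    loss.backward()        # Step 3: autograd fills candidate.grad
    optimizer.step()       # Step 4: Adam uses candidate.grad to update candidate in place

# True if the optimizer actually changed the tensor.
moved = not torch.equal(candidate.detach(), start)
print(moved)
```

If moved comes out False in a setup like this, a useful first check is whether candidate.grad is None after loss.backward(), since that points to the tensor not being connected to the loss.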


K. Frank

Thank you for your reply! I appreciate it. It turns out I just forgot to include loss.backward() in the above post, but it is present in my actual code (which still does not update the parameter :confused:)

While at first this is frustrating, I think it may actually be helpful in the long run, because I’m able to narrow down the true source of the bug – it’s probably somewhere in the homebrew classifier that I’m importing. I’ll have to take a more in-depth look at that code and see if I can’t track down what I’m doing wrong there.

Your response clarified a lot about the overall hierarchy/structure of PyTorch for me, so again, thank you. I’m excited to keep learning and experimenting with the framework, it seems so powerful and has so many possibilities.

One additional question – if I import a model I’ve pretrained using load_state_dict, and then use that model as a part of the calculation of the cost, the parameters of the model won’t be trained, because they aren’t being passed to the optimizer, right? If I understand correctly, an optimizer can only modify the parameters directly passed to it?
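
As a rough check of that understanding, here is a minimal sketch with a hypothetical stand-in model (a small linear layer in place of one restored with load_state_dict). It suggests that an optimizer only updates the tensors it was given, although autograd still computes gradients for the model's parameters unless they are told not to require them.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained model (assumption for illustration).
model = torch.nn.Linear(8, 4)
model.eval()
weights_before = model.weight.detach().clone()

candidate = torch.rand(1, 8, requires_grad=True)
optimizer = torch.optim.Adam([candidate], lr=0.1)  # only candidate is registered

loss = model(candidate).pow(2).mean()
loss.backward()
optimizer.step()

# The model's weights are untouched because the optimizer never saw them.
weights_unchanged = torch.equal(model.weight.detach(), weights_before)

# Gradients were still computed for the model's parameters, which costs time and memory.
grad_was_computed = model.weight.grad is not None

# Optional: mark the parameters as not requiring gradients to skip that work in later passes.
for p in model.parameters():
    p.requires_grad_(False)

print(weights_unchanged, grad_was_computed)
```

Note that .eval() only changes the behavior of layers such as dropout and batch norm; it does not by itself stop gradients from being computed.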


I was able to track down my bug! It turns out that I tried to set the input tensor’s requires_grad = True, but did it in a way that didn’t actually take effect. Essentially, I created a tensor with requires_grad = True, then passed it through a torchvision transform, and only then assigned the result to my candidate variable. It looked something like this:

candidate = transform.ToPILImage()(torch.rand(C, W, H, requires_grad = True))

That seemed fine to me as a beginner, but it resulted in candidate not actually having requires_grad set to True.
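
A minimal sketch of the same trap, using only torch so it runs without torchvision. The integer round trip below is an assumption standing in for the image conversion: it is not the exact transform from the line above, but it loses the gradient flag in a similar way. One possible fix, shown at the end, is to do all preprocessing first and only then create the leaf tensor that the optimizer will see.

```python
import torch

raw = torch.rand(3, 4, 4, requires_grad=True)

# Converting to an integer dtype (as image conversions typically do) is not
# differentiable, so the result no longer requires gradients.
round_tripped = raw.mul(255).byte().float().div(255)
lost_flag = not round_tripped.requires_grad

# Possible fix: finish preprocessing, then make a fresh leaf tensor from the result.
candidate = round_tripped.clone().detach().requires_grad_(True)
optimizer = torch.optim.Adam([candidate], lr=0.01)

print(lost_flag, candidate.is_leaf, candidate.requires_grad)
```

Printing candidate.requires_grad and candidate.is_leaf right before building the optimizer is a cheap way to catch this kind of problem early.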

Just thought I’d post this in case some other poor soul falls into the same trap I did and manages to serendipitously happen across this exact forum post.