I wanted to add two data tensors to the optimizer (data_words and data_phrases), but only data_phrases is optimized, not data_words. Is there any mistake in the way I added them to the optimizer, as in the sample code below (from inside the training loop of a softmax model)?
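The original snippet is not shown here, but as a hedged reconstruction, adding two independent tensors to one optimizer might look like the following sketch (the names match the discussion; the shapes, learning rate, and use of SGD are assumptions):

```python
import torch

# Hypothetical shapes; both tensors are leaves that require gradients.
data_words = torch.randn(100, 16, requires_grad=True)
data_phrases = torch.randn(50, 16, requires_grad=True)

# One tensor at construction time, the second via add_param_group,
# mirroring the pattern described in the question.
optimizer = torch.optim.SGD([data_words], lr=0.1)
optimizer.add_param_group({"params": data_phrases})

print(len(optimizer.param_groups))  # two parameter groups registered
```

Registering a tensor this way only tells the optimizer to apply updates from its `.grad`; it does not by itself cause any gradient to be computed for it.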
I see your point, but I guess we can simply calculate the gradient of any tensor, right? It does not need to be directly involved in calculating the loss, right?
To check this point further, I inspected the .grad attribute of the tensors data_phrases and data_words and found that the gradient of data_phrases is updated, while that of data_words is always None.
I guess the matter is related to a problem in passing the tensor data_words to the optimizer. For some reason only data_phrases is passed correctly to the optimizer, but not data_words. To check further, instead of passing the two tensors (data_phrases and data_words) one after the other, I passed only data_words alone, and in that case it worked well and I saw that gradients were calculated.
So my question turns out to be: how can I pass multiple tensors together to the optimizer? I have tried many times, but it does not work.
> I see your point, but I guess we can simply calculate the gradient of any tensor, right? It does not need to be directly involved in calculating the loss, right?
Well, if the tensor is not involved in the loss computation, its gradient will be zeros everywhere (or None, which amounts to the same thing).
So to get meaningful gradients, you need to actually use the tensor in the loss; otherwise the gradients will always be 0 and you won't be able to learn anything.
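The point above can be demonstrated with a minimal sketch: two leaf tensors, only one of which participates in the loss. The tensor names here are illustrative, not from the original code.

```python
import torch

used = torch.randn(4, requires_grad=True)    # participates in the loss
unused = torch.randn(4, requires_grad=True)  # registered nowhere in the graph

loss = (used ** 2).sum()  # only `used` contributes
loss.backward()

print(used.grad is None)    # False: a gradient was computed
print(unused.grad is None)  # True: autograd never touched it
```

An optimizer stepping over `unused` would be a no-op, since `optimizer.step()` skips parameters whose `.grad` is None.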
Actually, I computed data_phrases from data_words, but that happened outside the training loop. What I intended to do is pass to the optimizer the same batch of data_words that produced data_phrases, so the relationship is indirect: the optimizer would update data_phrases (directly used to calculate the loss) and data_words (indirectly related).
The reason I passed the tensor data_words into the training loop, even though it is not directly involved in the softmax model, is that I needed to add it to the same optimizer, so I did it this way.
Does it still not make sense?
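One concrete problem with computing data_phrases from data_words once, outside the loop, is that the graph connecting them is freed after the first backward pass, so the indirect relationship cannot drive updates across iterations. A sketch (the word-to-phrase mapping here is a made-up placeholder):

```python
import torch

data_words = torch.randn(6, 4, requires_grad=True)
# Phrases derived once, OUTSIDE the training loop (hypothetical mapping).
data_phrases = data_words[:3] * data_words[3:]

loss = (data_phrases ** 2).sum()
loss.backward()                      # first backward: grads do reach data_words
print(data_words.grad is not None)   # True

loss2 = (data_phrases ** 2).sum()
try:
    loss2.backward()                 # the words->phrases graph was already freed
    second_ok = True
except RuntimeError:
    second_ok = False
print(second_ok)                     # False: cannot backward a second time
```

This is why the derivation normally has to be repeated inside the loop: each iteration needs a fresh graph from data_words to the loss.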
I see your point, but then why, if I pass only the tensor data_words to the optimizer with self.optimizer.add_param_group({"params": data_words}), do I find that the gradients are calculated successfully (and are not None)?
Most likely because you also removed other code, such as the nn.Parameter() wrapping for data_phrases, which disconnects data_phrases from data_words.
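The suspected disconnect can be sketched as follows: wrapping a derived value in nn.Parameter creates a brand-new leaf tensor, so the autograd graph back to data_words is severed and data_words gets no gradient. (Names and the mean-pooling step are illustrative assumptions.)

```python
import torch
import torch.nn as nn

data_words = torch.randn(3, 4, requires_grad=True)
phrases_from_graph = data_words.mean(dim=0)               # still connected to data_words
data_phrases = nn.Parameter(phrases_from_graph.detach())  # new leaf: connection is cut

loss = (data_phrases ** 2).sum()
loss.backward()
print(data_words.grad is None)       # True: the nn.Parameter wrapping severed the graph
print(data_phrases.grad is not None) # True: only the new leaf receives gradients
```

So the loss can happily train data_phrases while leaving data_words untouched, which matches the symptom described above.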
> Actually, I computed data_phrases from data_words, but that happened outside the training loop.
If data_phrases is computed from data_words, why should it be in the optimizer at all? Shouldn't you just update data_words and recompute data_phrases every time data_words is updated?
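That suggestion can be sketched as follows: only data_words goes into the optimizer, and data_phrases is recomputed inside the loop so the loss stays connected to data_words. The word-to-phrase mapping and shapes below are placeholder assumptions.

```python
import torch

data_words = torch.randn(10, 8, requires_grad=True)
optimizer = torch.optim.SGD([data_words], lr=0.1)  # only the words are optimized
targets = torch.randn(5, 8)

for step in range(3):
    optimizer.zero_grad()
    # Recompute phrases from words EVERY iteration (hypothetical mapping:
    # a phrase is the average of two word vectors).
    data_phrases = (data_words[:5] + data_words[5:]) / 2
    loss = ((data_phrases - targets) ** 2).mean()
    loss.backward()
    optimizer.step()

print(data_words.grad is not None)  # True: gradients flow back to the words
```

Because data_phrases is rebuilt from the freshly updated data_words each step, it never needs to be a parameter itself.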
I could not understand this point ("Most likely because you also removed other code, such as the nn.Parameter() wrapping for data_phrases, which disconnects data_phrases from data_words."). I don't remove anything; it is exactly the same code, but instead of passing:
only data_words is updated and the .grad of data_phrases is None, unlike the first case. That made me doubt these two lines and how they are passed to the optimizer. I feel confused!
Actually, it is possible to ignore data_phrases and not pass it to the optimizer, as you mentioned. But the point is that the softmax is over phrases (not words), i.e., I could try to update only the tensor data_words and use the updated words to produce the phrases. However, the loss function in that case takes data_phrases as input, not data_words, which brings us back to your argument about optimizing a tensor not involved in the loss calculation.
I have done all this but am still unable to resolve the issue! The .grad of the words is still not calculated, so I need to pass the phrase and word tensors to the optimizer together.