Copying model output to a torch.Tensor where requires_grad is True

I have a model that outputs a sequence of vectors for each element in the batch, e.g., [Batch size, Sequence Length, Hidden size]. Then, I want to select a variable number of vectors for each element in the batch and copy these vectors to a tensor where requires_grad = True. Sample code is below:


import torch
from torch import nn
from typing import List

class MyModel(nn.Module):
    
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(8,8)
    
    def forward(self, x: torch.Tensor, indices: List[torch.Tensor]):
        # Example indices: [torch.tensor([0,1]), torch.tensor([2,3,4])]
        out = self.fc(x)
        batch_size, _, hidden_size = out.size()
        max_num_hidden_states = max([ind.size(0) for ind in indices])
        selected_hidden_states = torch.zeros(batch_size, max_num_hidden_states, hidden_size, requires_grad=True)
        for i in range(batch_size):
            selected_hidden_states.data[i, :indices[i].size(0)] = out[i, indices[i]]
        return selected_hidden_states
    
model = MyModel()
with torch.no_grad():
    output = model(torch.rand(2, 5, 8), [torch.tensor([0,1]), torch.tensor([2,3,4])])
     

The questions I have w.r.t. this are:

  1. If I train such model, would the gradients be backpropagated in the rest of the model parameters?
  2. Why is output.requires_grad == True, when I explicitly wrap the call in torch.no_grad()?
  3. The way I’m doing this (which doesn’t seem to work as expected as of now) feels too hacky and wrong. What is the proper way to achieve what I want?

I’m aware of this answer, which seems to endorse my way of doing it, but it still looks hacky to me.

As this is my first question in the forum, please let me know if I should provide further details 🙂

Cheers!

That answer is from a time long gone and it answers a different question.

  1. No. Between creating a new tensor that requires grad and writing into it through .data (which you should never use these days), you created a new leaf tensor: it will accumulate .grad itself, and no gradients will flow back into the rest of the model’s parameters.
  2. Because you requested it: you create the tensor with requires_grad=True. no_grad signals that you do not need gradients computed; it does not make any guarantee about the requires_grad of the result.
  3. If the utility function does not work for you, dropping the requires_grad=True and the .data should do the trick; see the sketch below.

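Roughly, something like this (a minimal sketch that keeps the shapes and names from your snippet) illustrates both 2. and 3.: a freshly created tensor honors requires_grad=True even inside no_grad, and for the copying you just write into a plain zeros tensor with ordinary indexing, so autograd records the copies and gradients flow back into fc.


import torch
from torch import nn
from typing import List

# Point 2: requires_grad=True at creation is honored even under no_grad
with torch.no_grad():
    t = torch.zeros(3, requires_grad=True)
print(t.requires_grad)  # True, because it was explicitly requested

class MyModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x: torch.Tensor, indices: List[torch.Tensor]):
        out = self.fc(x)
        batch_size, _, hidden_size = out.size()
        max_num_hidden_states = max(ind.size(0) for ind in indices)
        # plain tensor (same dtype/device as out), no requires_grad=True;
        # it joins the graph once graph-tracked values are copied into it
        selected_hidden_states = out.new_zeros(batch_size, max_num_hidden_states, hidden_size)
        for i in range(batch_size):
            # ordinary indexing instead of .data, so autograd records the copy
            selected_hidden_states[i, :indices[i].size(0)] = out[i, indices[i]]
        return selected_hidden_states

model = MyModel()
output = model(torch.rand(2, 5, 8), [torch.tensor([0, 1]), torch.tensor([2, 3, 4])])
print(output.requires_grad)  # True: the copies are tracked
output.sum().backward()
print(model.fc.weight.grad is not None)  # True: gradients reach the model parameters

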
Best regards

Thomas


Hi Tom! Thanks, I actually made things more complicated than they are 🙂