I have a model that outputs a sequence of vectors for each element in the batch, e.g. of shape [batch_size, sequence_length, hidden_size]. I then want to select a variable number of vectors for each batch element and copy them into a tensor with requires_grad=True. Sample code is below:
import torch
from torch import nn
from typing import List

class MyModel(nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x: torch.Tensor, indices: List[torch.Tensor]):
        # Example indices: [torch.tensor([0, 1]), torch.tensor([2, 3, 4])]
        out = self.fc(x)
        batch_size, _, hidden_size = out.size()
        # Pad to the largest number of selected vectors in the batch
        max_num_hidden_states = max([ind.size(0) for ind in indices])
        selected_hidden_states = torch.zeros(batch_size, max_num_hidden_states, hidden_size, requires_grad=True)
        # Copy the selected vectors of each batch element into the padded tensor
        for i in range(batch_size):
            selected_hidden_states.data[i, :indices[i].size(0)] = out[i, indices[i]]
        return selected_hidden_states

model = MyModel()
with torch.no_grad():
    output = model(torch.rand(2, 5, 8), [torch.tensor([0, 1]), torch.tensor([2, 3, 4])])
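To check whether gradients actually reach self.fc with this setup, I also ran a quick sanity check along these lines (outside torch.no_grad(); the loss is just a placeholder to drive backward()):

output = model(torch.rand(2, 5, 8), [torch.tensor([0, 1]), torch.tensor([2, 3, 4])])
loss = output.sum()  # placeholder loss, only here so I can call backward()
loss.backward()
# If the copy kept the autograd graph intact, I'd expect this to be a tensor rather than None
print(model.fc.weight.grad)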
The questions I have w.r.t. this are:
- If I train such a model, will the gradients be backpropagated to the rest of the model's parameters?
- Why does output.requires_grad end up being True when I explicitly wrap the call in torch.no_grad()?
- The way I'm doing this (which doesn't seem to work as expected right now) feels too hacky and wrong. What is the proper way to achieve what I want? One alternative I've been considering is sketched below.
I’m aware of this answer, which seems to endorse my way of doing it, but it still looks hacky to me.
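For reference, the alternative I've been considering is to gather the selected slices directly from out (so they keep their grad_fn) and pad them afterwards with torch.nn.utils.rnn.pad_sequence, instead of pre-allocating a requires_grad tensor and copying into it. A minimal sketch of that forward, assuming zero padding is acceptable:

from torch.nn.utils.rnn import pad_sequence

def forward(self, x: torch.Tensor, indices: List[torch.Tensor]):
    out = self.fc(x)
    # Each slice of out keeps its grad_fn, so gradients can flow back to self.fc
    selected = [out[i, idx] for i, idx in enumerate(indices)]
    # Pads with zeros to [batch_size, max_num_selected, hidden_size]
    return pad_sequence(selected, batch_first=True)

Would that be closer to the intended way of doing this?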
As this is my first question on the forum, please let me know if I should provide further details.
Cheers!