Hi all! I’ve gone through a bunch of similar posts on this topic, and while I’ve figured out that I need to use padding and packing, I still haven’t been able to work out how to properly pass this data into a loss function. Could anyone check whether the logic of my code makes sense? Thanks in advance!
- I create a padded set of data as follows:
seq_lengths = torch.LongTensor(list(map(len, observations_history)))
observations_history = pad_sequence(observations_history).to(self.device)
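(For concreteness, a toy sketch of what this produces — assuming, just for illustration, 8-dimensional observations and a batch of three sequences with lengths 5, 3 and 7:)
import torch
from torch.nn.utils.rnn import pad_sequence

observations_history = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(7, 8)]
seq_lengths = torch.LongTensor(list(map(len, observations_history)))  # tensor([5, 3, 7])
padded = pad_sequence(observations_history)  # (7, 3, 8) = (max_seq_len, batch, features), zero-padded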
- Then I pass the padded data and the sequence lengths into the forward pass of my neural network:
# Embedding layer
x = F.relu(self.fc1(x))
# Padded LSTM layer
x = pack_padded_sequence(x, seq_lengths, enforce_sorted=False)
self.lstm1.flatten_parameters()
x, _ = self.lstm1(x)
x, x_unpacked_len = pad_packed_sequence(x)
# Select the output at the last valid (non-padded) timestep of each sequence;
# x is (max_seq_len, batch, hidden) since batch_first=False
time_dimension = 0
last_timestep_idx = (
    (seq_lengths - 1).view(-1, 1).expand(len(seq_lengths), x.size(2))
).to(x.device)
last_timestep_idx = last_timestep_idx.unsqueeze(time_dimension)
x = x.gather(time_dimension, last_timestep_idx).squeeze(time_dimension)
# Remaining layers
x = F.relu(self.fc2(x))
q_values = self.fc3(x)
return q_values
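(As an aside, I think the gather above is equivalent to just taking h_n from the LSTM when the input is packed, since h_n holds the final hidden state of each unpadded sequence, returned in the original batch order with enforce_sorted=False — a minimal sketch assuming a single-layer, unidirectional LSTM; please correct me if that’s wrong:)
packed_out, (h_n, c_n) = self.lstm1(x)  # x is the PackedSequence
x = h_n[-1]  # (batch, hidden_size): last valid hidden state per sequence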
- Those q_values then get passed into an MSE loss function:
qf1_loss = 0.5 * F.mse_loss(qf1_a_values, next_q_value)
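(For context, a sketch of the shapes I have in mind here, with a hypothetical batch of 32 transitions and 4 discrete actions; qf1_a_values is just q_values gathered at the actions that were taken, and the actions tensor is likewise hypothetical:)
q_values = self.forward(observations_history, seq_lengths)          # (32, 4)
qf1_a_values = q_values.gather(1, actions.view(-1, 1)).squeeze(1)   # (32,)
qf1_loss = 0.5 * F.mse_loss(qf1_a_values, next_q_value)             # next_q_value: (32,)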
A couple of questions here:
- My first embedding FC layer directly uses the padded data. Is this OK? It obviously seems a bit wasteful computationally, but I don’t think FC layers can use packed data, can they?
- With all the padding and packing going on in the intermediate layers, is backprop happening correctly using only the non-padded data? Or is it also passing gradients through the padded entries and thus making incorrect gradient updates? What is the correct way to set this up?