Help wanted: How to pad and mask correctly for variable length label sequences


I implemented a transformer encoder which takes some cp_trajectories and has to create a fitting log mel spectrogram for them. Because the inputs as well as the labels are variable in length, I use a custom_collate_fn to pad them like this:

```
import torch
from torch.nn.utils.rnn import pad_sequence

def pad_and_mask(batch):
    # Assuming each element in 'batch' is a tuple (sequence, label)
    sequences = [torch.tensor(item[0]) for item in batch]
    labels = [torch.tensor(item[1]) for item in batch]

    # Pad the sequences to have the same length
    sequences_padded = pad_sequence(sequences, batch_first=True, padding_value=0)
    labels_padded = pad_sequence(labels, batch_first=True, padding_value=0)

    # Create attention masks for sequences
    # Create attention masks for sequences: 1 = real timestep, 0 = padding.
    # Shape must be (batch, max_seq_len) to match the indexing below.
    attention_masks = torch.zeros((len(batch), sequences_padded.size(1)), dtype=torch.float32)
    for i, seq in enumerate(sequences):
        attention_masks[i, :len(seq)] = 1

    # Create label masks for labels
    label_masks = torch.zeros_like(labels_padded, dtype=torch.float32)
    for i, label in enumerate(labels):
        label_masks[i, :len(label)] = 1

    return sequences_padded, attention_masks, labels_padded, label_masks
```
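As a side note on feeding that mask to the transformer: PyTorch's `src_key_padding_mask` uses the opposite convention (`True` marks the positions to *ignore*), so the 1-for-valid mask from the collate fn has to be inverted first. A minimal sketch with made-up dimensions (`d_model=16`, `nhead=4` are just placeholders):

```python
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=16, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)

x = torch.randn(2, 5, 16)  # (batch, time, d_model), dummy input
attention_masks = torch.tensor([[1, 1, 1, 0, 0],
                                [1, 1, 1, 1, 1]], dtype=torch.float32)

# src_key_padding_mask expects True at the *padded* positions,
# so invert the 1-for-valid mask from the collate fn:
out = encoder(x, src_key_padding_mask=(attention_masks == 0))
```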

I use the attention_masks in the transformer, and I believe they work fine when I pass them as key_padding_mask. Now I want to calculate the loss with MSELoss, but I don't want the loss to be skewed, because I don't mask the output of my transformer. How would I implement that? Or is that simply a completely wrong approach?
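One common way to do this is to compute the loss with `reduction="none"`, zero out the padded positions, and average only over the real ones. A minimal sketch, assuming `label_masks` has the same shape as `labels_padded` (which is what the `pad_and_mask` above produces, since it uses `zeros_like`); `masked_mse_loss` is a hypothetical helper name:

```python
import torch
import torch.nn as nn

def masked_mse_loss(predictions, targets, label_masks):
    # Per-element squared errors, same shape as targets
    per_element = nn.MSELoss(reduction="none")(predictions, targets)
    # Zero out the contributions from padded frames
    masked = per_element * label_masks
    # Average only over the real (unmasked) elements;
    # clamp avoids division by zero if a batch were all padding
    return masked.sum() / label_masks.sum().clamp(min=1)
```

With this, padded frames contribute exactly zero to both the numerator and the denominator, so padding neither inflates nor dilutes the loss.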

Thanks for the help :D