Ignore padding while batch training AutoEncoder with LSTM Encoder and Decoder

import torch
from torch.utils.data import DataLoader


def denoise_train(x: DataLoader):
    # x is consumed as a batch of raw name strings
    loss = 0

    # Pad the clean targets and convert them to character-index tensors
    x_padded = [pad_string(s) for s in x]
    x_idx_tensor = strings_to_index_tensor(x_padded)

    # Noise, pad, and one-hot encode the inputs for the encoder
    noisy_x = [noise_name(s) for s in x]
    noisy_x_padded = [pad_string(s) for s in noisy_x]
    noisy_x_idx_tensor = strings_to_index_tensor(noisy_x_padded)
    noisy_x_rnn_tensor = to_rnn_tensor(noisy_x_idx_tensor)

    batch_sz = len(x)
    encoder_hidden = encoder.init_hidden(batch_size=batch_sz)

    # Feed the noisy sequence into the encoder one time step at a time
    for i in range(noisy_x_rnn_tensor.shape[0]):
        # LSTM requires 3-dimensional input: (seq_len=1, batch, features)
        _, encoder_hidden = encoder(noisy_x_rnn_tensor[i].unsqueeze(0), encoder_hidden)

    # Seed the decoder with SOS tokens and the encoder's final hidden state
    decoder_input = strings_to_tensor([SOS] * batch_sz)
    decoder_hidden = encoder_hidden

    for i in range(x_idx_tensor.shape[0]):
        decoder_probs, decoder_hidden = decoder(decoder_input, decoder_hidden)
        target_indexes = x_idx_tensor[i]
        best_indexes = torch.squeeze(torch.argmax(decoder_probs, dim=2), dim=0)
        decoder_probs = torch.squeeze(decoder_probs, dim=0)
        best_chars = [index_to_char(int(idx)) for idx in best_indexes]
        # Accumulate loss against the clean (padded) targets at this time step
        loss += criterion(decoder_probs, target_indexes.type(torch.LongTensor))
        # Feed the decoder's own prediction back in as the next input
        decoder_input = strings_to_tensor(best_chars)

    loss.backward()

    return x, noisy_x, loss.item()

The code above is for a denoising autoencoder for people's names. The first few lines noise the names and then pad them so they're all equal length for batch training. The name strings are then converted to index tensors, where each index is the number mapped to that character, and those get converted into one-hot encoded tensors (that's what to_rnn_tensor is for). encoder and decoder are both LSTMs, which is why the one-hot tensors are fed in one time step at a time. The encoder's final hidden state is then passed into the decoder, which is also an LSTM.
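
Roughly, the conversion helpers behave like this (a simplified sketch; PAD, SOS, and char_to_index are stand-ins for my real vocabulary, and the tensors are sequence-first, which is why the training loop indexes dimension 0):

import torch

# Stand-in vocabulary for illustration only
PAD, SOS = '#', '^'
alphabet = [PAD, SOS] + list("abcdefghijklmnopqrstuvwxyz")
char_to_index = {c: i for i, c in enumerate(alphabet)}
vocab_size = len(alphabet)

def strings_to_index_tensor(strings):
    # (max_len, batch) tensor of character indices; strings are already padded
    max_len = max(len(s) for s in strings)
    out = torch.zeros(max_len, len(strings), dtype=torch.long)
    for b, s in enumerate(strings):
        for t, ch in enumerate(s):
            out[t, b] = char_to_index[ch]
    return out

def to_rnn_tensor(idx_tensor):
    # One-hot encode to (max_len, batch, vocab_size) floats for the LSTM
    return torch.nn.functional.one_hot(idx_tensor, num_classes=vocab_size).float()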

My main problem is that I'm trying to batch train this without backpropagating on the padded characters. I'm having trouble doing that because I can't follow the standard pack_padded_sequence approach, since this is an autoencoder setup. I'd like to batch train, so I need to pad, but I want the pad characters to be ignored by the LSTM encoder and decoder.
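
Concretely, on the loss side the behaviour I'm after is something like the following (assuming criterion is nn.NLLLoss or nn.CrossEntropyLoss, and that the pad character maps to pad_idx in my vocabulary), but I'm not sure this is the right approach, and it still doesn't stop the encoder from running over the pad time steps:

import torch.nn as nn

pad_idx = char_to_index[PAD]  # index my pad character maps to

# Let the loss skip pad targets, so time steps past the end of the clean
# name contribute nothing to the gradient. nn.CrossEntropyLoss takes the
# same argument if the decoder outputs raw logits instead of log-probs.
criterion = nn.NLLLoss(ignore_index=pad_idx)

# The call in the decoder loop stays the same:
#     loss += criterion(decoder_probs, target_indexes)
# but any position where the target equals pad_idx is ignored.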