I think you are looking for torch.nn.utils.rnn.pad_sequence.
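For example (batch_first and padding_value are just one choice here, adjust to your layout):

import torch
from torch.nn.utils.rnn import pad_sequence

l = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6, 7, 8, 9])]
padded = pad_sequence(l, batch_first=True, padding_value=0)
# padded is a (3, 4) tensor: one row per sequence, zero-padded at the end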
If you want to do this manually:
- One greatly underappreciated (to my mind) feature of PyTorch is that you can allocate a tensor of zeros (of the right type) and then copy into slices of it without breaking the autograd link. This is what pad_sequence does (the source code is linked from the “headline” in the docs). The crucial bit is:
out_tensor = sequences[0].data.new(*out_dims).fill_(padding_value)
for i, tensor in enumerate(sequences):
    length = tensor.size(0)
    # use index notation to prevent duplicate references to the tensor
    if batch_first:
        out_tensor[i, :length, ...] = tensor
    else:
        out_tensor[:length, i, ...] = tensor
If the tensors require grad, so will out_tensor, and the gradients will flow back to the tensors in the list (a quick check of this is sketched after the second example below).
- Another way to do this, which seems closer to your description, is to use a cat (or pad) in a list comprehension and push the results into a stack (or another cat).
# setup
import torch
l = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6, 7, 8, 9])]
emb_len = 4

# this is what you want: pad each sequence with zeros up to emb_len, then stack along dim 1
lp = torch.stack([torch.cat([i, i.new_zeros(emb_len - i.size(0))], 0) for i in l], 1)
# lp has shape (emb_len, len(l)) = (4, 3)
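As a quick check of the autograd point above (a minimal sketch; the float leaf tensors are just placeholders):

l = [torch.tensor([1., 2., 3.], requires_grad=True), torch.tensor([4., 5.], requires_grad=True)]
emb_len = 4
lp = torch.stack([torch.cat([i, i.new_zeros(emb_len - i.size(0))], 0) for i in l], 1)
lp.sum().backward()
# l[1].grad is tensor([1., 1.]): gradients flow back only through the copied (non-padded) entries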
Best regards
Thomas