I am training my chatbot using seq2seq. However, the dataloader generates a batch of input much more slowly than the model trains on one, so the data loader has become the bottleneck.
I want to use the built-in multiprocessing in torch.utils.data.DataLoader (i.e. num_workers > 0) to speed things up, but I have no idea how to sort and pad the samples in a batch when using data.DataLoader. As you know, to train seq2seq, each batch needs to be sorted by length and padded.
Could anyone give me some ideas?
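For context, here is a minimal sketch of the kind of thing I have in mind: a custom collate_fn passed to DataLoader that sorts a batch by source length and pads with torch.nn.utils.rnn.pad_sequence. The dataset, the token-id pairs, and padding value 0 are placeholders, not my real data:

```python
import torch
from torch.utils.data import DataLoader, Dataset
from torch.nn.utils.rnn import pad_sequence

class PairDataset(Dataset):
    """Toy dataset of (source, target) token-id lists (placeholder data)."""
    def __init__(self, pairs):
        self.pairs = pairs

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        src, tgt = self.pairs[idx]
        return torch.tensor(src), torch.tensor(tgt)

def seq2seq_collate(batch):
    # Sort the batch by source length, longest first, as required
    # by pack_padded_sequence with enforce_sorted=True.
    batch.sort(key=lambda pair: len(pair[0]), reverse=True)
    srcs, tgts = zip(*batch)
    src_lens = torch.tensor([len(s) for s in srcs])
    # Pad every sequence to the longest one in this batch (pad id 0 assumed).
    src_padded = pad_sequence(srcs, batch_first=True, padding_value=0)
    tgt_padded = pad_sequence(tgts, batch_first=True, padding_value=0)
    return src_padded, src_lens, tgt_padded

pairs = [([1, 2, 3], [4, 5]), ([6], [7, 8, 9]), ([1, 2], [3])]
loader = DataLoader(PairDataset(pairs), batch_size=3,
                    num_workers=0, collate_fn=seq2seq_collate)
src, lens, tgt = next(iter(loader))
```

If this is the right direction, I assume the same collate_fn would work unchanged with num_workers set above 0, since the workers run it in their own processes.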