Speeding up tensor computation with different sizes

Hi,

I’m trying to use a differentiable alignment algorithm (GitHub - flatironinstitute/deepblast: Neural Networks for Protein Sequence Alignment) to align protein sequences. Because the proteins have different lengths, the alignment matrices have different sizes. As a result, I can’t batch these alignment matrices together, and the computation is very slow. I then tried to speed things up with torch.multiprocessing, but the algorithm requires autograd, which isn’t supported by torch.multiprocessing. Is there any other way to speed up tensor computation when the tensors have different sizes?
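One common workaround (not deepblast-specific; this is a generic sketch I'm assuming applies to your setup) is to zero-pad each variable-size matrix to the largest size in the batch and carry a boolean mask, so the batched operation runs once and padded entries are excluded from reductions and losses. The `pad_and_batch` helper below is hypothetical, not part of deepblast's API:

```python
import torch

def pad_and_batch(matrices):
    """Pad a list of 2-D tensors to a common (max_rows, max_cols) shape,
    returning the padded batch and a mask marking the valid entries."""
    max_r = max(m.shape[0] for m in matrices)
    max_c = max(m.shape[1] for m in matrices)
    batch = torch.zeros(len(matrices), max_r, max_c)
    mask = torch.zeros(len(matrices), max_r, max_c, dtype=torch.bool)
    for i, m in enumerate(matrices):
        r, c = m.shape
        batch[i, :r, :c] = m
        mask[i, :r, :c] = True
    return batch, mask

# Example: three alignment matrices of different sizes
mats = [torch.randn(3, 5), torch.randn(4, 4), torch.randn(2, 6)]
batch, mask = pad_and_batch(mats)
print(batch.shape)  # torch.Size([3, 4, 6])

# A masked reduction ignores the padded entries, so gradients
# only flow through the real alignment scores:
per_matrix_sum = (batch * mask).sum(dim=(1, 2))
```

Padding wastes some compute on the zero entries, but one large batched kernel launch is usually much faster than many small sequential ones, and it keeps everything inside autograd. Depending on your PyTorch version, `torch.nested` (nested tensors) may also be worth a look, though its operator coverage is still limited.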