Currently, I can pass a list of Tensors, all of which vary in size, to a Module and run forward on them. However, because they vary in size, I cannot use torch.bmm()
or similar to multiply each Tensor by another Tensor in one batched call. Instead, I’m using a for-loop: for each Tensor, I look up the right weights based on its shape and multiply sequentially. Since these matrices are small, I’m not using my GPU effectively. Is there any way to optimize this operation?
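For reference, here’s a minimal sketch of what my loop looks like (the module name, `self.weights`, and the shape-keyed lookup are illustrative placeholders, not my exact code):

```python
import torch
import torch.nn as nn

class VarSizeMatmul(nn.Module):
    """Multiplies each input Tensor by a weight matrix chosen by its shape."""

    def __init__(self, sizes):
        super().__init__()
        # One small weight matrix per expected input width.
        # `sizes` is a list of (in_dim, out_dim) pairs -- illustrative only.
        self.weights = nn.ParameterDict({
            str(in_dim): nn.Parameter(torch.randn(in_dim, out_dim))
            for in_dim, out_dim in sizes
        })

    def forward(self, tensors):
        outputs = []
        # Sequential loop: each matmul is tiny, launching one small kernel
        # at a time, so the GPU sits mostly idle.
        for t in tensors:
            w = self.weights[str(t.shape[-1])]  # pick weights by last dim
            outputs.append(t @ w)
        return outputs

# Example usage with variable-width inputs:
m = VarSizeMatmul([(8, 16), (12, 16)])
xs = [torch.randn(5, 8), torch.randn(3, 12), torch.randn(7, 8)]
ys = m(xs)  # list of Tensors with shapes (5, 16), (3, 16), (7, 16)
```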
Edit: padding is not an option for me.