I wrote up a simple example of matrix addition compiled as TorchScript:
import torch

@torch.jit.script
def mat_add(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # Element-wise addition via an explicit double loop
    C = torch.empty_like(A)
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            C[i, j] = A[i, j] + B[i, j]
    return C

print(mat_add.graph)
I thought torch.jit.script could vectorize loops when appropriate, but the compiled graph still shows element-wise operations, so there is no speedup. Is there a way to have torch.jit (or some other tool) recognize and vectorize loops like these?
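For context, I know I could write the vectorized form by hand; what I'm asking is whether the compiler can derive it from the loop automatically. A quick sanity check (my own illustration, not part of the TorchScript example above) that the loop and the single tensor op compute the same thing:

```python
import torch

A = torch.randn(3, 3)
B = torch.randn(3, 3)

# Hand-written double loop, same element-wise logic as mat_add above
C_loop = torch.empty_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        C_loop[i, j] = A[i, j] + B[i, j]

# Single vectorized op: one kernel launch instead of N*M Python iterations
C_vec = A + B

print(torch.equal(C_loop, C_vec))  # True
```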