An efficient implementation of the following using tensor operations

I am looking for an efficient implementation that removes the for loop in the following code. From my previous posts, I understand this can be done with torch.compile, but I feel that for this specific case torch.compile isn't needed and built-in PyTorch operations should be sufficient. Any suggestions?

import torch

n, c, w = 4, 5, 7
d0, d1 = 10, 3  # in the real case there may be more leading dimensions d0, d1, ..., d_{D-1}
transfer = torch.rand(n, c, w)
signal = torch.rand(d0, d1, n, c, w)
results = torch.zeros((*signal.shape[:-2], w))  # shape (d0, d1, n, w)

# Efficient implementation to remove the for loop?
for i in range(n):
    # sum over the channel dimension c -> (d0, d1, w)
    results[..., i, :] = torch.sum(signal[..., i, :, :] * transfer[i], dim=-2)
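For reference, here is a sketch of one candidate answer, assuming standard broadcasting semantics: since transfer has shape (n, c, w) and signal has shape (d0, d1, n, c, w), transfer broadcasts directly against signal's trailing dimensions, so the loop over n appears replaceable by a single elementwise multiply followed by a sum over the c dimension (an einsum formulation should be equivalent):

```python
import torch

n, c, w = 4, 5, 7
d0, d1 = 10, 3
transfer = torch.rand(n, c, w)
signal = torch.rand(d0, d1, n, c, w)

# Loop version from the question, kept as a correctness reference
expected = torch.zeros((*signal.shape[:-2], w))
for i in range(n):
    expected[..., i, :] = torch.sum(signal[..., i, :, :] * transfer[i], dim=-2)

# Vectorized: transfer (n, c, w) broadcasts against signal (d0, d1, n, c, w);
# summing over dim=-2 collapses the c dimension, giving (d0, d1, n, w)
results = (signal * transfer).sum(dim=-2)

# Equivalent einsum formulation; '...' absorbs any number of leading dims
results_einsum = torch.einsum('...ncw,ncw->...nw', signal, transfer)
```

Both variants should match the loop up to floating-point summation order, and neither depends on the number of leading dimensions d0, d1, ...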