Is there a faster alternative method to deal with for loops?

Given below is a piece of code that I want to optimize so that the backward pass is faster.

# root_transforms: (batch_size, nframes, 4, 4) homogeneous transforms
root_transforms[..., :3, 3] = torch.cat((op[..., 0, :], ot[..., 0, :]), dim=0)
integrated_root_trans = root_transforms.clone()

# cumulative product along the frame axis: each frame's integrated
# transform is the previous integrated transform times the current one
for i in range(1, nframes):
    integrated_root_trans[..., i, :, :] = integrated_root_trans[..., i - 1, :, :].clone() @ root_transforms[..., i, :, :]

Here the shape of root_transforms is (batch_size, nframes, 4, 4). The loop multiplies the (i-1)-th frame's integrated root transform by the i-th frame's root_transforms, and the result is the integrated root transform for the i-th frame. Since each frame's integrated root transform depends on the previous frame's, I have to use a for loop. This for loop significantly slows down the backward pass. Is there a way I can make this faster?
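
One option is to replace the length-n sequential loop with a parallel (Hillis-Steele style) prefix scan: matrix multiplication is associative, so the cumulative product can be computed in ceil(log2(nframes)) batched matmuls instead of nframes - 1 sequential ones, which also makes the autograd graph much shallower for the backward pass. PyTorch has no built-in cumulative matrix product (torch.cumprod is elementwise), so here is a minimal sketch; integrate_transforms is a hypothetical helper name, and the frame axis is assumed to sit at dim=-3 as in the question.

import torch

def integrate_transforms(transforms: torch.Tensor) -> torch.Tensor:
    """Inclusive cumulative matrix product along the frame axis.

    transforms: (..., nframes, 4, 4)
    returns out with out[..., i, :, :] =
        transforms[..., 0, :, :] @ ... @ transforms[..., i, :, :]
    """
    out = transforms
    nframes = transforms.shape[-3]
    stride = 1
    while stride < nframes:
        # At the start of this iteration, the first `stride` frames
        # already hold their full prefix product, so leave them as-is.
        head = out[..., :stride, :, :]
        # Combine each remaining frame with the partial product ending
        # `stride` frames earlier; the earlier partial product is the
        # left operand, preserving the left-to-right multiplication order.
        tail = out[..., :-stride, :, :] @ out[..., stride:, :, :]
        out = torch.cat((head, tail), dim=-3)
        stride *= 2
    return out

# usage, replacing the original loop:
# integrated_root_trans = integrate_transforms(root_transforms)

This does O(n log n) matmuls in total versus the loop's O(n), but as ceil(log2(n)) large batched operations rather than n - 1 tiny ones, so both forward and backward passes should typically run much faster on GPU. Building each step out-of-place with torch.cat also avoids the in-place writes (and the .clone() calls) that bloat the original loop's autograd graph; it is worth checking the scan's output against the loop on a small input before relying on it.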