What's the best practice for processing several chunks of a tensor in forward?

In my forward function, I want to implement something like this:

def forward(self, x):
    for i in range(...):
        # Chunk i is rows l(i) up to (not including) r(i); I can ensure that l(i) <= r(i+1).
        chunk = x.narrow(0, l(i), r(i) - l(i))
        ...  # do something with chunk

Each chunk of x could be processed in parallel, but if I implement this with a for loop in the forward function, the chunks are only processed one after another. So what’s the best practice?
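To make the question concrete, here is a minimal sketch that puts the loop version next to one possible batched alternative. It rests on assumptions that are not part of the question: every chunk has the same length, and "do something" is a single linear layer. The names `ChunkedForward`, `starts`, `length`, and `proj` are hypothetical and only serve the illustration.

```python
import torch
import torch.nn as nn


class ChunkedForward(nn.Module):
    """Hypothetical module: applies one linear layer to several chunks of x."""

    def __init__(self, starts, length, features):
        super().__init__()
        self.starts = list(starts)  # assumed chunk start offsets, i.e. l(i)
        self.length = length        # assumed common chunk length, r(i) - l(i)
        self.proj = nn.Linear(features, features)  # stand-in for "do something"

    def forward_loop(self, x):
        # The loop from the question: one narrow() view and one op per chunk.
        outs = [self.proj(x.narrow(0, s, self.length)) for s in self.starts]
        return torch.stack(outs)  # (num_chunks, length, features)

    def forward_batched(self, x):
        # Gather all chunks with one indexing op, then apply the op once.
        offsets = torch.arange(self.length, device=x.device)
        starts = torch.tensor(self.starts, device=x.device)
        idx = starts[:, None] + offsets[None, :]  # (num_chunks, length)
        chunks = x[idx]                           # (num_chunks, length, features)
        return self.proj(chunks)

    def forward(self, x):
        return self.forward_batched(x)


torch.manual_seed(0)
model = ChunkedForward(starts=[0, 4, 8], length=4, features=3)
x = torch.randn(12, 3)
out_loop = model.forward_loop(x)
out_batched = model(x)
print(out_loop.shape, torch.allclose(out_loop, out_batched, atol=1e-5))
```

Note that `narrow` returns views of `x`, while the indexed gather in `forward_batched` copies the data, so the batched form trades extra memory for a single call. Chunks of different lengths would not fit this sketch as written.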