In my forward function, I want to implement something like this:
def forward(self, x):
    for i in range(...):
        do something with x.narrow(0, l(i), r(i) - l(i))  # I could ensure that l(i) <= r(i+1)
Each chunk of x could be processed in parallel, but if I implement this with a for loop in the forward function, the chunks can only be processed sequentially, one after another. So what's the best practice?
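For reference, here is a minimal sketch of one common alternative to the loop: pad the variable-length chunks into a single batch, run the per-chunk computation once on the whole batch, and mask out the padding. Everything here is an assumption for illustration: `ChunkMean`, `forward_loop`, `forward_batched` and `bounds` are hypothetical names, the per-chunk operation ("a linear layer followed by a mean over the chunk's rows") stands in for "do something", and every chunk is assumed to be non-empty.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence


class ChunkMean(nn.Module):
    """Hypothetical module: applies a linear layer to every row of each chunk
    of x and averages the result over that chunk's rows."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward_loop(self, x, bounds):
        # Baseline: one narrow() per chunk, each chunk handled in its own
        # Python-level iteration. bounds is a list of (l, r) pairs.
        outs = [self.proj(x.narrow(0, l, r - l)).mean(dim=0) for l, r in bounds]
        return torch.stack(outs)

    def forward_batched(self, x, bounds):
        # Collect the (possibly overlapping) chunks and zero-pad them into one
        # (num_chunks, max_len, dim) tensor, so a single proj() call covers
        # all chunks instead of one call per chunk.
        chunks = [x.narrow(0, l, r - l) for l, r in bounds]
        padded = pad_sequence(chunks, batch_first=True)
        lengths = torch.tensor([r - l for l, r in bounds], device=x.device)
        # mask[i, j] is True for real rows of chunk i, False for padding rows.
        positions = torch.arange(padded.size(1), device=x.device)
        mask = positions[None, :] < lengths[:, None]
        # proj() maps zero padding to the bias, so zero those rows out again.
        y = self.proj(padded) * mask.unsqueeze(-1).to(padded.dtype)
        # Mean over real rows only (assumes every chunk is non-empty).
        return y.sum(dim=1) / lengths.unsqueeze(-1).to(y.dtype)


torch.manual_seed(0)
model = ChunkMean(4)
x = torch.randn(10, 4)
bounds = [(0, 3), (2, 7), (7, 10)]  # hypothetical l(i), r(i) values
print(torch.allclose(model.forward_loop(x, bounds), model.forward_batched(x, bounds), atol=1e-6))
```

If all chunks have the same length and start at a fixed stride, `x.unfold(0, size, step)` produces the chunks as a strided view, so neither the padding nor the mask is needed.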