Ways to speed up training when a model has a long for loop

I hope this question makes sense. Is TensorFlow generally faster than PyTorch when the model contains a long for loop over modules? For example, my model calls a module ~20 times in a for loop at each iteration. I reimplemented it in PyTorch based on the original author's TensorFlow code. Although I achieved similar accuracy, the TensorFlow code is about 4 or 5 times faster than my PyTorch code. I did some profiling, and the slowdown seems to come from the large number of calls to certain modules in each iteration. My understanding is that TensorFlow builds the graph before training, so the for loop doesn't matter there. Is that correct? Is there any way to speed this up? If I replaced the for loop with a single larger module with deeper layers, would it reach performance similar to the TensorFlow code? Thank you!
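To make the structure concrete, here is a minimal sketch of the pattern I'm describing. The `Block` module and the dimensions are hypothetical stand-ins, not the actual model; the point is that one small module is invoked ~20 times per forward pass, so eager mode launches many small ops each iteration:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Hypothetical stand-in for the repeated module."""
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.linear(x))

class LoopedModel(nn.Module):
    """Applies the same module repeatedly in a Python for loop."""
    def __init__(self, dim, steps=20):
        super().__init__()
        self.block = Block(dim)
        self.steps = steps

    def forward(self, x):
        # Each forward pass calls the module ~20 times, so the
        # Python loop overhead and per-op launch cost add up.
        for _ in range(self.steps):
            x = self.block(x)
        return x

model = LoopedModel(dim=8, steps=20)
out = model(torch.randn(4, 8))
```

I was wondering whether compiling this kind of loop (e.g. with `torch.jit.script`, which traces control flow into a graph) would recover the speed that TensorFlow's graph construction gives.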