I have been struggling with a confusing timing problem that I have now run into twice:
I have two models running simultaneously with a joint loss, but each step is quite slow (0.4 to 0.5 s or more). So I tried to find where the time goes by taking timestamps at several points with `time.time()`. The first, small model is fine (around 0.002 s). However, when I checked the second model, something weird happened right at the beginning of its forward pass:
```python
import time

def forward(feature, coords, permutation):
    """
    feature: [B, d, N, 1]
    coords:  [B, N, 3]
    """
    t1 = time.time()
    coords = coords[:, permutation]
    t2 = time.time()
    feature = feature[:, :, permutation]
    t3 = time.time()
    print(t2 - t1)  # runs for ~0.24 s
    print(t3 - t2)  # 0.00..., very short anyway
    # ... rest of the forward pass
```
If I swap the two indexing lines, the first one is still the slow one:

```python
import time

def forward(feature, coords, permutation):
    """
    feature: [B, d, N, 1]
    coords:  [B, N, 3]
    """
    t1 = time.time()
    feature = feature[:, :, permutation]
    t2 = time.time()
    coords = coords[:, permutation]
    t3 = time.time()
    print(t2 - t1)  # runs for ~0.24 s
    print(t3 - t2)  # 0.00..., very small anyway
    # ... rest of the forward pass
```
So it seems that what really counts is the order of the lines, not what each line actually does.
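To check whether the 0.24 s really belongs to the first indexing line, each checkpoint can first wait for any pending GPU work before reading the clock. Below is a minimal sketch, assuming PyTorch on CUDA, where `torch.cuda.synchronize()` blocks the CPU until all queued kernels have finished. `CheckpointTimer` is a hypothetical helper, not a PyTorch API; with its default no-op `sync` it behaves like the plain `time.time()` checkpoints, except that it uses `time.perf_counter()`, which is meant for measuring intervals.

```python
import time
from typing import Callable, List, Optional, Tuple


class CheckpointTimer:
    """Measure the time between named checkpoints.

    `sync` is called before every clock read. On a GPU, passing
    `torch.cuda.synchronize` (assumption: PyTorch + CUDA) makes each
    checkpoint wait until all queued kernels have finished, so the time is
    charged to the line that actually did the work. The default is a no-op,
    i.e. plain CPU wall-clock timing.
    """

    def __init__(self, sync: Optional[Callable[[], None]] = None) -> None:
        self.sync: Callable[[], None] = sync if sync is not None else (lambda: None)
        self.records: List[Tuple[str, float]] = []
        self.sync()                      # start from an idle device
        self.last = time.perf_counter()  # monotonic, high-resolution clock

    def mark(self, name: str) -> float:
        """Record and return the seconds elapsed since the previous checkpoint."""
        self.sync()                      # wait for pending work before reading the clock
        now = time.perf_counter()
        elapsed = now - self.last
        self.records.append((name, elapsed))
        self.last = now
        return elapsed
```

On a GPU this would be created as `CheckpointTimer(sync=torch.cuda.synchronize)`, with `timer.mark("coords")` and `timer.mark("feature")` called after the corresponding lines in `forward`. Synchronizing at every checkpoint slows each step down, so this is only for diagnosis, not for normal training.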
The second time, I was again debugging my code with `time.time()`, and the timings became even weirder. In my second model I found a very time-consuming line, `a[b]`, especially when `b` is first moved with `b.to(device)`. It is just a simple indexing operation, but it took 0.4 s or more, over 95% of the whole model's time. So I changed the way I take the slice, removing the `.to(device)` call, and that line dropped to a few milliseconds, as if the problem were solved.

Unfortunately, the total time stayed the same. Worse, another line of code, which I had not changed at all, suddenly became very time-consuming for no reason, like a ghost. So I rewrote that line too, but in vain: the “time-consuming” part just moved on to the next line, like a virus.
However, I noticed something all these lines have in common: every “ghost” time-consuming line involves the device in some way (`to(device)`, `device`, etc.), and when I fix one of them, another one becomes the new “ghost” time-consuming line.
Is there some GPU preparation stage that cannot be avoided? If so, why does every step need it? Literally every iteration of my loop has a “ghost” time spot.
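The symptoms described above (the slow line moves when another one is fixed, the total step time does not change, and the slow lines all touch the device) would be consistent with asynchronous execution: in PyTorch, CUDA kernels are launched asynchronously, so `time.time()` mostly measures launch overhead until some operation makes the CPU wait for the GPU. The sketch below is a toy, stdlib-only model of that idea, not PyTorch code. `ToyAsyncDevice` and `fake_forward` are hypothetical names, and sleeping in a worker thread stands in for GPU kernels.

```python
import queue
import threading
import time
from typing import Callable, Tuple


class ToyAsyncDevice:
    """Toy stand-in for an asynchronous GPU stream (NOT PyTorch or CUDA).

    launch() only enqueues work and returns at once, like a kernel launch.
    synchronize() blocks until everything queued so far has run, like an
    operation whose result the CPU has to wait for.
    """

    def __init__(self) -> None:
        self._queue = queue.Queue()
        # A single worker runs jobs in order, like one stream.
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        while True:
            job = self._queue.get()
            try:
                job()
            finally:
                self._queue.task_done()

    def launch(self, job: Callable[[], None]) -> None:
        self._queue.put(job)

    def synchronize(self) -> None:
        self._queue.join()


def fake_forward(dev: ToyAsyncDevice, n_kernels: int = 5,
                 kernel_s: float = 0.02) -> Tuple[float, float]:
    """Return (time seen on the 'expensive' lines, time seen on the 'cheap' line)."""
    t0 = time.perf_counter()
    for _ in range(n_kernels):
        dev.launch(lambda: time.sleep(kernel_s))  # the real work, only queued here
    t1 = time.perf_counter()
    dev.synchronize()  # a cheap line that happens to wait for the device
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    launch_s, sync_s = fake_forward(ToyAsyncDevice())
    print(f"expensive lines: {launch_s:.4f} s, cheap waiting line: {sync_s:.4f} s")
```

In this toy, almost all of the time is reported on the cheap line that waits, and moving or removing that line only shifts the time to whichever line waits next. In real PyTorch code, operations such as `.item()`, `.cpu()`, or a blocking copy between host and device typically force such a wait. Whether this is what happens in my model is an assumption; it could be checked by calling `torch.cuda.synchronize()` before each `time.time()` checkpoint.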