The second model on the same GPU slows down inference. Why?

I have two models: one generates spectrograms and the other uses the spectrograms to generate audio. The first run is fast for both models, but on the second run, the first model slows down significantly. Why? My loop looks as follows:

import time
import torch

t = time.time()
with torch.no_grad():
    spectrograms = model1(phonemes)
print(time.time() - t)  # 35 ms

t = time.time()
with torch.no_grad():
    audio = model2(spectrograms)
print(time.time() - t)  # 7 ms

t = time.time()
with torch.no_grad():
    spectrograms = model1(phonemes)
print(time.time() - t)  # 2640 ms

Why would the second call to model1 be so slow? The slowness persists for consecutive runs.
It does not make sense to me that the first run is so fast and the consecutive runs so slow.
If I drop the second model and only call the first model repeatedly, the problem persists, but the duration of the consecutive calls drops to around 133 ms.

Any ideas would be appreciated!

Alright, it was caused by not calling torch.cuda.synchronize().
I had to call it before starting the timer and again before computing the final duration; only then did I get consistent timings.
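For reference, here is a minimal sketch of the corrected timing (same model1 and phonemes as above). CUDA kernel launches in PyTorch are asynchronous, so the timer has to be bracketed by torch.cuda.synchronize() calls; otherwise the first measurement only covers the launch overhead and the queued GPU work shows up in a later call:

import time
import torch

# Drain any GPU work queued earlier so it doesn't leak into this measurement.
torch.cuda.synchronize()
t = time.time()
with torch.no_grad():
    spectrograms = model1(phonemes)
# Wait for the kernels launched by model1 to actually finish before stopping the timer.
torch.cuda.synchronize()
print(time.time() - t)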
