Duplicate of pytorch/pytorch issue #15793, "Sudden decrease in performance after N numbers of consecutive calls to forward (on GPU)".
You are not measuring timing accurately. CUDA launches are asynchronous…
torch.cuda.synchronize()
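
To make that concrete, here is a minimal sketch of a timing helper that synchronizes before starting and before stopping the clock. The names `time_call` and `demo`, the warmup/iteration counts, and the 4096x4096 matmul are my own illustrative choices, not anything from the issue; only `torch.cuda.synchronize()` and `torch.cuda.is_available()` are real PyTorch calls here.

```python
import time


def time_call(fn, sync=None, warmup=3, iters=10):
    """Return mean wall-clock seconds per call of fn (hypothetical helper).

    sync is called once before the clock starts and once before it stops,
    so that any asynchronously queued work is counted in the measurement.
    """
    if sync is None:
        sync = lambda: None  # no-op: times only the launches themselves
    for _ in range(warmup):
        fn()  # warmup calls are not timed
    sync()  # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    sync()  # wait for the timed work to actually finish
    return (time.perf_counter() - start) / iters


def demo():
    # Imported here so time_call stays usable without PyTorch installed.
    import torch

    if not torch.cuda.is_available():
        print("CUDA is not available; nothing to compare.")
        return
    x = torch.randn(4096, 4096, device="cuda")  # arbitrary example workload
    matmul = lambda: x @ x
    unsynced = time_call(matmul)  # may mostly measure kernel launch overhead
    synced = time_call(matmul, sync=torch.cuda.synchronize)
    print(f"without synchronize: {unsynced * 1e3:.3f} ms per call")
    print(f"with synchronize:    {synced * 1e3:.3f} ms per call")
```

Calling `demo()` on a machine with a GPU should show the two numbers diverging, since without the synchronize the timer can stop while kernels are still queued. How large the gap is depends on the workload and hardware, so treat the output as indicative only.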