Time increases per iteration of einsum

Duplicate of "Sudden decrease in performance after N numbers of consecutive calls to forward (on GPU)" (Issue #15793, pytorch/pytorch, GitHub).

You are not measuring timing accurately. CUDA launches are asynchronous…

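Call torch.cuda.synchronize() before you start the timer and again before you stop it, so the measured interval covers the GPU work itself and not just the kernel launch.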
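Something like the following is one way to time each iteration with torch.cuda.synchronize(). It is only a sketch: timed_ms and einsum_demo are hypothetical helper names, and the tensor shapes and einsum expression are made up for illustration, not taken from the original question.

```python
import time


def timed_ms(fn, sync, iters=10):
    """Return per-iteration wall-clock times in milliseconds.

    `sync` is called before starting and before stopping the clock,
    so each measurement covers the work that `fn` queued.
    """
    times = []
    for _ in range(iters):
        sync()                        # drain any previously queued GPU work
        start = time.perf_counter()
        fn()                          # on CUDA this only enqueues kernels
        sync()                        # wait until those kernels have finished
        times.append((time.perf_counter() - start) * 1e3)
    return times


def einsum_demo(iters=10):
    # Hypothetical workload: shapes and the einsum expression are illustrative.
    import torch  # imported here so timed_ms itself does not require PyTorch

    if not torch.cuda.is_available():
        raise RuntimeError("this demo needs a CUDA device")
    a = torch.randn(64, 128, 256, device="cuda")
    b = torch.randn(64, 256, 128, device="cuda")
    return timed_ms(
        lambda: torch.einsum("bij,bjk->bik", a, b),
        torch.cuda.synchronize,
        iters,
    )
```

On a machine with a GPU, print(einsum_demo()) lists the time per iteration. Without the two sync() calls the clock would mostly measure how long it takes to enqueue the kernels, and the queued work would show up later as an apparent slowdown.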
