I am running into a strange issue in my code base. The code is complicated, but essentially I noticed that calling unbind(0) on tensors takes longer in one commit than in another, and it noticeably hurts performance. The old commit has an average per-call runtime of 8.727e-5, while the new commit has an average per-call runtime of 2.36e-4 (roughly 2.7x slower).
I was playing around with unbinding and noticed that something seems to be getting cached: subsequent calls to unbind(), even on different tensors, are consistently faster than the first call, no matter which tensor comes first.
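A minimal sketch of how the first-call cost could be separated from the steady-state per-call cost. The helper `time_calls`, the tensor shapes, and the warm-up and iteration counts are all assumptions for illustration, not taken from the actual code base. The PyTorch part only runs if `torch` is importable.

```python
import statistics
import time


def time_calls(fn, warmup=10, iters=1000, sync=None):
    """Time fn() and return (first_call_seconds, mean_seconds_after_warmup).

    Hypothetical helper: `sync` is an optional callable invoked after each
    call, e.g. torch.cuda.synchronize when timing work queued on a GPU.
    """

    def run_once():
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for outstanding work before stopping the clock
        return time.perf_counter() - start

    first = run_once()  # includes any one-time setup cost
    for _ in range(warmup):  # warm-up calls, results discarded
        run_once()
    samples = [run_once() for _ in range(iters)]
    return first, statistics.mean(samples)


try:
    import torch
except ImportError:  # the helper above still works without PyTorch
    torch = None

if torch is not None:
    # Hypothetical CPU tensors; shapes are arbitrary.
    a = torch.randn(64, 128)
    b = torch.randn(32, 256)
    sync = torch.cuda.synchronize if a.is_cuda else None
    for name, t in (("a", a), ("b", b)):
        first, mean = time_calls(lambda: t.unbind(0), sync=sync)
        print(f"{name}: first={first:.3e}s  steady-state mean={mean:.3e}s")
```

Running the same script on both commits and comparing the steady-state means, rather than averages that include the first call, should show whether the slowdown is a one-time cost or affects every call.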
The new code could be interfering with whatever cache exists and making the unbind call slower, but I am not sure how to work around this. Is it possible to preserve the cache somehow? Are there any devs who might have insight into what’s going on here?