I know it isn’t a great topic, but I just want to bring this up, as it did surprise me.
Could you describe what kind of performance regression you were seeing?
The runtime of an inference run got roughly 3x to 4x slower.
Your information is unfortunately too vague and we didn’t observe any regression in our nightly testing.
I understand it’s very vague… I will try to narrow down the problem, at least in terms of builds. We were using the 1.17 CUDA build until now (through the Docker package). I will try to switch to the 1.18 one when I get a chance, or go back to the first CUDA 1.12.1 build available, to see if it has anything to do with this.
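
To make the comparison between builds less vague, the same timing script could be run unchanged under each build and the numbers posted side by side. Below is a minimal sketch using only the Python standard library. The names `benchmark`, `summarize`, `fake_inference`, and the optional `sync` hook are hypothetical and not part of any library; `fake_inference` is a CPU-only stand-in for the real model call, and `sync` is assumed to be whatever device-synchronization call the framework provides, if GPU work is launched asynchronously.

```python
import statistics
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Call fn() repeatedly and return per-call wall-clock times in seconds.

    sync is an optional, hypothetical hook: pass the framework's
    device-synchronization call if work is queued asynchronously,
    so the clock is only read after the work has finished.
    """
    # Warm-up calls are not timed (lazy initialization, caches, etc.).
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to a few numbers that are easy to compare."""
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with the actual inference call.
    def fake_inference():
        sum(i * i for i in range(100_000))

    print(summarize(benchmark(fake_inference)))
```

Reporting the median alongside min and max helps separate a consistent slowdown from a few slow outliers, and discarding warm-up calls avoids counting one-time initialization as a regression.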