Currently only Conv+BN folding happens in the graph optimizer. Why not fold ConvTranspose (deconv) as well, since its implementation is essentially a convolution? Are there other considerations for not doing this? I suspect ConvTranspose is not implemented the same way across backends…
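For reference, the fold I have in mind is the same algebra as the existing Conv+BN fold, just applied along the ConvTranspose weight layout. A minimal PyTorch sketch of what I mean (my own helper name, assuming groups=1 and inference mode):

```python
# Sketch: fold a BatchNorm2d into the ConvTranspose2d that precedes it.
# Note the weight layout: ConvTranspose2d weight is (in_ch, out_ch, kH, kW),
# so the per-output-channel BN scale applies along dim 1, not dim 0 as for Conv2d.
import torch
import torch.nn as nn

def fold_bn_into_convtranspose(deconv: nn.ConvTranspose2d, bn: nn.BatchNorm2d) -> nn.ConvTranspose2d:
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
    fused = nn.ConvTranspose2d(
        deconv.in_channels, deconv.out_channels, deconv.kernel_size,
        stride=deconv.stride, padding=deconv.padding,
        output_padding=deconv.output_padding, bias=True,
    )
    with torch.no_grad():
        # scale each output channel of the weight (dim 1 for ConvTranspose, groups=1 assumed)
        fused.weight.copy_(deconv.weight * scale.reshape(1, -1, 1, 1))
        bias = deconv.bias if deconv.bias is not None else torch.zeros(deconv.out_channels)
        fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```

In eval mode the folded op should match `bn(deconv(x))` up to floating-point tolerance for random input.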
Your chance to be a hero.
More seriously, the most common thing people want to run on this is image detection using conv; transposed convolution is relatively rare compared to that, so it hasn't gotten the same attention so far. …until you came along!
As you mentioned, it depends entirely on the backend whether it has a fused implementation for the op, so you'd need to add support for it in your backend if you think it has a chance to increase performance. One way to do this is to add your own backend-specific ConvolutionFusedTranspose Node, look for Convolutions followed by Transposes, replace them in the graph with your own Node, and then map that Node down to your specialized kernel.
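Purely as an illustration of that rewrite (not any real backend's API; the `Node`/`Graph` representation and op names here are hypothetical), the pass boils down to a pattern match and replace over the node list:

```python
# Hypothetical graph-rewrite sketch: replace Convolution -> Transpose pairs
# with a single backend-specific fused node. All class and op names are made up.
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    op: str                                        # e.g. "Convolution", "Transpose"
    inputs: list = field(default_factory=list)     # upstream Node objects
    attrs: dict = field(default_factory=dict)

def fuse_conv_transpose(nodes):
    """One pass over a topologically ordered node list."""
    replaced = {}   # id(old Transpose node) -> fused node
    dead = set()    # ids of Convolution nodes folded away
    out = []
    for n in nodes:
        # rewire inputs to point at already-fused nodes
        n.inputs = [replaced.get(id(i), i) for i in n.inputs]
        if n.op == "Transpose" and len(n.inputs) == 1 and n.inputs[0].op == "Convolution":
            conv = n.inputs[0]
            fused = Node("ConvolutionFusedTranspose", list(conv.inputs),
                         {**conv.attrs, "perm": n.attrs.get("perm")})
            replaced[id(n)] = fused
            dead.add(id(conv))   # assumes the conv has no other users; a real pass checks use counts
            out.append(fused)
        else:
            out.append(n)
    return [n for n in out if id(n) not in dead]
```

The backend's lowering stage then maps the ConvolutionFusedTranspose Node to the specialized kernel.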