AUTOTUNE convolution(2x2560x16x16, 1280x2560x1x1)
How does optimizing for so many obscure shapes help someone doing 1x4x64x64 Stable Diffusion inference? The initial “1” is the batch size, so it might differ. The 64x64 is the latent resolution used for 512x512 images in the UNet. The VAE then decodes the 64x64 latent up to 512x512.
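As a quick sanity check on those shapes, here is a minimal sketch, assuming the standard Stable Diffusion VAE spatial downscale factor of 8 and 4 latent channels (the `unet_latent_shape` helper is hypothetical, just for illustration):

```python
VAE_SCALE = 8        # assumption: SD's VAE downsamples each spatial dim by 8
LATENT_CHANNELS = 4  # assumption: SD v1.x latent tensors have 4 channels

def unet_latent_shape(batch, image_h, image_w):
    """NCHW shape of the latent tensor the UNet operates on."""
    return (batch, LATENT_CHANNELS, image_h // VAE_SCALE, image_w // VAE_SCALE)

print(unet_latent_shape(1, 512, 512))  # -> (1, 4, 64, 64)
```

So a 512x512 image corresponds to the 1x4x64x64 latent mentioned above, with only the batch dimension varying.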
Wouldn’t there be a huge compile-time speed-up from not exploring such a huge shape space?