Thanks for the update!
It’s defined by the selected algorithm and depends on the memory layout, data type, memory alignment, etc.
No, I can’t tell how much performance would be lost, as benchmarking isn’t even working.
I think the right approach would be to limit the cuDNN workspace size requirement and skip algorithms whose workspace requirement exceeds that threshold.
If you are using cuDNN >= 8.0.5, you could use this env variable as a workaround for now:
CUDNN_CONV_WSCAP_DBG=4096 python script.py args
where the 4096 is specified in MiB (you can use a lower or higher value, if applicable).
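Alternatively, you could set the variable from inside the script itself. A minimal sketch, assuming the cap has to be in place before cuDNN is initialized (so set it before importing torch and before the first conv call):

```python
import os

# Cap cuDNN's conv workspace via the env variable; the value is in MiB.
# This must happen before cuDNN is initialized, so do it before importing
# torch (or at least before the first convolution is executed).
os.environ["CUDNN_CONV_WSCAP_DBG"] = "4096"

# import torch  # import afterwards so the cap is picked up
# torch.backends.cudnn.version() should report >= 8005 for this to apply
```

This is equivalent to prefixing the launch command, just kept in one place with the rest of the script's setup.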