8.7 GB CUDA block allocated and then freed by Conv2d forward

Thanks for the update!

It’s determined by the available algorithms and depends on the memory layout, data type, memory alignment, etc.
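If you want to see how much memory a particular conv setup actually peaks at, you could measure the allocation around the forward pass. A minimal sketch (the layer shapes, batch size, and dtype below are placeholders, adjust them to your setup):

import torch
import torch.nn as nn

# Hypothetical conv config and input; replace with your actual setup.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 64, 512, 512, device="cuda")

torch.cuda.reset_peak_memory_stats()
out = conv(x)
torch.cuda.synchronize()
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**2:.1f} MiB")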

No, I don’t know how much performance would be lost, since benchmarking isn’t even working in your setup.
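Once the workload fits into memory, you could quantify the impact yourself by timing the forward pass with and without the workspace cap. A rough sketch using CUDA events (shapes are again placeholders):

import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 64, 512, 512, device="cuda")

# Warmup so algorithm selection and lazy init don't skew the timing.
for _ in range(10):
    conv(x)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    conv(x)
end.record()
torch.cuda.synchronize()
print(f"avg forward time: {start.elapsed_time(end) / 100:.3f} ms")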

I think the right approach would be to limit the cudnn workspace size and skip algorithms whose workspace requirement exceeds a given threshold.
If you are using cudnn>=8.0.5, you could use this env variable as a workaround for now:

CUDNN_CONV_WSCAP_DBG=4096 python script.py args

Where 4096 is specified in MiB (you can use a lower or higher value, if applicable).
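If you can’t change the launch command, you could also set the variable from Python as a sketch. This assumes cudnn reads the variable after the process has started, so setting it before importing torch (and before any CUDA/cudnn work) is the safer option; exporting it in the shell as shown above remains the most reliable approach:

import os
# Cap the cudnn conv workspace (value in MiB); set before torch initializes cudnn.
os.environ["CUDNN_CONV_WSCAP_DBG"] = "4096"

import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
out = conv(torch.randn(8, 64, 512, 512, device="cuda"))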
