8.7 GB CUDA block allocated and then freed by Conv2d forward

Thanks for the update!

It’s determined by the available algorithms and depends on the memory layout, data type, memory alignment, etc.
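If you want to see how much memory a particular conv setup actually peaks at, you could measure the allocation around the forward pass. A minimal sketch (the layer shapes, batch size, and dtype below are placeholders, adjust them to your setup):

import torch
import torch.nn as nn

# Hypothetical conv config and input; replace with your actual setup.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 64, 512, 512, device="cuda")

torch.cuda.reset_peak_memory_stats()
out = conv(x)
torch.cuda.synchronize()
print(f"peak allocated: {torch.cuda.max_memory_allocated() / 1024**2:.1f} MiB")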

No, I don’t know how much performance would be lost, since benchmarking isn’t even working in your setup.
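Once the workload fits into memory, you could quantify the impact yourself by timing the forward pass with and without the workspace cap. A rough sketch using CUDA events (shapes are again placeholders):

import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
x = torch.randn(8, 64, 512, 512, device="cuda")

# Warmup so algorithm selection and lazy init don't skew the timing.
for _ in range(10):
    conv(x)
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):
    conv(x)
end.record()
torch.cuda.synchronize()
print(f"avg forward time: {start.elapsed_time(end) / 100:.3f} ms")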

I think the right approach would be to limit the cudnn workspace size and skip algorithms whose workspace requirement exceeds a given threshold.
If you are using cudnn>=8.0.5, you could use this env variable as a workaround for now:

CUDNN_CONV_WSCAP_DBG=4096 python script.py args

Where 4096 is specified in MiB (you can use a lower or higher value, if applicable).
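If you can’t change the launch command, you could also set the variable from Python as a sketch. This assumes cudnn reads the variable after the process has started, so setting it before importing torch (and before any CUDA/cudnn work) is the safer option; exporting it in the shell as shown above remains the most reliable approach:

import os
# Cap the cudnn conv workspace (value in MiB); set before torch initializes cudnn.
os.environ["CUDNN_CONV_WSCAP_DBG"] = "4096"

import torch
import torch.nn as nn

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1).cuda()
out = conv(torch.randn(8, 64, 512, 512, device="cuda"))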
