What does torch.backends.cudnn.benchmark do?

Thanks for your reply. I tried setting the env variable CUDNN_CONV_WSCAP_DBG=38000 (my cuDNN version is 8.2.1.32). I am testing on an A100-40GB GPU, and the documentation says the value should be set in MiB. However, I still get an OOM. Did you expect that limiting the cuDNN workspace would help here?
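For reference, this is roughly how I set it. The key detail (an assumption on my part, based on how cuDNN reads its debug env variables) is that it has to be in the environment before cuDNN is initialized, i.e. before the first convolution runs:

```python
import os

# Cap the cuDNN convolution workspace at 38000 MiB (the value is interpreted
# in MiB per the cuDNN docs). Must be set before cuDNN is first initialized,
# so it has to happen before torch runs its first convolution.
os.environ["CUDNN_CONV_WSCAP_DBG"] = "38000"

# Only after this:
#   import torch
#   torch.backends.cudnn.benchmark = True
#   ... run the model ...
```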

I figured that benchmarking on a smaller input to find the best algorithm, and then setting that algorithm directly without further benchmarking, could help avoid the VRAM usage peak. But I could not find out how to do this, even after digging further into this question and this GitHub issue.
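For intuition on why I got stuck: as far as I understand, cuDNN's benchmark results are cached per convolution configuration (input shape, dtype, conv parameters), so an algorithm picked for a smaller input would not automatically carry over to the full-size input anyway. A toy sketch of that caching behavior (illustration only, not the real cuDNN cache or API):

```python
# Toy model of cuDNN-style autotuning: the chosen algorithm is cached
# per input configuration, so benchmarking one shape does not select
# an algorithm for a different shape.
_algo_cache = {}

def pick_algorithm(shape):
    """Return the 'best' algorithm for this configuration, running the
    (expensive) benchmark only on a cache miss."""
    if shape not in _algo_cache:
        # Real cuDNN would time every candidate algorithm here, which is
        # where the workspace/VRAM peak occurs.
        _algo_cache[shape] = f"best_algo_for_{shape}"
    return _algo_cache[shape]

pick_algorithm((1, 3, 32, 32))    # benchmarks and caches this shape
pick_algorithm((1, 3, 224, 224))  # different shape -> benchmarks again
```

So even with a way to pin the algorithm, it would have to be pinned per shape, which is the missing piece I could not find an API for.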