PyTorch CUDA 11.2 build from source: RuntimeError: CUDA error: no kernel image is available for execution on the device

I think they might be related.
The slow startup time with the binaries points to the CUDA JIT, which is used when the build has no native kernels for your GPU's compute capability and has to compile them from PTX at runtime. The error from your source build says essentially the same thing: no kernel was compiled for your device's compute capability.
Are you using any other GPU in this system or only the Turing one?
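
To narrow this down, something like the following should show which architectures your PyTorch build was compiled for and what each visible GPU reports. It is only a rough sketch: `has_exact_native_kernel` is a helper name I made up, and it only checks for an exact `sm_XY` match. A build can still run on a device without an exact match (for example through an `sm_` entry of the same major version with a lower minor version, or through a `compute_` PTX entry and the JIT), so treat a `False` here as a hint rather than proof.

```python
def has_exact_native_kernel(capability, arch_list):
    """Hypothetical helper: True if arch_list has an sm_XY entry that
    exactly matches the (major, minor) compute capability.
    This is a simplification; other entries may still be usable."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return

    print("PyTorch:", torch.__version__, "| built with CUDA:", torch.version.cuda)

    # Architectures this build was compiled for, e.g. ['sm_70', 'sm_75', ...]
    arch_list = torch.cuda.get_arch_list()
    print("Compiled arch list:", arch_list)

    # List every visible GPU, in case more than one is installed
    for idx in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(idx)
        name = torch.cuda.get_device_name(idx)
        print(f"GPU {idx}: {name}, capability {cap[0]}.{cap[1]}, "
              f"exact native kernel: {has_exact_native_kernel(cap, arch_list)}")


if __name__ == "__main__":
    main()
```

A Turing GPU should report capability 7.5, so you would want `sm_75` to show up in the arch list. If it is missing from the source build, setting `TORCH_CUDA_ARCH_LIST="7.5"` before rebuilding might be worth a try.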