How can I use the function at::cuda::blas::gemm<float>()?

Hello, for my use case it is absolutely necessary to use this "low-level" function at::cuda::blas::gemm<scalar_t>, and I am getting the same linker error as the OP. Alternatively, I suppose I could link directly against cuBLAS and call the original function from there, but it would be nicer to have access to this wrapper, since it elegantly handles the different floating-point types.
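
In case it helps to show what I am trying to do, here is a minimal sketch of the call (the function and tensor names are just placeholders, and I am assuming the declaration from ATen/cuda/CUDABlas.h, which may differ slightly between PyTorch versions). The gemm<float> instantiation in this snippet is exactly the symbol the linker cannot find:

```cpp
// Minimal sketch (not my actual code): compute C = A * B for two 2-D
// float CUDA tensors via at::cuda::blas::gemm<float>.
#include <torch/extension.h>
#include <ATen/cuda/CUDABlas.h>  // declares at::cuda::blas::gemm<scalar_t>

torch::Tensor gemm_float(torch::Tensor A, torch::Tensor B) {
  TORCH_CHECK(A.is_cuda() && B.is_cuda(), "expected CUDA tensors");
  TORCH_CHECK(A.dtype() == torch::kFloat && B.dtype() == torch::kFloat,
              "sketch only handles float");
  A = A.contiguous();
  B = B.contiguous();
  const int64_t m = A.size(0), k = A.size(1), n = B.size(1);
  auto C = torch::empty({m, n}, A.options());

  // cuBLAS is column-major while the tensors are row-major, so compute
  // C^T = B^T * A^T by swapping the operand order and dimensions.
  // Argument order assumed from ATen/cuda/CUDABlas.h:
  //   transa, transb, m, n, k, alpha, a, lda, b, ldb, beta, c, ldc
  at::cuda::blas::gemm<float>(
      'n', 'n',
      n, m, k,
      1.0f,
      B.data_ptr<float>(), n,
      A.data_ptr<float>(), k,
      0.0f,
      C.data_ptr<float>(), n);
  return C;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("gemm_float", &gemm_float, "C = A * B via at::cuda::blas::gemm<float>");
}
```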

I have made sure that PyTorch and the extension are built with the same compiler and ABI flag (which seems to be another common source of this linker error).
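
For reference, this is roughly how I pin the ABI flag in the extension's setup.py (the module and file names are just placeholders for my project):

```python
import torch
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# Match the C++ ABI of the installed PyTorch binary; an ABI mismatch can
# otherwise show up as undefined-symbol errors when the extension is loaded.
abi_flag = f"-D_GLIBCXX_USE_CXX11_ABI={int(torch.compiled_with_cxx11_abi())}"

setup(
    name="my_gemm_ext",  # placeholder name
    ext_modules=[
        CUDAExtension(
            name="my_gemm_ext",
            sources=["my_gemm_ext.cpp"],  # placeholder source file
            extra_compile_args={"cxx": [abi_flag], "nvcc": [abi_flag]},
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```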