You are using a PyTorch binary built with CUDA <= 12.6, while PyTorch 2.7.0+ built with CUDA 12.8+ is needed. Select that build from the install matrix, as explained in the post above, and it will work.
As a quick smoke test, you could run this directly after installing the latest stable or nightly binary:
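import torch
print(torch.__version__)           # should be 2.7.0 or newer
print(torch.cuda.get_arch_list())  # architectures this binary was compiled for
print(torch.randn(1).cuda())       # creates a tensor, moves it to the GPU, and prints it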
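
If you would rather not compare the output by eye, the same check can be scripted. The sketch below is only an illustration: `sm_tag`, `binary_targets_device`, and `check_install` are hypothetical helper names, not part of PyTorch, and the exact-match test is an assumption that is stricter than the real compatibility rules (a binary can also run on a GPU through PTX or same-major compatibility). It assumes device index 0 is the GPU you care about.

```python
def sm_tag(capability):
    """Turn a (major, minor) compute capability into a tag such as 'sm_90'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def binary_targets_device(capability, arch_list):
    """Strict check (assumption): the arch list has the exact sm_XY entry."""
    return sm_tag(capability) in arch_list


def check_install():
    """Print the build info and whether the binary lists this GPU's arch."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return

    print("PyTorch:", torch.__version__)
    print("Built with CUDA:", torch.version.cuda)  # None for CPU-only builds

    if not torch.cuda.is_available():
        print("No usable CUDA device found")
        return

    capability = torch.cuda.get_device_capability(0)  # e.g. (9, 0)
    arch_list = torch.cuda.get_arch_list()
    print("Device:", torch.cuda.get_device_name(0), sm_tag(capability))
    print("Binary arch list:", arch_list)

    if binary_targets_device(capability, arch_list):
        print("The binary was compiled for this GPU architecture")
    else:
        print("No exact match in the arch list; a newer binary may be needed")


if __name__ == "__main__":
    check_install()
```

If this reports no exact match and the smoke test above also fails, the installed binary is most likely still the older CUDA build.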