INTERNAL ASSERT FAILED at "../c10/cuda/CUDACachingAllocator.cpp":829 while calculating dot product

attn = (q @ k.transpose(-2, -1)) * self.scale
^~~~~~~~~~~~~~~~~~~
RuntimeError: NVML_SUCCESS == DriverAPI::get()->nvmlDeviceGetHandleByPciBusId_v2_( pci_id, &nvml_device) INTERNAL ASSERT FAILED at "…/c10/cuda/CUDACachingAllocator.cpp":829, please report a bug to PyTorch.

Can anyone please help me resolve the error above?