Torch.tensor(): CUDA-capable device(s) is/are busy or unavailable

Our torch.tensor() calls are failing with the error below:
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
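As the error hint says, `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous so the stacktrace points at the real failing call. It must be set before the first CUDA operation, so the simplest approach is to set it in the shell (`CUDA_LAUNCH_BLOCKING=1 python your_script.py`, where `your_script.py` stands in for your code). A sketch of doing the same from a launcher process:

```python
import os
import subprocess
import sys

# CUDA_LAUNCH_BLOCKING must be in the environment before the first CUDA
# call, so set it in the environment of the launched process itself.
# The child command here is a placeholder that just confirms the variable
# is visible to a freshly started interpreter; substitute your script.
env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
out = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_LAUNCH_BLOCKING'])"],
    env=env, capture_output=True, text=True,
)
print(out.stdout.strip())  # → 1
```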

Our driver setup / environment printout is below:
NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2
__Python VERSION: 3.10.11 (main, Nov 30 2023, 18:20:49) [GCC 7.5.0]
__pyTorch VERSION: 2.1.0+cu121
__CUDA VERSION 12.1
__CUDNN VERSION: 8902
__Is CUDA available: True
__Number CUDA Devices: 1
Active CUDA Device: GPU 0
Available devices 1
Current cuda device 0
GPU count: 1

Trying to understand if there’s a weird version incompatibility causing this issue. In our container image we specify CUDA 12.3 and Torch 2.2.1, so it looks like an override is happening somewhere.
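One quick way to see which CUDA runtime a given wheel actually bundles is to inspect the local version suffix of `torch.__version__` (e.g. `2.1.0+cu121` in the printout above). The helper below is hypothetical, not part of PyTorch, and just decodes that suffix:

```python
# torch.__version__ for a CUDA wheel looks like "2.1.0+cu121": the
# "cu121" suffix means the wheel bundles the CUDA 12.1 runtime.
# wheel_cuda() is an illustrative helper, not a PyTorch API.
def wheel_cuda(version):
    _, _, local = version.partition("+")
    if local.startswith("cu") and local[2:].isdigit():
        digits = local[2:]
        return f"{digits[:-1]}.{digits[-1]}"  # "cu121" -> "12.1"
    return None  # CPU-only wheel or no suffix

print(wheel_cuda("2.1.0+cu121"))  # → 12.1  (the version in the printout above)
print(wheel_cuda("2.2.1"))        # → None  (no bundled CUDA runtime)
```

Comparing this against the version you pinned in the image is a fast way to confirm whether a different wheel was pulled in.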

The PyTorch binaries ship with their own CUDA runtime dependencies, and your locally installed CUDA toolkit will only be used if you build PyTorch from source or build a custom CUDA extension.
Make sure your container can run any CUDA application at all, as the error suggests the setup itself might be at fault.
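The driver/runtime relationship described above can be sketched numerically: the wheel's bundled runtime (12.1 for a `+cu121` build) only requires a driver whose reported CUDA version (12.2 here, from `nvidia-smi`) is at least as new. This check is illustrative, assuming simple version ordering rather than NVIDIA's full compatibility matrix:

```python
# Illustrative check, not an NVIDIA API: a driver whose reported CUDA
# version (12.2 from the nvidia-smi output above) is >= the runtime the
# wheel bundles (12.1 for a +cu121 build) can run that wheel without any
# local CUDA toolkit installed.
def driver_supports(driver_cuda, runtime_cuda):
    to_tuple = lambda v: tuple(map(int, v.split(".")))
    return to_tuple(driver_cuda) >= to_tuple(runtime_cuda)

print(driver_supports("12.2", "12.1"))  # → True: the reported setup is version-compatible
print(driver_supports("11.8", "12.1"))  # → False: driver too old for a cu121 wheel
```

Since the versions reported here are compatible, the "busy or unavailable" error more likely points at the container's GPU access (e.g. device visibility) than at a PyTorch/CUDA version mismatch.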