Creating a tensor on the GPU freezes the process

I am running this repo on my system (a MIG device with 80 GB of memory) to train a classifier. I want to pass a list of class weights to the loss function, as shown below:

if __name__ == '__main__':

    args = parse_args()
    os.makedirs(args.model_dir, exist_ok=True)
    os.makedirs(args.log_dir, exist_ok=True)

    if args.enet_type == 'resnest101':
        ModelClass = Resnest_Melanoma
    elif args.enet_type == 'seresnext101':
        ModelClass = Seresnext_Melanoma
    elif 'efficientnet' in args.enet_type:
        ModelClass = Effnet_Melanoma
    else:
        raise NotImplementedError()

    DP = len(os.environ['CUDA_VISIBLE_DEVICES']) > 1

    device = torch.device('cuda')

    c_weights = torch.tensor([0.954, 0.8274, 0.852, 0.987, 0.967, 0.986, 0.735, 0.687]).float().to(device)
    # loss function
    criterion = nn.CrossEntropyLoss()

When I create the tensor on the CPU, it works fine, but I need to use a GPU because of my images. When I send the tensor c_weights to the GPU, the training process gets stuck without returning any errors. I tried different ways of creating the tensor and different dtypes, but none of them worked. This is what the output shows (the process gets stuck on 2 and does not progress):

I also tried smaller batch sizes, with no improvement.
Here is my GPU information:

GPU Driver Version: 510.85.02
CUDA Version: 11.6
memory: 81069MiB

How can I solve this issue?

My args are:

--kernel-type = 8c_b3_768_512_18ep
--data-folder = 512
--image-size = 512
--enet-type = efficientnet_b3
--batch-size = 32
--num-workers = 32
--out-dim = 8
--CUDA_VISIBLE_DEVICES = MIG-cdc1351f-1b7a-554c-a273-f7643f99523f
--fold = 0
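
A side note on the snippet in the question: the DP flag is computed from the length of the CUDA_VISIBLE_DEVICES string, and len() on a string counts characters rather than devices. With a MIG UUID like the one listed above, that expression is true even though only one device is visible. Whether this is related to the hang is an assumption and is not confirmed anywhere in this thread. The sketch below only illustrates the difference, and count_visible_devices is a hypothetical helper, not part of the repo.

```python
def count_visible_devices(value):
    """Count comma-separated entries in a CUDA_VISIBLE_DEVICES-style string.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    return len([entry for entry in value.split(',') if entry.strip()])


# Value taken from the args listed in the question.
visible = 'MIG-cdc1351f-1b7a-554c-a273-f7643f99523f'

# len() on a string counts characters, so this is True even for one device.
dp_by_length = len(visible) > 1

# Counting comma-separated entries gives False for a single MIG instance.
dp_by_count = count_visible_devices(visible) > 1

print(dp_by_length, dp_by_count)
```

If the two values differ on your machine, it may be worth checking which code path the script takes when DP is true.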

I was able to run the device and c_weights lines as is. I would first verify that args.CUDA_VISIBLE_DEVICES is correct, then try adding a couple of print statements for the device and the visible devices just before c_weights, to see if that might be the issue.
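
The checks suggested above could be sketched roughly as follows. This is a minimal example, assuming it is placed just before the c_weights line; describe_cuda_setup is a hypothetical helper name, and the torch import is wrapped in a try block only so the sketch also runs on a machine without PyTorch.

```python
import os

try:
    import torch  # optional here so the sketch still runs without PyTorch
except ImportError:
    torch = None


def describe_cuda_setup():
    """Collect the values worth printing before creating c_weights.

    Hypothetical helper for illustration; it is not part of the repo.
    """
    info = {'CUDA_VISIBLE_DEVICES': os.environ.get('CUDA_VISIBLE_DEVICES')}
    if torch is not None:
        info['cuda_available'] = torch.cuda.is_available()
        info['device_count'] = torch.cuda.device_count()
        if info['cuda_available']:
            info['device_name'] = torch.cuda.get_device_name(0)
    return info


# flush=True so each line is written out even if a later call hangs.
for key, value in describe_cuda_setup().items():
    print(f'{key}: {value}', flush=True)
```

Printing with flush=True helps narrow down which call the process stops on, since buffered output can otherwise be lost when a process freezes.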

I edited my post and added the args I’m using. I printed this information:

cuda visible device : MIG-cdc1351f-1b7a-554c-a273-f7643f99523f
device :  cuda
c_weights: tensor([0.9540, 0.8274, 0.8520, 0.9870, 0.9670, 0.9860, 0.7350, 0.6870],