cuDNN error for loss.backward()

```
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
```

Training code:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import segmentation_models_pytorch as smp
from torch.utils.data import DataLoader
from torchgeo.datasets import LoveDA
from tqdm import tqdm

# NUM_EPOCHS is not defined in the posted snippet.

device = torch.device("cuda:2" if torch.cuda.is_available() else "cpu")
dataset = LoveDA(root="loveda", split="train", scene=['urban'], download=True, transforms=None)  # Download the dataset
# sampler = RandomGeoSampler(dataset, size=512, length=10000)
dataloader = DataLoader(dataset, batch_size=1, shuffle=True, num_workers=4)  # Create a dataloader

# Using segmentation models, train a Unet model with an efficientnet backbone.
# Iterate over each batch in the dataloader and train the model.
model = smp.Unet('tu-efficientnetv2_rw_s', encoder_weights=None, classes=1)
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
sched = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=5, verbose=True)
model = model.to(device)

for epoch in range(NUM_EPOCHS):
    model.train()
    with tqdm(total=len(dataloader)) as pbar:
        for batch_idx, batch in enumerate(dataloader):
            data = batch['image'].to(device).type(torch.float32)
            target = batch['mask'].to(device).type(torch.float32)
            output = model(data)
            output = torch.squeeze(output, dim=1)  # drop the channel dimension
            print(output.shape)
            print(target.shape)
            loss = criterion(output, target)
            optimizer.zero_grad()
            loss.backward()  # the cuDNN error is raised here
            optimizer.step()
            pbar.update(1)
```

Could you post the output of `python -m torch.utils.collect_env` here, please?
`CUDNN_STATUS_NOT_INITIALIZED` is often raised due to a setup issue, which shouldn’t happen if you are using the prebuilt PyTorch binaries.
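
A few of the facts that report contains can also be checked directly from Python. The following is only a minimal sketch and not a replacement for `collect_env`; the helper name `cuda_setup_summary` and its `device_index` parameter are made up for illustration, and the default of 2 simply mirrors the `cuda:2` device in the posted code.

```python
import torch


def cuda_setup_summary(device_index: int = 2) -> dict:
    """Collect a few setup facts relevant to cuDNN initialization errors.

    `device_index` is a hypothetical parameter; 2 matches "cuda:2" above.
    """
    info = {
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built with; None on CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # Number of GPUs visible to this process.
        "gpu_count": torch.cuda.device_count(),
        # Whether this PyTorch build ships with cuDNN support.
        "cudnn_available": torch.backends.cudnn.is_available(),
        # cuDNN version as an integer, or None if cuDNN is not available.
        "cudnn_version": torch.backends.cudnn.version(),
    }
    # "cuda:<device_index>" only exists if enough GPUs are visible.
    info["requested_index_valid"] = device_index < info["gpu_count"]
    return info


if __name__ == "__main__":
    for key, value in cuda_setup_summary().items():
        print(f"{key}: {value}")
```

If `requested_index_valid` is false, the `cuda:2` device requested in the training code does not exist in this process, which would be worth ruling out before looking at the cuDNN installation itself.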