How do I make `num_workers > 0` work on Windows 10?

My PyTorch version is 1.8.1.
I have the following code:

import torch.utils.data as data
import numpy as np
from resnet import ResNet

class CLS:
    def __init__(self):
        self.model_k = 'res50'
        self.model = ResNet(depth=50, num_classes=7)

class CLS_dataset(data.Dataset):
    def __init__(self, cfg):
        self.cfg = cfg

    def __getitem__(self, index):
        img = np.random.randn(3, 512, 512).astype('float32')
        label = 0
        return img, label

    def __len__(self):
        return 100

if __name__ == '__main__':
    cfg = CLS()
    net = cfg.model.cuda().train()
    dataset = CLS_dataset(cfg)

    data_loader = data.DataLoader(dataset, batch_size=4, num_workers=4, shuffle=True, pin_memory=True)

    for images, targets in data_loader:
        images = images.cuda().detach()
        targets = targets.cuda().detach()

        loss = net(images, targets)

Running the code directly in Anaconda Prompt gives the following error:

RuntimeError: cuda runtime error (801) : operation not supported at ..\torch/csrc/generic/StorageSharing.cpp:253

(installer) E:\classification_issue>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\envs\installer\lib\multiprocessing\", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\ProgramData\Anaconda3\envs\installer\lib\multiprocessing\", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

I also tried moving the `if __name__ == '__main__':` guard to just above `class CLS_dataset(data.Dataset)` or just above `class CLS`. Neither worked.

But if I comment out the following, the code runs successfully.
Or, if I create the net directly with `ResNet()`, the code also runs successfully.
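
To narrow this down, here is a stdlib-only sketch of what I suspect is happening. It is a guess, not a confirmed diagnosis. On Windows the workers are started through `spawn` (visible in the traceback above), so the dataset object is pickled and sent to each worker together with everything it references. `FakeModel`, `Cfg`, `DatasetWithCfg`, `DatasetWithoutCfg`, and `can_pickle` are hypothetical stand-ins, not part of my real code or of PyTorch, and the `RuntimeError` only imitates the CUDA sharing failure.

```python
import pickle


class FakeModel:
    """Hypothetical stand-in for a network whose weights may live on the GPU."""

    def __init__(self):
        self.on_gpu = False

    def cuda(self):
        # Like torch.nn.Module.cuda(), this modifies the object in place
        # and returns the same object.
        self.on_gpu = True
        return self

    def __getstate__(self):
        # Imitation of the failure: refuse to be pickled once "on the GPU".
        if self.on_gpu:
            raise RuntimeError("cannot share GPU storage with another process")
        return self.__dict__


class Cfg:
    def __init__(self):
        self.model = FakeModel()


class DatasetWithCfg:
    def __init__(self, cfg):
        self.cfg = cfg  # keeps cfg.model reachable from the dataset


class DatasetWithoutCfg:
    """Same dataset, but it holds no reference to the config or the model."""


def can_pickle(obj):
    """Return True if obj survives the pickling step that spawn relies on."""
    try:
        pickle.dumps(obj)
        return True
    except RuntimeError:
        return False


if __name__ == "__main__":
    cfg = Cfg()
    net = cfg.model.cuda()  # cfg.model is now "on the GPU" as well
    print(can_pickle(DatasetWithCfg(cfg)))   # False
    print(can_pickle(DatasetWithoutCfg()))   # True
```

If this guess is right, it would match both observations: the run succeeds whenever the dataset no longer reaches a model that has already been moved to the GPU.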

What’s the reason behind it?