Well, I used n_workers as an example, but I'm referring to any parallelism that can be performed. In this case I don't do any data preprocessing, but the computation itself is really heavy.
I ran a quick check by training a convolutional variational autoencoder on MNIST. I had hoped that PyTorch uses CPU parallelism at some level other than just data loading. Is that true? I don't know whether, at the C level, there are loop unrolls or similar optimizations.
torch.get_num_threads() gives me numbers greater than one.