num_workers and CUDA speed confusion

Hi, I’m a relatively new programmer trying to get a feel for PyTorch using the Iris dataset, and I’ve run into something I don’t understand. I ran a few tests of training speed, comparing CPU vs. GPU and different num_workers values, and got some surprising results.

CUDA, num_workers=0: 2.430s
CUDA, num_workers=1: 18.741s
CPU, num_workers=0: 1.619s
CPU, num_workers=1: 11.038s

As you can see, training is fastest on the CPU with no worker subprocesses, which I don’t think should be the case. Can someone explain why, or tell me whether my implementation is wrong?
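A comparison like this can be reproduced roughly as follows. This is only a minimal sketch, not the code from this post: it assumes a synthetic 150x4 tensor dataset standing in for Iris, and the function name `time_training`, the small model, and the hyperparameters are all hypothetical choices made for illustration.

```python
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def time_training(device_name, num_workers, epochs=5, batch_size=16):
    """Return wall-clock seconds for a few epochs on a tiny synthetic dataset."""
    device = torch.device(device_name)

    # Assumption: random data shaped like Iris (150 samples, 4 features, 3 classes).
    features = torch.randn(150, 4)
    labels = torch.randint(0, 3, (150,))
    loader = DataLoader(
        TensorDataset(features, labels),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
    )

    # Assumption: a small two-layer classifier, not the model from the post.
    model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3)).to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    start = time.perf_counter()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    if device.type == "cuda":
        # CUDA kernels run asynchronously; wait for them before stopping the clock.
        torch.cuda.synchronize()
    return time.perf_counter() - start


if __name__ == "__main__":
    devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
    for device_name in devices:
        for workers in (0, 1):
            elapsed = time_training(device_name, workers)
            print(f"{device_name}, num_workers={workers}: {elapsed:.3f}s")
```

The `if __name__ == "__main__":` guard matters when num_workers is greater than 0 on platforms that start worker processes by spawning, since each worker re-imports the main script.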

Here’s my code/model:
