I’ve created a Conditional DCGAN to work on the MNIST data. It involves applying some transformations and reversing them before feeding the data to the Conditional DCGAN for training, so I’ve created a custom dataset that applies custom transformations to MNIST.
However, the training time is too long: one epoch took about 3 hours. I’ve modified the DataLoader to use num_workers=6 and pin_memory=True, but training is still painfully slow. How can I speed this up?
Using num_workers=16 throws an error:
assert self._parent_pid == os.getpid(), 'can only join a child process'
AssertionError: can only join a child process
Is there any other way to increase training speed?
Full code can be found at: https://colab.research.google.com/drive/1gUr54oAwONwqCRWcWc2lMKSQ8ZOJxw0N
Could you try to time your data loading using data_time from the ImageNet example?
Alternatively, could you just time the transformations?
Since they seem to perform some OpenCV workload it might be the bottleneck.
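To make the suggestion concrete, here is a minimal sketch of measuring per-batch data-loading time in the style of the `data_time` meter from the PyTorch ImageNet example. Random tensors stand in for your transformed MNIST samples; substitute your own dataset and DataLoader settings.

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data for the transformed MNIST samples (replace with your dataset).
images = torch.randn(1024, 1, 28, 28)
labels = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(images, labels), batch_size=128)

data_time_total = 0.0
num_batches = 0
end = time.time()
for imgs, lbls in loader:
    # Time spent waiting for the DataLoader to yield this batch.
    data_time_total += time.time() - end
    num_batches += 1
    # ... the actual training step (forward/backward) would run here ...
    end = time.time()

print(f"avg data loading time per batch: {data_time_total / num_batches:.6f}s")
```

If the average data time is close to your total per-iteration time, the loading/transform pipeline is the bottleneck rather than the model itself.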
Is this meant to influence training speed?
If your data loading is the bottleneck, your actual training will have to wait until the next batch is provided, so yes it’ll influence the training time.
Would it help to apply the transformations to MNIST once and save the results for use as and when needed? I thought this would speed up fetching the data. Please let me know your opinion.
If the transformations are static/deterministic, i.e. yielding the same result for each call, then you could store the transformed tensors and just load them (lazily) for your training.
This should speed up the training if the transformations really are the bottleneck, so you should check that first before optimizing code paths unnecessarily.
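A minimal sketch of that caching idea, assuming the transformations are deterministic: apply them once, store the tensors with `torch.save`, and load the cached file on subsequent runs. `my_transform` here is a hypothetical placeholder for the actual OpenCV pipeline, and the random input tensors stand in for the raw MNIST images.

```python
import os
import torch
from torch.utils.data import Dataset, DataLoader

def my_transform(img):
    # Hypothetical placeholder for the real (expensive) transformations.
    return img.float() / 255.0

class CachedDataset(Dataset):
    def __init__(self, cache_path, raw_images=None, raw_labels=None):
        if os.path.exists(cache_path):
            # Reload the previously transformed tensors.
            self.images, self.labels = torch.load(cache_path)
        else:
            # Apply the transformations once, then cache the results.
            self.images = torch.stack([my_transform(x) for x in raw_images])
            self.labels = raw_labels.clone()
            torch.save((self.images, self.labels), cache_path)

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

# Stand-in raw data; replace with the actual MNIST images and targets.
raw = torch.randint(0, 256, (100, 1, 28, 28), dtype=torch.uint8)
lbl = torch.randint(0, 10, (100,))
ds = CachedDataset("mnist_cache.pt", raw, lbl)
loader = DataLoader(ds, batch_size=32)
```

With the tensors precomputed, `__getitem__` becomes a cheap index, so the DataLoader should no longer stall the training loop. For very large datasets you could instead save per-sample files and load them lazily in `__getitem__`.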