Hi,
As I mentioned in another ticket, this implementation lacks vectorisation. When data is retrieved through the loader, MyDataset.__getitem__ is called once per sample, which adds up to millions of calls. This has become the bottleneck of my training on GPU. In Keras, a larger batch_size reduces training time; here, however, batch_size has little effect on training time because of the loop over individual training points. Is there any suggestion for avoiding this?
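
To make the pattern concrete, here is a minimal sketch in plain Python of what I mean. It is not the library's actual code: the toy MyDataset, iterate_batches and count_calls are hypothetical names I made up for illustration, and the loader is assumed to fetch each batch one index at a time.

```python
class MyDataset:
    """Toy stand-in for the real dataset; counts how often __getitem__ runs."""

    def __init__(self, n):
        self.data = list(range(n))
        self.calls = 0

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Called once per sample, never once per batch.
        self.calls += 1
        return self.data[idx]


def iterate_batches(dataset, batch_size):
    """Hypothetical per-item loader: builds each batch with a Python loop."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]


def count_calls(n, batch_size):
    """Return (number of batches, number of __getitem__ calls) for one epoch."""
    dataset = MyDataset(n)
    num_batches = sum(1 for _ in iterate_batches(dataset, batch_size))
    return num_batches, dataset.calls
```

Under these assumptions, raising batch_size reduces the number of batches but leaves the number of __getitem__ calls per epoch equal to the dataset size, which is why the batch size barely changes the loading time.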