My training set consists of images: 450,000 negative samples and 50,000 positive samples. My model contains some GRU layers. My machine has an i5-8400 CPU and a 980 Ti GPU. One epoch takes 6 hours. I keep modifying and re-tuning the model, so training at this speed is too painful. Is there a good way to speed up training?
You could stratify your sampling to use, say, only 250k negative samples per epoch. I’m not sure there is much built-in support for it, though. I would probably subclass the dataset, set its length to 250k, and draw a random permutation on each epoch, using the first 250k indices of that permutation as the dataset contents.
I’m not certain how well that works with multiple dataloading processes though.
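Here is a minimal sketch of that idea, written as a sampler rather than a dataset subclass (my variation, not necessarily what was meant above). It assumes binary labels with 1 for positive and 0 for negative; the class name NegativeSubsetSampler and the arguments labels, num_negatives and seed are made up for illustration. Each epoch it keeps every positive index and draws a fresh random subset of the negatives.

```python
import random


class NegativeSubsetSampler:
    """Yield all positive indices plus a new random subset of negatives per epoch.

    Hypothetical helper: `labels` is assumed to be a sequence of 0/1 labels
    aligned with the dataset's indices.
    """

    def __init__(self, labels, num_negatives, seed=None):
        self.pos = [i for i, y in enumerate(labels) if y == 1]
        self.neg = [i for i, y in enumerate(labels) if y == 0]
        # Never ask for more negatives than exist.
        self.num_negatives = min(num_negatives, len(self.neg))
        self.rng = random.Random(seed)

    def __len__(self):
        return len(self.pos) + self.num_negatives

    def __iter__(self):
        # A new draw happens on every call, i.e. once per epoch.
        chosen = self.rng.sample(self.neg, self.num_negatives)
        indices = self.pos + chosen
        self.rng.shuffle(indices)
        return iter(indices)
```

As far as I know, DataLoader accepts any iterable with a length as its sampler argument, so something like DataLoader(dataset, batch_size=64, sampler=NegativeSubsetSampler(labels, 250_000), num_workers=4) should work without shuffle=True. The indices are drawn in the main process and only the loading is handed to the workers, which should sidestep the multi-process question.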
I have tried three methods:
The first is to train on the full data set. Training takes too long, and the large amount of negative data is likely to cause overfitting.
The second is to set num_workers on the DataLoader. This multi-process approach reduces the training time of a single epoch, but the model's prediction performance degrades.
The third is to upsample the positive samples. The positive samples are expanded until there are as many as the negative samples, and the combined set is shuffled with random.shuffle and divided into ten groups:
The model is trained separately on each of these ten groups, and the resulting model weights are then averaged. The prediction performance of this averaged model is lower, with almost no predictive power.
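For reference, the upsample, shuffle and split step looks roughly like the sketch below. This is a simplified reconstruction, not my exact code: pos and neg stand for lists of sample identifiers (file paths, for example), and the function name upsample_and_split is only for illustration.

```python
import random


def upsample_and_split(pos, neg, num_groups=10, seed=None):
    """Upsample positives to match negatives, shuffle, and split into groups.

    Returns a list of `num_groups` lists of (sample, label) pairs,
    with label 1 for positive and 0 for negative.
    """
    rng = random.Random(seed)
    # Draw extra positives with replacement until both classes have equal size.
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    samples = [(p, 1) for p in pos + extra] + [(n, 0) for n in neg]
    rng.shuffle(samples)
    # Stride slicing yields groups whose sizes differ by at most one.
    return [samples[g::num_groups] for g in range(num_groups)]
```

Because the split happens after shuffling, each group is only approximately balanced between the two classes, and the same positive sample can appear in several groups.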
In addition, I also tried transfer learning. After a model is trained on the first data set, that model is used as the starting point for training on the second data set…, and the prediction performance of the final model is still lower than with the first method.