Help Training ImageNet from Scratch

Hi, I’m looking for advice from someone who has successfully trained different architectures from scratch on ImageNet, to help me with a few issues. Sorry for the long post; any help is greatly appreciated.

I used the ImageNet example code as my baseline and adapted it. Fine-tuning works very well when I already have pre-trained weights, but things aren’t going well when I try to train a new architecture from scratch on the dataset.

Firstly, loading data and transferring it to the GPU is bottlenecking training time. The dataset is stored in the folder format, and I have the number of workers set to my number of CPU cores (16). Every epoch, data loading starts quickly but seems to gradually slow down as the epoch goes on. Typically there are 10-15 batches with 0.00 s data loading time, followed by one or two batches with a long load time (2+ seconds), and this pattern continues throughout training. I’m not an expert in multiprocessing; are there any settings I can tweak to fix this? I’ve trained on another large image dataset (Places365) and didn’t have the same data loading issues.
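One common reading of this pattern is that the workers cannot decode and augment JPEGs as fast as the GPU consumes batches: a burst of fast batches (roughly as many as there are workers) is served from the prefetch queue, then the loop blocks until the next round of workers finishes. Below is a minimal sketch, assuming PyTorch 1.7 or newer (for `persistent_workers` and `prefetch_factor`), of DataLoader settings that often help plus a loop that records how long each batch waits on the loader. The helper names `make_loader` and `time_data_wait`, the batch size and `prefetch_factor=4` are illustrative choices, not values from this thread.

```python
import time

import torch
from torch.utils.data import DataLoader


def make_loader(dataset, batch_size=256, num_workers=16):
    """Build a training DataLoader with options that usually help keep the GPU fed."""
    kwargs = dict(
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        # Page-locked host memory allows faster, asynchronous host-to-GPU copies.
        pin_memory=torch.cuda.is_available(),
    )
    if num_workers > 0:
        # Keep worker processes alive across epochs (no re-spawn cost at each
        # epoch start) and let each worker queue several batches ahead.
        kwargs.update(persistent_workers=True, prefetch_factor=4)
    return DataLoader(dataset, **kwargs)


def time_data_wait(loader, device):
    """Run one pass over `loader`, returning seconds spent waiting per batch."""
    waits = []
    end = time.perf_counter()
    for images, targets in loader:
        # Time from the end of the previous step until this batch was ready.
        waits.append(time.perf_counter() - end)
        # non_blocking only overlaps the copy with compute when memory is pinned.
        images = images.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        # ... forward pass, loss.backward() and optimizer.step() go here ...
        end = time.perf_counter()
    return waits
```

If the waits stay long even with more prefetching, the CPU-side decoding and augmentation is the likely limit, and faster storage alone would not remove it.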

Would converting ImageNet to LMDB and adapting the LMDB loader fix this issue for me?

Secondly, the training accuracy I’m getting doesn’t match the training plots for the same architecture trained in other frameworks or reported in papers. With the same model, those curves show the cross-entropy loss decreasing from 6 to 2 and the top-5 validation accuracy reaching 70% within the first 10 epochs, whereas my runs are only at a CE loss of 4 after 20 epochs. I’ve tried both He and Glorot initialization, and both polynomial learning rate decay and step decay every N epochs, but I can’t come close to replicating the convergence in those training curves.
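For reference, here is a sketch of one common He (Kaiming) initialization scheme for a conv net with batch norm, similar to what torchvision applies to its MobileNetV2. The function name `init_weights` and the `std=0.01` for the final linear layer are assumptions to adapt, not a recipe known to reproduce any particular paper's curve.

```python
import torch.nn as nn


def init_weights(model):
    """He-initialize convs, reset batch norm to identity, small-normal classifier."""
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            # mode="fan_out" preserves gradient variance in the backward pass.
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.BatchNorm2d):
            if m.weight is not None:  # affine=False layers have no parameters
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, mean=0.0, std=0.01)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
    return model
```

Call it once right after building the model, e.g. `model = init_weights(MyMobileNet())`, where `MyMobileNet` is a placeholder for your own architecture.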

The main learning rate schedules I’m trying start at either 0.1 or 0.01, and either drop by a factor of 10 every 30 epochs or use polynomial decay with power 4.
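The two schedules described above can be written as plain functions of the epoch, which makes it easy to print and compare them. This is a sketch; the function names and the 90-epoch budget in the demo are assumptions.

```python
def step_lr(base_lr, epoch, step_size=30, gamma=0.1):
    """Divide the learning rate by 10 (gamma=0.1) every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


def poly_lr(base_lr, epoch, total_epochs, power=4.0):
    """Polynomial decay from base_lr at epoch 0 down to 0 at total_epochs."""
    return base_lr * (1.0 - epoch / total_epochs) ** power


if __name__ == "__main__":
    # Compare both schedules over a hypothetical 90-epoch run starting at 0.1.
    for epoch in (0, 10, 20, 30, 60, 89):
        print(f"epoch {epoch:2d}: step {step_lr(0.1, epoch):.5f}"
              f"  poly {poly_lr(0.1, epoch, 90):.5f}")
```

With `power=4` and 90 epochs, the polynomial schedule is already at about 37% of the base rate by epoch 20, since (70/90)^4 ≈ 0.366, while step decay stays at the full rate until epoch 30, so early-epoch curves from the two are not directly comparable. In PyTorch the same schedules correspond to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)` and `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda e: (1 - e / total_epochs) ** 4)`, each stepped once per epoch.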

Additionally, the gap between my training and validation accuracy seems much larger than in other ImageNet training curves I’ve seen.

If anyone has run into any of these issues and has advice, your help would be greatly appreciated. I’m currently trying to train MobileNet, but I seem to have these training issues with every architecture I try.


Hi, I haven’t used PyTorch for CNN training; I’ve only used Caffe.

LMDB did help somewhat; see Jia Qingyang’s tip:

When I trained VGG16 from scratch on ImageNet, top-1 accuracy only reached 52%, and I’m still struggling with it.

As for the training/validation accuracy gap, could you share the log file or a plot of training loss versus iteration so we can get a sense of the problem?
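If it helps, a minimal way to produce such a log is to append the iteration, loss and learning rate to a CSV file during training, which any plotting tool can read afterwards. `LossLogger` and `read_log` are hypothetical helpers using only the standard library.

```python
import csv


class LossLogger:
    """Append (iteration, loss, lr) rows to a CSV file for later plotting."""

    def __init__(self, path):
        self.path = path
        with open(path, "w", newline="") as f:
            csv.writer(f).writerow(["iteration", "loss", "lr"])

    def log(self, iteration, loss, lr):
        # Reopen in append mode so logged rows survive a crashed run.
        with open(self.path, "a", newline="") as f:
            csv.writer(f).writerow([iteration, f"{float(loss):.6f}", lr])


def read_log(path):
    """Return the logged rows as (iteration, loss, lr) tuples."""
    with open(path, newline="") as f:
        return [(int(r["iteration"]), float(r["loss"]), float(r["lr"]))
                for r in csv.DictReader(f)]
```

Inside the training loop this would be called as, for example, `logger.log(step, loss.item(), optimizer.param_groups[0]["lr"])`, perhaps only every few hundred iterations to keep file I/O negligible.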