Training MobileNet on ImageNet

According to the official PyTorch docs, MobileNet V3 Small should reach:

acc@1 (on ImageNet-1K) 67.668
acc@5 (on ImageNet-1K) 87.402

When I run the ImageNet example code, however, the results are abysmal: I’m barely getting 10% acc@1 with the default settings. I’m sure using the exact parameters/optimizers from the paper would improve things, but something must be wrong for the results to be this bad. Has anyone done this before? I suspect I may be missing a significant pre- or post-processing step. Training works as expected with resnet18, so I’m confident nothing is wrong with the data and that the problem is something about replacing the resnet18 model with the MobileNet one.

Did you follow the training steps linked in the docs and check the referenced GitHub issues?


Thanks for pointing me to that! I got it up and running, but it also doesn’t seem to be working. Is the “torchrun” command necessary? My understanding is that it’s just for running on multiple GPUs.
I’m running:

python --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path ~/data/imagenet/ 

After 30 epochs:

Epoch: [29]  [10000/10010]  eta: 0:00:00  lr: 0.043627440786307564  img/s: 1559.5288102901363  loss: 6.9306 (6.9313)  acc1: 0.0000 (0.0972)  acc5: 0.0000 (0.5093)  time: 0.0823  data: 0.0001  max mem: 2309
Epoch: [29] Total time: 0:13:57
Test:   [  0/391]  eta: 0:15:23  loss: 6.8420 (6.8420)  acc1: 0.0000 (0.0000)  acc5: 0.0000 (0.0000)  time: 2.3624  data: 2.3319  max mem: 2309
Test:   [100/391]  eta: 0:00:30  loss: 6.8788 (6.9304)  acc1: 0.0000 (0.0000)  acc5: 0.0000 (0.7735)  time: 0.0893  data: 0.0613  max mem: 2309
Test:   [200/391]  eta: 0:00:19  loss: 6.9435 (6.9245)  acc1: 0.0000 (0.1943)  acc5: 0.0000 (0.5830)  time: 0.1009  data: 0.0727  max mem: 2309
Test:   [300/391]  eta: 0:00:08  loss: 6.9327 (6.9321)  acc1: 0.0000 (0.1298)  acc5: 0.0000 (0.5191)  time: 0.0715  data: 0.0434  max mem: 2309
Test:  Total time: 0:00:37
Test:  Acc@1 0.100 Acc@5 0.500

I’m confident my ImageNet data is OK because I ran the resnet18 example on it and it seemed fine.
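A minimal sketch of a sanity check for logs like the one above: with the 1000 ImageNet-1K classes, a model that guesses uniformly at random scores Acc@1 = 0.1% and Acc@5 = 0.5%, and its cross-entropy loss is ln(1000) ≈ 6.908. The logged Acc@1 0.100, Acc@5 0.500 and losses around 6.93 sit right at those baselines, which suggests the network is stuck at near-uniform predictions rather than just learning slowly. The helper names and the 5% tolerance are arbitrary choices for illustration.

```python
import math

NUM_CLASSES = 1000  # ImageNet-1K


def chance_baselines(num_classes: int = NUM_CLASSES, k: int = 5) -> dict:
    """Top-1/top-k accuracy (in percent) and loss of a uniform random predictor."""
    return {
        "acc1": 100.0 / num_classes,
        "acc5": 100.0 * k / num_classes,
        # Cross-entropy when every class gets probability 1 / num_classes.
        "loss": math.log(num_classes),
    }


def looks_like_chance(acc1: float, acc5: float, loss: float,
                      num_classes: int = NUM_CLASSES, rel_tol: float = 0.05) -> bool:
    """True if all logged metrics are within rel_tol of the uniform-guess baseline."""
    base = chance_baselines(num_classes)
    return (math.isclose(acc1, base["acc1"], rel_tol=rel_tol)
            and math.isclose(acc5, base["acc5"], rel_tol=rel_tol)
            and math.isclose(loss, base["loss"], rel_tol=rel_tol))


# Final test accuracies from the log; the loss is roughly the ~6.93 shown per batch.
print(chance_baselines())
print(looks_like_chance(acc1=0.100, acc5=0.500, loss=6.93))
```

If a run still matches these baselines after an epoch or two, the optimizer settings (especially the learning rate) are a more likely culprit than slow convergence.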

@ptrblck One step forward: running the original command with mobilenet and the default settings (instead of RMSprop) does learn. Is it possible the RMSprop parameters were not added correctly?

python --model mobilenet_v3_small --data-path ~/data/imagenet/

I don’t know, but @pmeier might know more about this model and how it was trained.
Based on this PR, Vasilis added the model, but I cannot find his username here, in case he has an account.


I don’t know if this is useful or not, but the original paper states:

“We train our models using synchronous training setup on 4x4 TPU Pod [24] using standard tensorflow RMSPropOptimizer with 0.9 momentum. We use the initial learning rate of 0.1, with batch size 4096 (128 images per chip), and learning rate decay rate of 0.01 every 3 epochs. We use dropout of 0.8, and l2 weight decay 1e-5 and the same image preprocessing as Inception [42]. Finally we use exponential moving average with decay 0.9999. All our convolutional layers use batch-normalization layers with average decay of 0.99.”

I tried the following settings, which also end up not learning anything after 1 epoch.

CUDA_VISIBLE_DEVICES=0 python -m pdb --model resnet18 --epochs 600 --opt rmsprop --batch-size 128 --lr 0.1 --wd 0.00001 --lr-step-size 3 --lr-gamma 0.99 --auto-augment imagenet --random-erase 0.2 --data-path ~/PerferatedBackpropagation/data/imagenet/
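Several numbers in the quoted paragraph follow TensorFlow conventions, so they do not map one-to-one onto PyTorch arguments. The sketch below translates them under explicit assumptions: a learning-rate "decay rate of 0.01 every 3 epochs" is read as multiplying the rate by 1 - 0.01 = 0.99 every 3 epochs; a TensorFlow batch-norm average decay d is treated as PyTorch BatchNorm momentum 1 - d (the two libraries weight the running average from opposite sides); "dropout of 0.8" is assumed to be a TF1-style keep probability, i.e. a PyTorch drop probability of 0.2; and the learning rate for a smaller global batch is derived with the linear scaling heuristic. `translate_paper_hparams` and `PyTorchHParams` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PyTorchHParams:  # hypothetical container, not a torchvision type
    lr: float
    lr_step_size: int
    lr_gamma: float
    bn_momentum: float
    dropout_p: float
    weight_decay: float


def translate_paper_hparams(
    paper_lr: float = 0.1,
    paper_batch_size: int = 4096,   # global batch on the 4x4 TPU pod
    lr_decay_rate: float = 0.01,
    lr_decay_epochs: int = 3,
    tf_bn_decay: float = 0.99,
    tf_keep_prob: float = 0.8,      # assumption: "dropout of 0.8" is a keep probability
    weight_decay: float = 1e-5,
    local_batch_size: int = 128,    # global batch of a single-GPU run
) -> PyTorchHParams:
    return PyTorchHParams(
        # Linear scaling heuristic: lr proportional to the global batch size.
        lr=paper_lr * local_batch_size / paper_batch_size,
        lr_step_size=lr_decay_epochs,
        # Decay rate 0.01 -> multiply lr by 0.99 every lr_decay_epochs epochs.
        lr_gamma=1.0 - lr_decay_rate,
        # TF: avg = d*avg + (1-d)*x ; PyTorch: avg = (1-m)*avg + m*x  =>  m = 1 - d
        bn_momentum=1.0 - tf_bn_decay,
        # PyTorch's Dropout(p) takes the probability of dropping, not keeping.
        dropout_p=1.0 - tf_keep_prob,
        # Passed through unchanged (assumes the same L2 definition in both setups).
        weight_decay=weight_decay,
    )


print(translate_paper_hparams())
```

With its defaults the sketch gives lr = 0.1 × 128 / 4096 = 0.003125, step size 3 and gamma 0.99 for a single GPU with batch size 128. For comparison, the command above keeps the paper's unscaled learning rate of 0.1 at that batch size.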

@pytorcher How many GPUs are you running with? From


I guess 1? If so, your learning rate is way too high. We trained the model with 8 GPUs, so you should roughly divide the learning rate by 8. Or you could multiply the batch size by 8, but I guess your setup (and ours as well) cannot handle that.

This should at least get you into the right ballpark. However, all the other hyperparameters we used are tuned to our setup, so you will probably need to adjust them as well if you want to achieve the same performance.
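The "divide by 8" advice is an instance of the linear scaling heuristic: each process contributes its own per-GPU batch, so the global batch per optimizer step is the per-GPU batch times the number of processes, and when that global batch shrinks by some factor, the learning rate is shrunk by the same factor. A minimal sketch, assuming the reference recipe used 8 processes with 128 images each and lr 0.064 (as in the first command in this thread); `global_batch_size` and `scale_lr` are hypothetical helpers, not torchvision functions. The only real API relied on is that torchrun exports a `WORLD_SIZE` environment variable to each process it launches.

```python
import os
from typing import Optional


def global_batch_size(per_gpu_batch: int, world_size: Optional[int] = None) -> int:
    """Images per optimizer step: every process contributes its own batch."""
    if world_size is None:
        # torchrun exports WORLD_SIZE for each process it launches;
        # a plain single-process `python` launch normally leaves it unset.
        world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return per_gpu_batch * world_size


def scale_lr(ref_lr: float, ref_global_batch: int, new_global_batch: int) -> float:
    """Linear scaling heuristic: keep lr / global_batch roughly constant."""
    return ref_lr * new_global_batch / ref_global_batch


# Reference recipe (assumed): 8 processes x 128 images, lr 0.064.
ref_batch = global_batch_size(128, world_size=8)     # 1024
# Single-GPU launch without torchrun: 1 process x 128 images.
single_batch = global_batch_size(128, world_size=1)  # 128
print(scale_lr(0.064, ref_batch, single_batch))
```

The alternative of keeping lr 0.064 and raising the single-GPU batch size to 1024 corresponds to `scale_lr(0.064, 1024, 1024)`, which leaves the learning rate unchanged. Either way, the heuristic is only a starting point.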


Yes! This was it! Thank you so much. I forgot that the per-GPU batch sizes add up across processes rather than being averaged, so dropping the “torchrun” launch shrank the total batch size by a factor of 8, which makes a huge difference. Dividing the learning rate by 8 has put me on the right track. 30 epochs in, I’m now at Test: Acc@1 29.944 Acc@5 55.040. I’m optimistic that will get me to the right place, or if not, at least learning is happening and I can play with the parameters to try to improve.

Final command for anyone looking at this in the future:

CUDA_VISIBLE_DEVICES=0 python --model mobilenet_v3_small --data-path ~/data/imagenet/ --epochs 600 --opt rmsprop --batch-size 128 --lr 0.008 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2
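For reference, `--lr-step-size 2 --lr-gamma 0.973` describes step decay: the learning rate is multiplied by 0.973 once every 2 epochs, so lr(epoch) = base_lr × 0.973^⌊epoch / 2⌋ for a 0-indexed epoch. The pure-Python sketch below evaluates that formula, assuming no warmup; `step_lr` is a hypothetical helper, not the scheduler object the training script itself uses. With base lr 0.064 it gives about 0.0436 at epoch 29, consistent with the lr printed in the earlier 0.064 run's log.

```python
def step_lr(base_lr: float, epoch: int, step_size: int = 2, gamma: float = 0.973) -> float:
    """Learning rate in effect during a 0-indexed epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# Original 8-GPU recipe vs. the single-GPU command above (lr divided by 8).
for epoch in (0, 1, 2, 29, 100):
    print(f"epoch {epoch:3d}: lr@0.064 = {step_lr(0.064, epoch):.6f}  "
          f"lr@0.008 = {step_lr(0.008, epoch):.6f}")
```

Because the decay factor is the same in both runs, dividing the starting rate by 8 divides the lr at every epoch by 8 as well. The schedule described in the paper quote ("decay rate of 0.01 every 3 epochs") would correspond to `step_size=3, gamma=0.99` in this sketch.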