I am training a simple ResNet model on CIFAR-10 (it consists of 3 consecutive basic blocks; the channel widths go 3 -> 64 -> 128 -> 256, so the final output size is 256). I use the SGD optimizer with an initial learning rate of 0.1, momentum 0.85, and weight decay 0.001.
I realized that the model first hits a plateau around epoch 20; after decreasing the learning rate, the test accuracy goes up to around 85% and the training accuracy goes up to >99.999%, after which there is no real improvement (probably because the gradient is near 0). So I added a check that increases the weight decay to 0.01, and then to 0.1, when the model starts overfitting. I do it as follows:
if state['weight decay'] == 0.001 and len1 > 5 and all(item > 0.99 for item in state['training accuracy'][len1-5:len1]):
    state['weight decay'] = 0.01
    print('Changing weight decay. New weight decay is')
    for param_group in optimizer.param_groups:
        param_group['weight decay'] = state['weight decay']
        print('%.5f' % param_group['weight decay'])
But after this change, the training accuracy remains at the same level. Am I changing the weight decay incorrectly, or have the weights become so large that this much of an increase in weight decay has no effect? I will try dropout next, but I would also like to understand what I am doing wrong here.
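For reference, here is a minimal, framework-free sketch of the escalation logic I am describing, with a plain list of dicts standing in for `optimizer.param_groups` (in PyTorch, `optimizer.param_groups` is literally a list of dicts, and SGD reads its decay from the key `weight_decay`, with an underscore). The helper names `should_increase_weight_decay` and `bump_weight_decay` are hypothetical, just for illustration:

```python
def should_increase_weight_decay(train_acc_history, window=5, threshold=0.99):
    """True when the last `window` training accuracies all exceed `threshold`."""
    if len(train_acc_history) < window:
        return False
    return all(acc > threshold for acc in train_acc_history[-window:])

def bump_weight_decay(param_groups, new_wd):
    # PyTorch's optimizers read hyperparameters per param group under
    # fixed keys such as 'weight_decay' (underscore); writing to a
    # differently spelled key only adds an entry the optimizer ignores.
    for group in param_groups:
        group['weight_decay'] = new_wd

# Stand-in for optimizer.param_groups (a list of dicts in PyTorch).
param_groups = [{'lr': 0.1, 'momentum': 0.85, 'weight_decay': 0.001}]
history = [0.95, 0.992, 0.995, 0.999, 0.9999, 0.99999]

if should_increase_weight_decay(history):
    bump_weight_decay(param_groups, 0.01)

print(param_groups[0]['weight_decay'])  # 0.01
```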